Graduate student: | 林少擎 Lin, Shao-Ching |
---|---|
Thesis title: | 基於編碼之隱性運動感知的影像超解析度 Coding-based video super-resolution with implicit motion perception |
Advisor: | 葉家宏 Yeh, Chia-Hung |
Oral defense committee: | 葉家宏 Yeh, Chia-Hung; 胡武誌 Hu, Wu-Chih; 李界羲 Lee, Jie-Si |
Oral defense date: | 2022/07/05 |
Degree: | 碩士 Master |
Department: | 電機工程學系 Department of Electrical Engineering |
Publication year: | 2022 |
Graduation academic year: | 110 |
Language: | English |
Number of pages: | 39 |
Chinese keywords: | 影像壓縮、運動估計、運動補償、影像超解析度 |
English keywords: | Video Compression, Motion Estimation, Motion Compensation, Video Super-Resolution |
Research methods: | Experimental design, thematic analysis |
DOI URL: | http://doi.org/10.6345/NTNU202201196 |
Thesis type: | Academic thesis |
Usage: | Views: 111, Downloads: 6 |
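Video super-resolution is more complex than single-image super-resolution: it must restore the details and background of the video while maintaining temporal consistency. Although many existing methods address this task, the over-smoothing caused by inaccurate motion information remains unsolved. To deal with this problem, I propose coding-based video super-resolution, which obtains motion-compensated frames and residual frames directly from the video decoder and uses two neural networks to up-sample the two kinds of frames separately. A refinement module then combines the outputs of the two networks to produce high-quality video. The proposed method effectively prevents low-resolution video with complex motion from yielding blurry high-resolution video, because the motion-compensated frames and residual frames extracted from the video decoder retain more precise motion information. Experimental results show that the proposed method achieves superior results on the REDS and Vid4 benchmark datasets.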
Unlike single-image super-resolution, video super-resolution (VSR) has a twofold goal: to restore fine details while retaining coarse structure, and to preserve motion consistency across frames. Although many approaches have been proposed for this task, the over-smoothing caused by motion inconsistency remains challenging. This thesis presents a VSR framework called coding-based video super-resolution (CBVR) to address this problem. It directly uses the motion-compensated frames and their residuals obtained from the video decoder. Two separate networks up-sample the motion-compensated frames and the residuals, respectively, and a refinement model integrates the outputs of the two networks to produce high-quality video. The proposed method effectively avoids the blurry high-resolution (HR) output frames that result from mixing the values of multiple motion-compensated low-resolution (LR) input frames, because the decoder's motion-compensated frames and residual frames carry more precise motion information than previous estimation methods, whether deep-learning-based or traditional, can obtain. Experimental results show that the proposed CBVR achieves state-of-the-art performance on the REDS and Vid4 benchmarks compared with existing VSR approaches.
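The two-branch data flow described in the abstract can be illustrated with a minimal sketch. It relies on one property of hybrid video codecs: a decoded inter frame is the motion-compensated prediction plus the decoded residual. Everything else here is an assumption made for illustration: the function names are hypothetical, nearest-neighbour up-sampling stands in for the two trained networks, a plain sum stands in for the refinement model, and the scale factor of 2 is arbitrary. It is not the thesis's implementation.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def upsample_nearest(frame: Frame, scale: int) -> Frame:
    """Hypothetical stand-in for an up-sampling network:
    repeat every pixel `scale` times horizontally and vertically."""
    out: Frame = []
    for row in frame:
        wide = [value for value in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def add_frames(a: Frame, b: Frame) -> Frame:
    """Element-wise sum of two frames of equal size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def cbvr_sketch(mc_frame: Frame, residual: Frame, scale: int = 2) -> Frame:
    """Up-sample the two decoder outputs in separate branches, then merge them.
    The merge is a plain sum here; the thesis uses a learned refinement model."""
    up_mc = upsample_nearest(mc_frame, scale)    # branch 1: motion-compensated frame
    up_res = upsample_nearest(residual, scale)   # branch 2: residual frame
    return add_frames(up_mc, up_res)


# Toy 2x2 decoder outputs (made-up values).
mc = [[10.0, 20.0], [30.0, 40.0]]
res = [[1.0, -1.0], [0.0, 2.0]]
hr = cbvr_sketch(mc, res, scale=2)
```

With these linear stand-ins, the merged result equals the up-sampled decoded frame `mc + res`, so the sketch only shows where the two decoder outputs enter the pipeline. Any gain over up-sampling the decoded frame directly would come from the learned, nonlinear networks and the refinement model described in the thesis.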