
Graduate Student: Chuang, Chun-Ya (莊淳雅)
Thesis Title: Virtual Director System (虛擬導播系統)
Advisor: Chen, Sei-Wang (陳世旺)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2016
Graduation Academic Year: 104 (ROC calendar, 2015-2016)
Language: Chinese
Pages: 90
Chinese Keywords: virtual director, photographic aesthetics, multiple kernel learning, counterpropagation neural network, spatio-temporal aggregation, camera steering
English Keywords: virtual director, photographic aesthetics, multiple kernel learning, CPN, STA, entropy, steering motion
DOI URL: https://doi.org/10.6345/NTNU202203719
Thesis Type: Academic thesis
Abstract:

A complete lecture recording typically involves two or more cameras filming different subjects, such as the speaker and the audience. The director in charge of shot selection chooses the most suitable view among them to broadcast to viewers. A professional director needs long training and practical experience before his or her choices meet viewers' expectations. To save the cost of training directors, this study proposes a system that simulates the operation and duties of a real director, called the "Virtual Director System."
The proposed virtual director system covers the two main tasks of a real director: shot selection and shooting instruction. The system assumes three groups of virtual cameramen that respectively provide views of the speaker, the audience, and the overall scene. In the shot-selection stage, we propose nine criteria, drawn from the perspectives of aesthetics, optics, continuity, and camera motion, to evaluate the quality of the candidate views and determine the most suitable one to broadcast. Because the scores of the nine criteria are heterogeneous data, we use multiple kernel learning to map them into a common space, where the heterogeneous scores can be combined into a single evaluation value that determines the best view. This task is carried out by a pre-trained counterpropagation neural network (CPN), which models the shot-selection style of a real director.
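To make the shot-selection stage concrete, the sketch below shows the two ideas in miniature: a weighted combination of RBF kernels that places the nine heterogeneous criterion scores in a common space (a hypothetical stand-in for the multiple kernel learning step), and a counterpropagation network, i.e. a competitive Kohonen layer followed by a Grossberg outstar layer, that maps the fused representation to a suitability score per camera. The prototypes, kernel parameters, and training data here are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def rbf(x, z, gamma):
    """Gaussian (RBF) kernel between two criterion-score vectors."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def mkl_features(x, prototypes, gammas, weights):
    """Map a 9-dimensional criterion-score vector into a common space as a
    weighted sum of RBF kernels against training prototypes (a hypothetical
    stand-in for the multiple kernel learning step)."""
    return np.array([sum(w * rbf(x, z, g) for w, g in zip(weights, gammas))
                     for z in prototypes])

class CPN:
    """Counterpropagation network: a competitive Kohonen layer followed by a
    Grossberg (outstar) layer that stores the associated target outputs."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.kohonen = rng.normal(size=(n_hidden, n_in))
        self.grossberg = np.zeros((n_hidden, n_out))

    def _winner(self, x):
        # Competitive step: the hidden unit closest to the input wins.
        return int(np.argmin(np.linalg.norm(self.kohonen - x, axis=1)))

    def fit(self, X, Y, epochs=100, alpha=0.1, beta=0.1):
        for _ in range(epochs):
            for x, y in zip(X, Y):
                j = self._winner(x)
                self.kohonen[j] += alpha * (x - self.kohonen[j])     # move winner toward input
                self.grossberg[j] += beta * (y - self.grossberg[j])  # learn its expected output

    def predict(self, x):
        return self.grossberg[self._winner(x)]

# Usage sketch: fuse each camera's nine scores, then broadcast the best view.
rng = np.random.default_rng(1)
prototypes = rng.random((5, 9))               # hypothetical training prototypes
gammas, weights = [0.5, 2.0], [0.7, 0.3]      # hypothetical kernel mixture
net = CPN(n_in=5, n_hidden=16, n_out=1)
X = np.array([mkl_features(x, prototypes, gammas, weights)
              for x in rng.random((40, 9))])  # fused training inputs
Y = rng.random((40, 1))                       # expert-provided suitability labels
net.fit(X, Y)
cameras = rng.random((3, 9))                  # speaker / audience / overview scores
best = max(range(3), key=lambda c: net.predict(
    mkl_features(cameras[c], prototypes, gammas, weights))[0])
print("broadcast camera:", best)
```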
In the shooting-instruction stage, the director surveys the views delivered by the virtual cameramen and, after evaluating them, gives each cameraman steering advice. Because the cameramen film different subjects, the director evaluates each video stream differently. During a lecture, the speaker's hand gestures and posture vary with the content of the talk, while the degree of disturbance among the audience reflects how the lecture is being received. The system therefore defines events based on the speaker's hand gestures and on the magnitude and extent of the motion points in the audience and overview frames. When an event is triggered, the corresponding shooting instruction is issued, simulating a real director requesting a particular shot to achieve the recommended camera steering.
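One plausible reading of "magnitude and extent of motion points" is to count the motion points and measure how widely they spread using Shannon entropy over a spatial grid, then map the two quantities to event labels. The sketch below follows that reading; the grid size, thresholds, and event names are hypothetical, and the thesis additionally applies spatio-temporal aggregation (STA) before this step.

```python
import numpy as np

def motion_entropy(points, frame_size=(640, 480), grid=(8, 8)):
    """Shannon entropy of motion-point locations over a spatial grid.
    Low entropy: motion concentrated in one region (local disturbance);
    high entropy: motion spread across the frame (global disturbance)."""
    if not points:
        return 0.0
    xs, ys = zip(*points)  # points: list of (x, y) pixel coordinates
    hist, _, _ = np.histogram2d(xs, ys, bins=grid,
                                range=[[0, frame_size[0]], [0, frame_size[1]]])
    total = hist.sum()
    if total == 0:
        return 0.0
    p = hist.ravel() / total
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def audience_event(points, count_thresh=50, spread_thresh=3.0):
    """Map motion-point count (magnitude) and entropy (extent) to an event
    label that would trigger a steering instruction. The labels and both
    thresholds are illustrative assumptions."""
    if len(points) < count_thresh:
        return "calm"                 # e.g. keep the current shot
    if motion_entropy(points) > spread_thresh:
        return "global_disturbance"   # e.g. cut to the overview camera
    return "local_disturbance"        # e.g. steer toward the active region
```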
In the experiments, supervised learning is used to train the counterpropagation network. When collecting the training data, we invited people versed in directing and camera work to provide the expected outputs, so that the virtual director's shot selection better matches viewers' needs. We recorded several real lectures covering different venues (e.g., flat and tiered auditoriums) and different formats (e.g., keynote speeches, laboratory meetings, and classroom teaching). We compare the proposed training method with a linear-combination training method in terms of shot selection, evaluating the appropriateness of the selected shots from the viewer's perspective, and we also compare and analyze the similarity of the results to manual shot selection.
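For reference, the linear-combination baseline mentioned above can be read as scoring each camera with a fixed weighted sum of its nine criterion scores, and the similarity to manual selection as the fraction of frames on which the system and the human director pick the same camera. The sketch below encodes that reading; the weights and the agreement measure are assumptions, not the thesis's exact protocol.

```python
import numpy as np

def linear_baseline_select(criterion_scores, weights):
    """Baseline shot selection: pick the camera whose weighted sum of the
    nine criterion scores is highest. `criterion_scores` has one row per
    camera; `weights` is a fixed 9-vector (values are assumptions)."""
    return int(np.argmax(criterion_scores @ weights))

def agreement_rate(system_choices, human_choices):
    """Fraction of frames on which the system chose the same camera as the
    human director: one simple similarity measure for the comparison."""
    matches = sum(s == h for s, h in zip(system_choices, human_choices))
    return matches / len(human_choices)
```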

Abstract (English):

A complete lecture recording typically involves two or more video cameras filming different subjects. A professional director selects the best shot for the target audience. Becoming a professional director who can provide the most suitable viewing experience takes a long period of training and practical experience. Therefore, a "virtual director system" is proposed: a system that simulates the operations and workings of a real director and reduces the cost of hiring and training professional directors.
    The proposed virtual director system includes two main tasks: shot selection and shooting instruction. For shot selection, the system evaluates frame contents against nine viewing criteria. It uses multiple kernel learning and spatio-temporal aggregation (STA) on the training data to simulate a director who has a unique shooting style.
    The system includes three groups of virtual cameramen that film the speaker, the audience, and the overview, respectively. In shooting instruction, the director gives the cameramen steering advice according to the frames received from each of them. The system defines events based on the speaker's gestures and on the size and range of moving points in the audience and overview frames, and then sends the corresponding instructions to recommend steering modes. Tests on recorded lectures, together with analysis and comparison against other methods, show that the proposed approach better matches viewers' expectations.

Table of Contents:
Chapter 1 Introduction 1
 1.1 Research Motivation 1
 1.2 Literature Review 4
  1.2.1 Evaluation of Video Frames 5
  1.2.2 Shot Selection among Multiple Views 6
 1.3 Thesis Organization 7
Chapter 2 System Architecture and Flow 9
 2.1 System Architecture 9
 2.2 System Flow of the Virtual Director 13
Chapter 3 Shot Selection of the Virtual Director 15
 3.1 Content Analysis 15
  3.1.1 Aesthetic Analysis 16
  3.1.2 Optical Analysis 21
  3.1.3 Continuity Analysis 25
  3.1.4 Motion Analysis 28
 3.2 Decision among Multiple Views 31
  3.2.1 Multiple Kernel Learning (MKL) 32
  3.2.2 Counterpropagation Network (CPN) 42
 3.3 Applying Test Data to the Decision Model 44
 3.4 Summary 44
Chapter 4 Shooting Instruction of the Virtual Director 46
 4.1 The Speaker's Hand Gestures 46
  4.1.1 Hand-Gesture Template Database 47
  4.1.2 Hand-Gesture Recognition 50
  4.1.3 Hand-Gesture Class Definitions 51
 4.2 Magnitude and Extent of Motion Points 55
  4.2.1 Spatio-temporal Aggregation (STA) 56
  4.2.2 Entropy 61
 4.3 Event Definitions and Shooting Instructions 63
 4.4 Summary 67
Chapter 5 Experimental Results 69
 5.1 Preparations before the Experiments 69
  5.1.1 User Interface 69
  5.1.2 Training the Decision Model 71
 5.2 Equipment and Preliminary Results 72
  5.2.1 Equipment and Setup 73
  5.2.2 Preliminary Results 73
 5.3 Comparison and Analysis with Other Methods 75
  5.3.1 Experiment 1 75
  5.3.2 Experiment 2 77
  5.3.3 Experiment 3 82
  5.3.4 Experiment 4 84
Chapter 6 Conclusions and Future Work 85
 6.1 Conclusions 85
 6.2 Future Work 86
References 87

