Graduate student: | 陳信裕 Chen, Sin-Yu |
---|---|
Thesis title: | 利用廣義知網及維基百科於劇本文件之廣告推薦 Using E-HowNet and Wikipedia in Advertisement Recommendation for Scripts |
Advisor: | 侯文娟 |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2016 |
Academic year of graduation: | 104 (ROC calendar) |
Language: | Chinese |
Number of pages: | 57 |
Chinese keywords: | 文件探勘、劇本分析、廣告推薦、特徵詞、廣義知網、維基百科 |
English keywords: | text mining, script analysis, advertisement recommendation, feature words, E-HowNet, Wikipedia |
DOI URL: | https://doi.org/10.6345/NTNU202204364 |
Thesis type: | Academic thesis |
This thesis is motivated by the observation that when a TV drama breaks for commercials, most of the ads are dull and lengthy and have nothing to do with the content of the drama. Many viewers therefore switch to other channels to watch other dramas, or turn to other tasks at hand, which reduces the return for the advertisers in that time slot. In addition, ad broadcasts have to be scheduled manually, which is time-consuming and laborious. This thesis therefore aims to build an automated script analysis and advertisement recommendation system. It first analyzes and mines important feature words from the script content and uses them as effective, high-accuracy features in the model, so that the recommended ads attract viewers' attention when they air and the advertised products gain the greatest benefit.
The experimental data come from two sources. The first is the Golden Harvest Awards (金穗獎) script website, from which 12 scripts were selected as the script documents. The second is Wikipedia, which was searched for advertised products; the retrieved product introductions form the advertised-product database. After the proposed method is applied, the results are compared automatically to verify whether each result is successful. The evaluation covers the script paragraphs with an emphasis degree of 4 or 5 and the recommendation of the best advertisement.
The method is organized around two goals: (1) automatically computing the emphasis degree of each paragraph, and (2) recommending the best advertisement. To compute the emphasis degree of each paragraph, the study applies a method from previous research that automatically finds the feature words in a script that help analyze its emphasis degree; these feature words are the key to that analysis. For the best-ad recommendation, the study first finds all Na feature words in the paragraphs with an emphasis degree of 4 or 5. It then uses E-HowNet to retrieve the extension words up to two levels above each Na feature word (its parent and grandparent), which help link the content of a script paragraph to the advertised products. After automatic matching, the study derives the best recommended advertisement according to the characteristics of the paragraphs with an emphasis degree of 4 or 5. The results are then provided to advertisers, who can choose the paragraphs in which to place ads related to their own products. The detailed steps and methods are described in the thesis. Accuracy is used as the evaluation criterion for the experimental results.
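The recommendation step described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the `PARENT` dictionary is a toy stand-in for the E-HowNet taxonomy, the ad descriptions stand in for the Wikipedia product introductions, and scoring by the number of shared terms is one plausible matching rule, not necessarily the one used in the thesis. The emphasis degrees and Na feature words are taken as given inputs.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy standing in for E-HowNet: word -> parent concept.
PARENT: Dict[str, str] = {
    "latte": "coffee",
    "coffee": "beverage",
    "beverage": "food",
    "sedan": "car",
    "car": "vehicle",
}


def extension_words(word: str, levels: int = 2) -> List[str]:
    """Return up to `levels` ancestors of `word` (parent, then grandparent)."""
    ancestors: List[str] = []
    current = word
    for _ in range(levels):
        parent = PARENT.get(current)
        if parent is None:
            break
        ancestors.append(parent)
        current = parent
    return ancestors


def expand(na_words: List[str]) -> Set[str]:
    """Union of the feature words and their extension words."""
    terms = set(na_words)
    for word in na_words:
        terms.update(extension_words(word))
    return terms


def recommend(paragraphs: List[dict], ads: Dict[str, str],
              min_degree: int = 4) -> Dict[int, Optional[str]]:
    """For each paragraph with emphasis degree >= min_degree, pick the ad
    whose description shares the most terms with the expanded word set.
    (Assumed scoring rule; ties keep the first ad, no overlap gives None.)"""
    result: Dict[int, Optional[str]] = {}
    for para in paragraphs:
        if para["degree"] < min_degree:
            continue
        terms = expand(para["na_words"])
        best, best_score = None, 0
        for name, description in ads.items():
            score = len(terms & set(description.lower().split()))
            if score > best_score:
                best, best_score = name, score
        result[para["id"]] = best
    return result


def accuracy(predicted: Dict[int, Optional[str]], gold: Dict[int, str]) -> float:
    """Fraction of gold paragraphs whose recommended ad matches the gold ad."""
    if not gold:
        return 0.0
    hits = sum(1 for pid, ad in gold.items() if predicted.get(pid) == ad)
    return hits / len(gold)
```

In this sketch a paragraph whose only Na feature word is "latte" is expanded to "latte", "coffee", and "beverage", so it can match a product whose description mentions coffee even though the script never uses that word. Going only two levels up keeps the extension words fairly specific; higher ancestors such as "food" would match too many unrelated products.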