Graduate student: | 陳彥霖 Chen, Yan-Lin |
---|---|
Thesis title: | 應用潛在語意分析於試題相似度比較之可行性 The feasibility of applying Latent Semantic Analysis to analyze Item similarity |
Advisor: | 何榮桂 |
Degree: | 碩士 Master |
Department: | 資訊教育研究所 Graduate Institute of Information and Computer Education |
Year of publication: | 2006 |
Academic year of graduation: | 94 |
Language: | Chinese |
Number of pages: | 76 |
Chinese keywords: | 潛在語意分析, 試題相似, 評分函式, LSA |
English keywords: | latent semantic analysis, item similarity, score function, LSA |
Thesis type: | Academic thesis |
This study applies the latent semantic analysis (LSA) model to judging the similarity of test items and examines how different score functions affect the results. LSA works by detecting lexical co-occurrence: given many documents, it can discover words that are used in similar contexts, such as synonyms. Synonyms rarely appear together within a single test item, however, so the study proposes training the LSA model on documents related to the items' keywords in order to raise the precision of the similarity judgments. The results show that using the Dice measure or the inner product as the score function agrees most closely with the experts' ratings. For items on which the experts' similarity ratings were relatively consistent, the correlation with the model reached as high as 0.9, and the mean correlation was at least 0.7. Applying latent semantic analysis to item similarity is therefore a feasible technique.
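The score functions named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the thesis, so the definitions below are the common vector-space forms (inner product, cosine, and the real-valued Dice coefficient), and the item vectors are made-up values standing in for items already projected into a reduced LSA space.

```python
import math


def inner_product(x, y):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Inner product normalized by the product of the vector lengths."""
    denom = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    return inner_product(x, y) / denom if denom else 0.0


def dice(x, y):
    """Vector-space Dice coefficient: 2(x.y) / (|x|^2 + |y|^2)."""
    denom = inner_product(x, x) + inner_product(y, y)
    return 2.0 * inner_product(x, y) / denom if denom else 0.0


# Hypothetical item vectors (assumed values, not data from the thesis).
item_a = [0.8, 0.1, 0.3]
item_b = [0.7, 0.2, 0.4]  # intended to resemble item_a
item_c = [0.0, 0.9, 0.1]  # intended to differ from item_a

for name, score in (("inner", inner_product), ("cosine", cosine), ("dice", dice)):
    print(name, round(score(item_a, item_b), 3), round(score(item_a, item_c), 3))
```

Unlike cosine, the inner product and Dice measures are sensitive to vector length, which may be one reason different score functions rank item pairs differently.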