Graduate student: | 陳珮寧 |
---|---|
Thesis title: | 查詢模型化於語音文件檢索之研究 A Study of Query Modeling for Spoken Document Retrieval |
Advisor: | 陳柏琳 |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Publication year: | 2011 |
Academic year of graduation: | 99 (ROC calendar) |
Language: | Chinese |
Number of pages: | 61 |
Chinese keywords: | 語音文件檢索, 關聯性語言模型, 查詢模型化, 主題資訊, 非關聯性資訊 |
English keywords: | Spoken document retrieval, relevance language model, query modeling, topic, non-relevance information |
Thesis type: | Academic thesis |
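Spoken document retrieval (SDR) has long been a topic of interest in speech processing research. The problems it commonly faces fall into three areas: (1) a query is usually only a vague expression of the user's information need and cannot fully represent the semantics the user intends to convey; (2) spoken documents and user queries often use different words to express the same topic or concept; and (3) when spoken documents are transcribed into text by automatic speech recognition (ASR), retrieval performance is often degraded by limited recognition accuracy. Based on these observations, this thesis proposes several improvements to query modeling to alleviate these problems. To this end, we explore the use of the relevance language model for spoken document retrieval. We also incorporate document-level topic information and query non-relevance information into this modeling framework to improve the effectiveness of query modeling. The experiments were conducted on the internationally used Topic Detection and Tracking (TDT) corpus, and the results show that the proposed retrieval methods achieve better retrieval performance than several existing retrieval methods.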
Spoken document retrieval (SDR) has become an increasingly active research area owing to the growing volume of publicly available multimedia that contains speech. The fundamental problems facing SDR are generally three-fold: 1) a query is often only a vague expression of an underlying information need, 2) word usage often differs between a query and a spoken document even when they are topically related, and 3) an imperfect speech recognition transcript carries wrong information and thus deviates somewhat from the true theme of a spoken document. Many efforts have been devoted to developing elaborate indexing and modeling techniques for representing spoken documents, but few to improving query formulations so that they better represent users' information needs. In view of this, we present a novel language modeling framework that explores both lexical- and topic-based relevance information to improve query effectiveness. We further explore various ways to glean both relevance and non-relevance information from the document collection so as to enhance the modeling of a given query in an unsupervised fashion. Experiments conducted on the TDT (Topic Detection and Tracking) SDR task demonstrate the performance merits of the methods derived from our retrieval framework when compared to other existing retrieval methods.
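To make the idea of unsupervised query modeling concrete, the following is a minimal sketch of a relevance-model-style estimate, in which the query model is re-estimated from top-ranked (pseudo-relevant) documents as P(w|Q) ∝ Σ_D P(D) · P(w|D) · Π_q P(q|D). It is an illustration under stated assumptions, not the thesis's actual formulation: the function names, the Dirichlet smoothing parameter `mu`, the uniform document prior, and the toy data are all hypothetical, and the topic-based and non-relevance components described above are not modeled.

```python
import math
from collections import Counter


def unigram_model(tokens, background, mu=1.0):
    """Return a Dirichlet-smoothed unigram model P(w|D) for one document.

    `background` maps words to collection-level probabilities (assumed given).
    """
    counts = Counter(tokens)
    length = len(tokens)
    return lambda w: (counts[w] + mu * background.get(w, 1e-9)) / (length + mu)


def relevance_model(query, feedback_docs, background, top_k=3, mu=1.0):
    """Estimate an expanded query model from pseudo-relevant documents.

    Each document contributes P(w|D) weighted by its query likelihood,
    with a uniform prior P(D). Probabilities are normalised over the
    feedback vocabulary, and only the top_k words are returned.
    """
    vocab = {w for doc in feedback_docs for w in doc}
    scores = Counter()
    for doc in feedback_docs:
        p = unigram_model(doc, background, mu)
        # Query likelihood of this document, accumulated in log space.
        log_ql = sum(math.log(p(q)) for q in query)
        weight = math.exp(log_ql) / len(feedback_docs)
        for w in vocab:
            scores[w] += weight * p(w)
    total = sum(scores.values())
    return {w: s / total for w, s in scores.most_common(top_k)}


# Toy (hypothetical) transcripts standing in for top-ranked spoken documents.
docs = [
    ["taiwan", "election", "vote", "election"],
    ["election", "result", "vote"],
    ["weather", "rain", "typhoon"],
]
all_tokens = [w for d in docs for w in d]
background = {w: c / len(all_tokens) for w, c in Counter(all_tokens).items()}

expanded = relevance_model(["election"], docs, background)
print(expanded)
```

In this toy run the one-word query is expanded with words that co-occur with it in the feedback documents, which is one way a vague query can be enriched without any relevance judgments.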