| Field | Value |
|---|---|
| Student | 劉家妏 |
| Thesis title | 多種鑑別式語言模型應用於語音辨識之研究 (Exploiting Discriminative Language Models for Speech Recognition) |
| Advisor | 陳柏琳 |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering |
| Year of publication | 2010 |
| Academic year of graduation | 98 (ROC calendar, 2009-2010) |
| Language | Chinese |
| Pages | 70 |
| Keywords | speech recognition, language model, discriminative language model, reranking |
| Thesis type | Academic thesis |
N-gram language models play a crucial role in any speech recognizer, since they help the recognizer distinguish the correct hypothesis from the incorrect ones in its extremely large space of candidate word sequences. However, N-gram models are trained to maximize the likelihood of a large body of training text rather than to optimize the final performance measure of speech recognition, which limits the recognition performance they can deliver. In this thesis, we first investigate a variety of discriminative language models (DLMs) that stem from different training objectives but share the underlying goal of directly improving recognition performance, and we compare their utility for large-vocabulary speech recognition both theoretically and empirically. We further propose an utterance-driven discriminative language model (UDLM), which takes the characteristics of each test utterance into account and infers its model parameters on the fly. Finally, we combine UDLM with maximum a posteriori (MAP) language model adaptation, in the expectation that the recognition output produced with MAP adaptation will help UDLM deliver more pronounced gains. All experiments are conducted on a corpus of Mandarin broadcast news compiled in Taiwan; the results demonstrate the feasibility of the proposed methods, which yield measurable improvements in recognition accuracy.
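The thesis's core operation is reranking recognizer output with a discriminatively trained language model. Below is a minimal sketch of the perceptron-style training loop common to the DLM family it surveys (in the spirit of Collins's and Roark et al.'s work), assuming N-best lists that carry a combined acoustic/language-model baseline score and a precomputed oracle (minimum-error) hypothesis per utterance; every function and parameter name here is illustrative, not the thesis's actual implementation.

```python
# Hedged sketch of perceptron-trained discriminative N-best reranking.
# Assumptions: each N-best entry is (word_list, baseline_score); `oracles`
# gives the index of the minimum-error hypothesis in each list.
from collections import defaultdict

def ngram_features(words, n=2):
    """Map a hypothesis (list of words) to sparse 1..n-gram counts."""
    feats = defaultdict(float)
    padded = ["<s>"] + words + ["</s>"]
    for k in range(1, n + 1):
        for i in range(len(padded) - k + 1):
            feats[tuple(padded[i:i + k])] += 1.0
    return feats

def combined_score(weights, baseline_score, feats, alpha=1.0):
    """Scaled baseline (acoustic + N-gram LM) score plus the DLM term."""
    return alpha * baseline_score + sum(weights[f] * v for f, v in feats.items())

def perceptron_train(nbest_lists, oracles, epochs=5):
    """Promote features of the oracle hypothesis and demote those of the
    currently top-ranked one whenever the two disagree."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for nbest, oracle in zip(nbest_lists, oracles):
            feats = [ngram_features(words) for words, _ in nbest]
            top = max(range(len(nbest)),
                      key=lambda i: combined_score(weights, nbest[i][1], feats[i]))
            if top != oracle:
                for f, v in feats[oracle].items():
                    weights[f] += v
                for f, v in feats[top].items():
                    weights[f] -= v
    return weights
```

At test time the same `combined_score` reranks each N-best list with the learned weights; averaging the weight vector over updates, as in the perceptron literature the thesis builds on, typically stabilizes the result.

The MAP language model adaptation paired with UDLM in the abstract can be realized as count merging between a background model and in-domain counts. A hedged sketch under that assumption, where `tau` is the strength of the background prior and the count and model interfaces are invented for illustration:

```python
def map_adapt_prob(word, hist, c_adapt, p_bg, tau=100.0):
    """MAP estimate P(w|h) = (C_a(h, w) + tau * P_bg(w|h)) / (C_a(h) + tau).

    c_adapt: dict mapping (history, word) -> adaptation-data count
    p_bg:    callable returning the background probability P_bg(w|h)
    tau:     prior weight; larger values trust the background model more
    """
    c_hw = c_adapt.get((hist, word), 0.0)
    c_h = sum(count for (h, _), count in c_adapt.items() if h == hist)
    return (c_hw + tau * p_bg(word, hist)) / (c_h + tau)
```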