
Author: 邱炫盛 (Hsuan-Sheng Chiu)
Title: 利用主題與位置相關語言模型於中文連續語音辨識
(Exploiting Topic- and Position-Dependent Language Models for Mandarin Continuous Speech Recognition)
Advisor: 陳柏琳 (Berlin Chen)
Degree: Master
Department: Computer Science and Information Engineering
Year of Publication: 2007
Academic Year of Graduation: 95 (ROC; 2006-2007)
Language: Chinese
Number of Pages: 147
Keywords (Chinese): 語音辨識、語言模型、語言模型調適、主題相關語言模型、位置相關語言模型
Keywords (English): Speech Recognition, Language Model, Language Model Adaptation, Topic-Dependent Language Model, Position-Dependent Language Model
Document Type: Academic thesis
    This thesis investigates language modeling for Mandarin continuous speech recognition. First, a word topical mixture model (WTMM) is proposed to explore the co-occurrence relationships between words, which serve as long-span latent topical information for language model adaptation. During speech recognition, the search history is modeled as a composite WTMM for predicting each newly decoded word. Second, a position-dependent language model is presented, which exploits the position of a word within a document or sentence to better estimate word occurrences; this positional information is combined with the information provided by the conventional N-gram and probabilistic latent semantic analysis (PLSA) models, respectively. Finally, for extractive spoken document summarization, a probabilistic sentence-ranking framework is developed in which the sentence prior probabilities are estimated by a whole sentence maximum entropy (WSME) language model that tightly integrates extra information clues extracted from the spoken sentences, providing a basis for selecting the salient sentences of a spoken document. The experiments were conducted on Mandarin broadcast news compiled in Taiwan. The speech recognition results show that the word topical mixture model and the position-dependent language model can each boost the performance of the baseline large-vocabulary continuous speech recognition (LVCSR) system, while the summarization results demonstrate that integrating extra sentence-level information clues through the WSME model considerably raises summarization accuracy.
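    The two recognition-side models above admit a compact sketch; the following LaTeX is our own notation and a plausible reconstruction, not the thesis's exact equations. The composite WTMM scores a candidate word w_i against a topical mixture M_{w_j} attached to each history word, and the positional N-gram interpolates a conventional N-gram with a position-conditioned estimate, where l_i denotes the (quantized) position of w_i within the sentence or document:

    \[
    P_{\mathrm{WTMM}}(w_i \mid H) \;=\; \sum_{j=1}^{i-1} \alpha_j \sum_{k=1}^{K} P(w_i \mid T_k)\, P(T_k \mid M_{w_j}), \qquad \sum_{j=1}^{i-1} \alpha_j = 1,
    \]
    \[
    \hat{P}(w_i \mid w_{i-2}, w_{i-1}, l_i) \;=\; \lambda\, P_{\mathrm{trigram}}(w_i \mid w_{i-2}, w_{i-1}) \;+\; (1-\lambda)\, P_{\mathrm{pos}}(w_i \mid l_i),
    \]

    with T_1, ..., T_K the latent topics, each alpha_j a weight on a history word (for example, decaying with distance), and lambda an interpolation weight estimated on held-out data.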
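    The summarization framework likewise reduces to a short sketch, again in our own notation: each sentence S of a spoken document D is ranked by its posterior, combining a sentence generative model with a WSME sentence prior of the standard exponential form,

    \[
    P(S \mid D) \;\propto\; P(D \mid S)\, P(S), \qquad P_{\mathrm{WSME}}(S) \;=\; \frac{P_0(S)}{Z}\, \exp\!\Big(\sum_{i} \lambda_i f_i(S)\Big),
    \]

    where P_0 is an initial sentence model (for example, an N-gram), the feature functions f_i carry the extra sentence-level clues, the lambda_i are their trained weights, and Z is the normalization constant; the top-ranked sentences form the extractive summary.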

    Table of Contents

    Chapter 1  Introduction  1
      1.1  Research Background: Speech Recognition  2
      1.2  Research Overview: The Evolution of Language Models  7
      1.3  Research Overview: Language Model Adaptation  12
      1.4  Research Contributions  14
      1.5  Organization of the Thesis  14
    Chapter 2  Experimental Framework  17
      2.1  The NTNU Large-Vocabulary Continuous Speech Recognition System  17
        2.1.1  Front-End Processing and Acoustic Models  17
        2.1.2  Lexicon Construction  18
        2.1.3  Lexical-Tree-Copy Search  18
        2.1.4  Word-Graph Search  20
      2.2  Experimental Corpora  20
      2.3  Language Model Evaluation  22
        2.3.1  Perplexity  22
        2.3.2  Character Error Rate  23
      2.4  Baseline Experimental Results  23
    Chapter 3  Language Models for Speech Recognition  27
      3.1  Research on Language Modeling  27
        3.1.1  Directions in Statistical Language Modeling  27
        3.1.2  Linguistically Informed Models for Speech Recognition  30
      3.2  Word-Based Language Models  31
        3.2.1  Trigger-Based Language Model  31
        3.2.2  Mixed-Order Markov Model  33
      3.3  Word-Class-Based Language Models  35
        3.3.1  Class-Based N-gram Model  35
        3.3.2  Aggregate Markov Model  36
      3.4  Document-Topic-Based Language Models  39
        3.4.1  Mixture-Based Language Model  39
        3.4.2  Latent Semantic Analysis  40
        3.4.3  Probabilistic Latent Semantic Analysis  45
        3.4.4  Latent Dirichlet Allocation  48
      3.5  Experimental Results for the Linguistically Informed Models  53
        3.5.1  Cache Model  53
        3.5.2  Trigger-Based Language Model  55
        3.5.3  Mixed-Order Markov Model  58
        3.5.4  Class-Based Bigram Model  59
        3.5.5  Aggregate Markov Model  60
        3.5.6  Mixture-Based Language Model  61
        3.5.7  Latent Semantic Analysis  63
        3.5.8  Probabilistic Latent Semantic Analysis  64
        3.5.9  Latent Dirichlet Allocation  66
      3.6  Chapter Summary  67
    Chapter 4  Word Topical Mixture Models and Position-Dependent Language Models  69
      4.1  Word Topical Mixture Model  69
        4.1.1  The Word Topical Mixture Model  69
        4.1.2  Comparison with Other Models  73
      4.2  Position-Dependent Language Model  75
        4.2.1  Representing Positional Information  75
        4.2.2  Positional N-gram Model  77
        4.2.3  Positional Probabilistic Latent Semantic Analysis  79
      4.3  Experimental Results and Analysis  80
        4.3.1  Word Topical Mixture Model  80
        4.3.2  Position-Dependent Language Model  89
      4.4  Chapter Summary  100
    Chapter 5  Language Models for Spoken Document Summarization  101
      5.1  Introduction to Spoken Document Summarization  101
      5.2  Probabilistic Generative Framework  103
        5.2.1  Sentence Generative Model  103
        5.2.2  Sentence Prior Model  106
      5.3  Summarization Experiments: Setup and Results  110
        5.3.1  Summarization Corpora  110
        5.3.2  Evaluation Metrics  111
        5.3.3  Summarization Results  113
      5.4  Chapter Summary  120
    Chapter 6  Conclusions and Future Work  121
    Appendix A  Variational Bayesian Expectation-Maximization  125
    Appendix B  The Whole Sentence Maximum Entropy Model  131
    References  133
    Publications by the Author  146

