| Field | Value |
|---|---|
| Graduate student | Yang, Ming-Han (楊明翰) |
| Thesis title | 改善類神經網路聲學模型經由結合多任務學習與整體學習於會議語音辨識之研究 (Improved Neural Network Based Acoustic Modeling Leveraging Multi-task Learning and Ensemble Learning for Meeting Speech Recognition) |
| Advisor | Chen, Berlin (陳柏琳) |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering |
| Year of publication | 2016 |
| Academic year of graduation | 104 (ROC calendar, 2015–2016) |
| Language | Chinese |
| Pages | 95 |
| Keywords | multi-task learning, ensemble learning, deep learning, neural network, meeting speech recognition |
| DOI | https://doi.org/10.6345/NTNU202204025 |
| Document type | Academic thesis |
This thesis explores the use of multi-task learning (MTL) and ensemble learning techniques for more accurate estimation of the parameters of neural network based acoustic models, so as to improve the accuracy of meeting speech recognition. Our main contributions are three-fold. First, we conduct an empirical study of various auxiliary tasks that can enhance the performance of multi-task learning on meeting speech recognition. We also study the synergy of combining multi-task learning with disparate acoustic models, such as deep neural network (DNN) and convolutional neural network (CNN) based acoustic models, with the expectation of increasing the generalization capability of acoustic modeling. Second, because the way the contributions (weights) of the different auxiliary tasks are modulated during acoustic model training is far from optimal and largely a matter of heuristic judgment, we propose a simple model re-adaptation method to alleviate this problem. Third, an ensemble learning method is investigated to systematically integrate the various acoustic models (weak learners) trained with multi-task learning. A series of experiments was carried out on the EU-recorded augmented multi-party interaction (AMI) corpus and the Mandarin meeting recording corpus (MMRC) collected in Taiwan; the results suggest the effectiveness of the proposed methods relative to several existing baselines.
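To make the two central ideas of the abstract concrete, the sketch below shows a multi-task DNN acoustic model in which an auxiliary task (here, monophone classification) shares hidden layers with the primary senone-classification task, and a scalar weight modulates the auxiliary task's contribution to the training loss. This is a minimal illustration in Keras-style Python, not the author's actual configuration: the layer sizes, label inventories, the choice of monophones as the auxiliary task, and the value of `AUX_WEIGHT` are all assumptions for exposition.

```python
# Minimal sketch (illustrative assumptions throughout) of a multi-task
# DNN acoustic model: the primary task predicts tied-state (senone)
# posteriors, an auxiliary task predicts monophone labels, and
# AUX_WEIGHT modulates the auxiliary task's contribution to the loss.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

NUM_SENONES = 4000     # assumed number of tied triphone states
NUM_MONOPHONES = 40    # assumed auxiliary label set
FEAT_DIM = 440         # e.g., 40-dim filterbank x 11-frame context
AUX_WEIGHT = 0.3       # heuristic auxiliary-task weight (assumption)

# Shared hidden layers: both tasks backpropagate into these weights,
# which is what gives multi-task learning its regularizing effect.
inputs = Input(shape=(FEAT_DIM,))
h = Dense(1024, activation='relu')(inputs)
h = Dense(1024, activation='relu')(h)

senone_out = Dense(NUM_SENONES, activation='softmax', name='senone')(h)
mono_out = Dense(NUM_MONOPHONES, activation='softmax', name='monophone')(h)

model = Model(inputs=inputs, outputs=[senone_out, mono_out])
model.compile(optimizer='adam',
              loss={'senone': 'categorical_crossentropy',
                    'monophone': 'categorical_crossentropy'},
              loss_weights={'senone': 1.0, 'monophone': AUX_WEIGHT})

# At decode time only the primary output is used. An ensemble of
# MTL-trained models (the "weak learners" of the abstract) can be
# combined by averaging their frame-level senone posteriors before
# they are passed to the HMM decoder:
def ensemble_posteriors(models, feats):
    """Average the senone posteriors of several trained models."""
    return np.mean([m.predict(feats)[0] for m in models], axis=0)
```

The fixed `AUX_WEIGHT` here is exactly the heuristic weighting the thesis identifies as suboptimal and proposes to mitigate with re-adaptation, and `ensemble_posteriors` is only one simple way (unweighted averaging) to combine the MTL-trained models; the thesis investigates a more systematic integration.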