| Field | Value |
|---|---|
| Graduate Student | 蘇鈺婷 |
| Thesis Title | 以噪音分類為基礎之深度學習噪音消除法提升人工電子耳使用者之語音理解度表現 (A noise classification-based deep learning noise reduction approach to improve speech intelligibility for cochlear implant recipients) |
| Advisors | 葉榮木 (Yeh, Zong-Mu); 賴穎暉 (Lai, Ying-Hui) |
| Degree | Master (碩士) |
| Department | Department of Mechatronic Engineering (機電工程學系) |
| Publication Year | 2017 |
| Graduation Academic Year | 105 |
| Language | Chinese |
| Pages | 74 |
| Keywords (Chinese) | 人工電子耳、噪音消除、DDAE、噪音分類器、深度學習 |
| Keywords (English) | cochlear implant, noise reduction, deep denoising autoencoder, noise classification, deep learning |
| DOI | https://doi.org/10.6345/NTNU202202926 |
| Document Type | Academic thesis |
The cochlear implant (CI) is currently the only technology that can restore hearing to profoundly deaf patients. Past studies have shown that CIs effectively improve users' speech understanding in quiet communication environments; in noisy environments, however, considerable room for improvement remains, motivating the development of more effective signal processing to raise user satisfaction. Recently, a noise reduction method based on deep learning, the deep denoising autoencoder (DDAE), was proposed, and its results demonstrated significant speech-intelligibility gains under CI simulation testing. For real CI users, however, no research evidence yet existed for the benefit of DDAE. Accordingly, this thesis improves on the DDAE noise reduction method and proposes a new approach, called noise classification + DDAE (NC+DDAE), and further validates the proposed method clinically with real CI users. Objective electroacoustic evaluations and speech recognition tests show that, in noisy environments, NC+DDAE achieves better speech intelligibility than two common conventional noise reduction methods (logMMSE and KLT), particularly when the noise type is known. More specifically, when the noise condition was known, NC+DDAE improved speech intelligibility by up to 41.5% over the other methods across the test conditions; when the noise condition was unknown, NC+DDAE improved speech intelligibility by up to 17.5%. These results demonstrate that the proposed NC+DDAE noise reduction method can effectively improve the listening benefit of CI users in noisy conditions.
The cochlear implant (CI) is the only technology that can help profoundly deaf individuals hear sound again. Previous studies have demonstrated that CI technology enables many users to enjoy a high level of speech understanding in quiet; for most CI users, however, listening under noisy conditions remains challenging, and more effective signal processing is needed to overcome this issue. Recently, a deep learning-based noise reduction (NR) approach, the deep denoising autoencoder (DDAE), was proposed and shown to be effective in various NR tasks. A previous study further indicated that DDAE-based NR yielded higher intelligibility scores than conventional NR techniques under CI simulation; its efficacy for real CI recipients, however, remained unevaluated. In view of this, the present study evaluates the performance of DDAE-based NR in real CI subjects. In addition, a new DDAE-based NR model, called NC+DDAE, is proposed to further improve intelligibility for CI users. The results of objective evaluations and listening tests indicate that, under challenging listening conditions, the proposed NC+DDAE approach yields higher intelligibility scores than two classical NR techniques (logMMSE and KLT), especially under the matched training condition. More specifically, NC+DDAE improves speech recognition by up to 41.5% when the test noise was seen in the training phase, and by up to 17.5% when it was not. The present study demonstrates that, under challenging listening conditions, the proposed NC+DDAE approach improves speech recognition more effectively than conventional NR techniques. Furthermore, the results show that NC+DDAE has superior noise suppression capability and introduces less distortion of speech envelope information for Mandarin-speaking CI recipients than conventional techniques.
Therefore, the proposed NC+DDAE NR approach can potentially be integrated into existing CI processors to overcome the degradation of speech perception caused by noise.
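The two-stage structure described above — a noise classifier that first identifies the noise type, then routes the noisy features to a noise-specific DDAE model — can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the log-power feature front-end, the nearest-centroid classifier, and the tiny untrained autoencoder (`TinyDDAE`) are all hypothetical stand-ins for the components the thesis would train on noisy/clean speech pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_power_features(frames):
    """Log-power spectrum of each time frame (assumed feature front-end)."""
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spec + 1e-8)

def classify_noise(feat, centroids):
    """Nearest-centroid noise classifier: a stand-in for the NC stage."""
    mean_feat = feat.mean(axis=0)  # average spectrum over the utterance
    dists = [np.linalg.norm(mean_feat - c) for c in centroids]
    return int(np.argmin(dists))

class TinyDDAE:
    """One-hidden-layer denoising autoencoder, forward pass only.

    In NC+DDAE one such model would be trained per noise type; here the
    weights are random, so the output is meaningful only in shape.
    """
    def __init__(self, dim, hidden):
        self.W1 = rng.normal(0.0, 0.1, (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, dim))
        self.b2 = np.zeros(dim)

    def denoise(self, feat):
        h = np.maximum(feat @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.W2 + self.b2                   # linear reconstruction

def nc_ddae(frames, centroids, models):
    """Classify the noise type, then enhance with the matching DDAE."""
    feat = log_power_features(frames)
    noise_id = classify_noise(feat, centroids)
    return models[noise_id].denoise(feat), noise_id

# Toy usage: 10 frames of 256 samples -> 129 rfft bins per frame.
dim = 129
frames = rng.normal(size=(10, 256))
centroids = [np.zeros(dim), np.full(dim, 5.0)]   # two assumed noise classes
models = [TinyDDAE(dim, 32), TinyDDAE(dim, 32)]  # one DDAE per noise class
enhanced, noise_id = nc_ddae(frames, centroids, models)
print(enhanced.shape, noise_id)
```

The routing step is the point of the sketch: under a matched condition the classifier selects a DDAE trained on the same noise type, which is consistent with the larger intelligibility gains the abstract reports for known noise (41.5%) versus unknown noise (17.5%).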