Graduate student: Li, Jiun-Ting (李俊廷)
Thesis title: Exploring Methods to Enhance Accuracy in Automated Speaking Assessment- English Interview as a Case Study (探討提升自動英語口語評估準確性之方法- 以會話測試為例)
Advisor: Chen, Berlin (陳柏琳)
Oral defense committee: Chen, Berlin (陳柏琳); Chen, Kuan-Yu (陳冠宇); Tseng, Ho-Chiang (曾厚強)
Oral defense date: 2024/01/24
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Number of pages: 64
Keywords (English): Automated Speaking Assessment, Bidirectional Encoder Representations from Transformers, Graph Neural Network, Spoken Response Coherence
Research method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202400467
Thesis type: Academic thesis
  • Globalization and the spread of the Internet have sharply increased the demand for second-language learning, especially for English, the primary language of knowledge transfer. Although many free and paid resources are available, such as English tutorial videos and cram schools, the number of language teachers is not growing fast enough to meet learner demand. To address this problem, we need an efficient way to process the information acquired by learners during language learning, so that non-native speakers can still learn a second language successfully when teachers are scarce. Among the ways to supplement human effort, computers are the most suitable, particularly now that speech recognition technology has matured and many commercial applications have emerged. Examples include Mispronunciation Detection and Diagnosis (MDD) in Computer Assisted Language Learning (CALL), readability assessment, and the topic of this research: automated speaking assessment. Automated speaking assessment is one aspect of English assessment. It evaluates a respondent's ability from the sound and content of their speech, a task that otherwise requires English experts to spend time grading. If a computer can complete the same task, it will save considerable labor, time, and money. However, current research in this field faces several problems. One is the imbalance in the number of speakers across proficiency levels: the speaker counts at the highest and lowest levels differ from those at the other levels by multiples. Another is that free-speaking content requires modeling finer-grained clause relationships, pronoun relationships, and interviewer information.

    We attempt to improve overall performance from the perspectives of data, training techniques, and model architecture, while also considering interpretability so that this research can be accepted in practical applications. The model implementation is available at https://github.com/a2d8a4v/HierarchicalContextASA/, and the data preprocessing code is at https://github.com/a2d8a4v/local_for_nict_jle.

    Acknowledgements
    Abstract (Chinese)
    Abstract
    Contents
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Mission Description
        1.2.1 Introduction and Problem Statement
        1.2.2 Language Assessment
        1.2.3 Technologies in Automated Speaking Assessment
        1.2.4 Methodology Overview
          1.2.4.1 Coherence Modeling
          1.2.4.2 Word CEFR Ranking Integration and Disfluencies
      1.3 Contributions
      1.4 Structure of the Thesis
    Chapter 2 Related Work
      2.1 Automated Speaking Assessment (ASA)
      2.2 Foundation Model
      2.3 Heterogeneous Graph-based Learning
      2.4 Structural Dialogue
    Chapter 3 Methodology
      3.1 Task Formulation
      3.2 Overall Framework
      3.3 Encoders
        3.3.1 Contextualized Encoder
        3.3.2 Enhanced Hierarchical Graph Encoders
        3.3.3 Structured Graph Construction
      3.4 Regressor
      3.5 Optimization
    Chapter 4 Experimental Settings and Results
      4.1 Overview
      4.2 The NICT JLE Corpus
      4.3 The EFCAMDAT Corpus
      4.4 Implementational Details
      4.5 Experimental Setup
      4.6 Data Preprocessing
      4.7 Main Results
      4.8 Ablation Studies
    Chapter 5 Conclusions and Future Works
      5.1 Conclusions
      5.2 Future Works
    References

