
Graduate student: Yan, Guan-Ruei (顏冠睿)
Thesis title: An Exploratory Study on the Effects of Computer-Assisted Conference Preparation for Interpreters: A Case Study on Political Speeches
Advisor: Gao, Zhao-Ming (高照明)
Oral examination committee: Gao, Zhao-Ming (高照明); Ru, Ming-Li (汝明麗); Chen, Cheng-Hsien (陳正賢); Chen, An-Chi (陳安頎)
Oral defense date: 2023/02/20
Degree: Master's
Department: Graduate Institute of Translation and Interpretation
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 118
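Chinese keywords (translated): interpreting conference preparation, keyword lists, automation, specialized corpus collection, interpreting technology, computer-assisted interpreting tools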
English keywords: conference preparation, keyword list, keyphrase list, specialized corpus compilation, interpreting technology, CAI tools
Research method: experimental design
DOI URL: http://doi.org/10.6345/NTNU202300628
Thesis type: academic thesis
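    Chinese abstract (translated): Within translation studies, academic discussion of translation technology has focused mostly on written translation, and publications on interpreting technology are comparatively scarce. In recent years technology has advanced rapidly, and the language technologies that can assist interpreters keep improving. This study investigates the effectiveness and impact of language-technology software (AntConc) and a programming-language package (spaCy) on interpreting preparation. The study first uses a Python package (newspaper3k) to identify seed words in the introductory text of a speech, then uses BootCaT to search for and download related articles and materials from the web semi-automatically, and finally compiles a specialized corpus on the topic. Different keyword lists are extracted from this corpus with various statistical or linguistic measures; the higher the coverage rate of a list's keywords or keyphrases against the speech transcript, the better the list. The results show that the self-compiled specialized corpus reaches a word coverage rate of nearly 70%. Although the keyword lists (TF, TF-IDF, LLR) do not achieve striking coverage rates (12%, 37%, and 59%), the word lists, N-grams, collocates, and concordance lines obtained from AntConc can meet different needs in interpreting preparation: building background knowledge, identifying important English collocations, and helping interpreters anticipate how key terms might be translated. In addition, keyword and N-gram techniques directly surface frequent idioms and domain-specific language use. This study indicates that such digital tools may effectively reduce conference preparation time and improve the quality of interpreting services. In the future, automating interpreting preparation and integrating emerging technologies will certainly help make interpreting technology more widespread.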
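The abstract above compares keyword lists ranked by TF, TF-IDF, and LLR (log-likelihood ratio). The thesis produced these lists with AntConc; the sketch below is not the author's code but a minimal standard-library illustration of how the three scores can be computed, assuming whitespace-tokenized text, a specialized corpus split into documents (one per downloaded web page), and a general reference corpus for the LLR comparison. The function names and toy data are hypothetical, and AntConc's own formulas and settings may differ.

```python
import math
from collections import Counter


def tf_scores(tokens):
    """Raw term frequency (TF) of each word type in the specialized corpus."""
    return Counter(tokens)


def tfidf_scores(docs):
    """TF-IDF summed over documents: sum_d tf(w, d) * ln(N / df(w)).

    `docs` is a list of token lists, e.g. one per downloaded web page.
    Words that occur in every document get a score of 0.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    scores = Counter()
    for doc in docs:
        for word, tf in Counter(doc).items():
            scores[word] += tf * math.log(n_docs / df[word])
    return scores


def llr_scores(study, reference):
    """Log-likelihood (G2) keyness of study-corpus words against a reference corpus.

    Only words that are relatively more frequent in the study corpus are kept.
    """
    f_study, f_ref = Counter(study), Counter(reference)
    c, d = len(study), len(reference)  # corpus sizes in tokens
    scores = {}
    for word, a in f_study.items():
        b = f_ref.get(word, 0)
        e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
        e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
        g2 = a * math.log(a / e1)
        if b > 0:  # when b == 0 the term b * ln(b / e2) is taken as 0
            g2 += b * math.log(b / e2)
        if a / c > b / d:
            scores[word] = 2 * g2
    return scores


def top_n(scores, n):
    """Highest-scoring words first; ties broken alphabetically."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Hypothetical toy data: two "downloaded pages" and a tiny reference corpus.
    docs = [
        "taiwan trade policy trade and security".split(),
        "taiwan security and the economy".split(),
    ]
    reference = "the economy and the weather and the news".split()
    study = [tok for doc in docs for tok in doc]
    for name, scores in [
        ("TF", tf_scores(study)),
        ("TF-IDF", tfidf_scores(docs)),
        ("LLR", llr_scores(study, reference)),
    ]:
        print(name, [word for word, _ in top_n(scores, 3)])
```

On this toy input the script prints `TF ['and', 'security', 'taiwan']`, `TF-IDF ['trade', 'economy', 'policy']`, and `LLR ['security', 'taiwan', 'trade']`. Raw TF ranks the function word "and" near the top, whereas the LLR list drops it because the reference corpus uses it at least as often, which illustrates why keyness measures tend to yield more topical lists than raw frequency.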

    With the rapid development of technology, digital language tools for interpreting are attracting scholarly attention. This study aims to reveal the potential of language tools such as AntConc and Python libraries such as spaCy for interpreters' conference preparation. The study uses a Python library (Newspaper3k) to identify seed words in the introductory text of the chosen speech. These seed words are then used for web searches, and BootCaT semi-automatically downloads the related content and compiles it into a specialized corpus. Keywords and keyphrases are extracted with statistical or linguistic methods and evaluated by their coverage rate against a target corpus built from the speech transcript: the higher the coverage rate, the better the list. The results show that the specialized corpus reaches an overall coverage rate of nearly 70%. The keyword and keyphrase lists, N-grams, collocates, and concordance lines retrieved through AntConc appear helpful to interpreters on several fronts, including building background knowledge and identifying prefabricated lexical bundles. Combined with N-grams, keyword lists also give interpreters quicker access to the material they need to create interpreting glossaries more efficiently and with better quality. This study indicates that digital tools may enhance preparation efficiency and improve interpretation quality. In the future, an automated interpreting-preparation workflow integrated with other new tools such as machine translation is expected to gain popularity and make new technologies more accessible to all.
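The abstract evaluates each keyword or keyphrase list by its coverage rate against the speech transcript. The exact counting rules used in the thesis (tokens or types, lemmatization, function-word filtering) are not stated here, so the following is a minimal sketch that assumes a type-based definition: the share of distinct transcript words or N-grams that also appear in a candidate list. The regex tokenizer is a crude stand-in for a library such as spaCy, and all function names and example strings are hypothetical.

```python
import re


def tokenize(text):
    """Lowercased word tokens (a crude stand-in for a real tokenizer such as spaCy's)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Contiguous n-grams joined by spaces; n=1 returns the tokens themselves."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def coverage_rate(candidates, transcript_items, stopwords=frozenset()):
    """Share of distinct transcript items (words or n-grams) found in the candidate list.

    Items listed in `stopwords` are left out of the denominator.
    """
    target = {item for item in transcript_items if item not in stopwords}
    if not target:
        return 0.0
    return len(target & set(candidates)) / len(target)


if __name__ == "__main__":
    # Hypothetical transcript sentence and candidate lists.
    transcript = tokenize("Taiwan faces new challenges in trade and security.")
    keywords = ["taiwan", "trade", "security", "democracy"]
    keyphrases = ["new challenges", "trade and", "cross strait"]
    print(coverage_rate(keywords, transcript))                           # 3 of 8 types
    print(coverage_rate(keywords, transcript, stopwords={"in", "and"}))  # 3 of 6 types
    print(coverage_rate(keyphrases, ngrams(transcript, 2)))              # 2 of 7 bigrams
```

Whether types or tokens are counted, and whether function words are filtered or words lemmatized, changes these figures considerably; the `stopwords` parameter is included only to show where such a choice enters the calculation.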

    Acknowledgment
    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background of the Study
      1.2 Research Motivation
      1.3 Research Questions
      1.4 Significance of the Study
      1.5 Organization of the Study
      1.6 Terminology
    Chapter 2 Literature Review
      2.1 Interpreting and Technology
      2.2 Conference Preparation
      2.3 Interpreting Glossaries
      2.4 Computer-Assisted Interpreting Tools
      2.5 Keyword Extraction
        2.5.1 Keyword Extraction: Statistical Approaches
        2.5.2 Keyword Extraction: Linguistic Approaches
        2.5.3 Keyword Extraction: Machine-learning Approaches
      2.6 Computational Resources
        2.6.1 BootCaT
        2.6.2 AntConc and TagAnt
        2.6.3 TextBlob
        2.6.4 spaCy
    Chapter 3 Research Methods
      3.1 Overview
      3.2 Automatic Keyword Extraction Workflow
      3.3 Keyword Lists Quality Assessment
    Chapter 4 Results and Discussions
      4.1 Conference Preparation Automation
      4.2 Corpora Details
      4.3 Quality Assessment with Different Sorting Criteria
        4.3.1 The Quality of Keywords in the Specialized Corpus
        4.3.2 The Quality of Keyphrases in the Specialized Corpus
        4.3.3 The Quality of N-grams in the Specialized Corpus
      4.4 Interpreting Glossaries with Keyword Lists and N-grams
    Chapter 5 Conclusions
      5.1 Summary of Findings
        5.1.1 The Creation of Specialized Corpora for Interpreters
        5.1.2 The Conference Preparation Automation
        5.1.3 The Effects of Specialized Corpora for Interpreting
        5.1.4 Interpreting Glossaries Creation Made Easy
      5.2 Limitations of the Present Study
      5.3 Suggestions for Further Studies
    References
    Appendices
      Appendix I: Introduction Text to the Speech "Prospects and challenges for Taiwan in the years ahead"
      Appendix II: The Transcript of the Speech "Prospects and challenges for Taiwan in the years ahead"
      Appendix III: Top 100 TF List of Keywords (frequency > 5)
      Appendix IV: Full List of Covered Keyphrases

