
Author: Yen-Chi Cheng (鄭諺祺)
Thesis title: A Corpus-based Analysis of Character Usage in Chinese Transliteration: A Case Study of Newspapers in Taiwan and Mainland China (漢語音譯用字傾向的語料庫研究:以臺灣與中國大陸新聞為例)
Advisor: Gao, Zhao-Ming (高照明)
Degree: Master's
Department: Graduate Institute of Translation and Interpretation (翻譯研究所)
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 123
Keywords (Chinese): 音譯, 用字, 選字, 語料庫, 對數概似比檢定
Keywords (English): transliteration, character usage, character choosing, corpus, log-likelihood-ratio test
Thesis type: Academic thesis
  • Chinese transliteration differs from transliteration in other languages chiefly in that, once a pronunciation has been chosen to represent the source-language sound, the translator must still choose one character from among many homophonous characters. Most previous studies that discuss the principles of character choice were qualitative analyses based on reference works such as transliteration dictionaries; few were quantitative analyses of primary data. This study uses a corpus and statistical methods to identify the productive characters in contemporary Chinese transliteration and to describe its character-usage norms. Using a program, the researcher collects articles from four news websites in Taiwan and Mainland China to build news corpora, then extracts transliterations that are followed by their source-language words in parentheses. Each transliteration is annotated as "person" (with a gender tag where it can be determined), "place", or "other entity" according to the nature of its referent, yielding four transliteration sub-corpora. The researcher examines how the characters in the transliteration sub-corpora are distributed relative to all Chinese characters in the news corpora and uses log-likelihood-ratio tests to compare character usage under various conditions. The results show that roughly 80% of transliteration characters are common to all categories of transliteration, while the remaining 20% or so are used significantly more often under particular conditions, indicating that the norm of character usage in Chinese transliteration is not internally homogeneous.
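
    The log-likelihood-ratio comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the two-cell form of the statistic that is widely used for comparing one item's frequency across two corpora; the abstract does not say which variant the thesis applies, and the function name, the counts, and the 3.84 cutoff (chi-square, 1 degree of freedom, p < 0.05) are illustrative assumptions rather than figures from the thesis.

```python
import math


def log_likelihood_ratio(freq_a, size_a, freq_b, size_b):
    """Return the G2 statistic for one character's frequency in two corpora.

    freq_a, freq_b: occurrences of the character in corpus A and corpus B.
    size_a, size_b: total number of character tokens in each corpus.
    """
    total = freq_a + freq_b
    # Expected counts if the character were equally likely in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical counts: a character seen 30 times in 1,000 transliteration
# characters from one sub-corpus and 10 times in 1,000 from another.
CRITICAL_5_PERCENT = 3.84  # assumed cutoff: chi-square, 1 df, p < 0.05
score = log_likelihood_ratio(30, 1000, 10, 1000)
print(score > CRITICAL_5_PERCENT)
```

    A score above the cutoff suggests that the character is used disproportionately in one of the two sub-corpora; running the same function over every character gives a batch comparison of the kind listed in the appendices.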

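    Chapter 1 Introduction
        Section 1 Defining Chinese transliteration
        Section 2 Research objectives
    Chapter 2 Literature review
        Section 1 Studies on Chinese transliteration
        Section 2 Previous studies on corpus comparison
    Chapter 3 Methodology
        Section 1 Data collection
        Section 2 Data editing
        Section 3 Data annotation
    Chapter 4 Overall distribution of transliteration characters
        Section 1 Concentration of single characters in the news corpora and the transliteration sub-corpora
        Section 2 Distribution of transliteration characters among all Chinese characters in the news corpora
        Section 3 Multi-character sequences in the transliteration corpora
    Chapter 5 Group comparisons of transliteration characters
        Section 1 Distribution of transliteration characters by position within the transliterated word
        Section 2 Overall transliteration character usage in Taiwan and Mainland China
        Section 3 Transliteration strategies for personal names
        Section 4 Transliteration strategies for place names
        Section 5 Transliteration strategies for names of other entities
        Section 6 Combined view of the distribution of transliteration characters across word types
    Chapter 6 Conclusion
    References
    Appendices
        Appendix 1 Distribution of transliteration characters among all Chinese characters in the news corpora
        Appendix 2 LLR tests of occurrence counts of homophonous transliteration characters by position in the word
        Appendix 3 Batch LLR tests of transliteration character counts (Taiwan overall vs. Mainland China overall)
        Appendix 4 Batch LLR tests of transliteration character counts (male names vs. female names, whole corpus)
        Appendix 5 Batch LLR tests of transliteration character counts (Taiwan personal names vs. Mainland China personal names)
        Appendix 6 Batch LLR tests of transliteration character counts (personal names vs. place names, whole corpus)
        Appendix 7 Batch LLR tests of transliteration character counts (personal names vs. names of other entities, whole corpus)
        Appendix 8 Batch LLR tests of transliteration character counts (place names vs. names of other entities, whole corpus)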


