Author: | 潘靜芬 Ching-Fen Pan |
---|---|
Thesis title: | 漢語動詞語意特指之量度:語料庫為本的計量研究 Measuring the Semantic Specificity in Mandarin Verbs: A Corpus-based Quantitative Survey |
Advisor: | 謝舒凱 Hsieh, Shu-Kai |
Degree: | Master |
Department: | Department of English |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar) |
Language: | English |
Number of pages: | 152 |
Keywords (Chinese): | 語意特指, 漢語動詞, 計量研究, 潛在語意分析 |
Keywords (English): | semantic specificity, verbs in Mandarin, quantitative study, latent semantic analysis |
Thesis type: | Academic thesis |
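This study investigates how semantic specificity in Mandarin verbs can be measured. To place the semantic content of Chinese verbs on a scale, we first hand-tagged 150 basic verbs as one of two types: generic verbs or specific verbs. Drawing on several accounts of semantic components in the literature, we propose three criteria: whether the verb implies an agent or instrument, how it constrains the type of patient, and how it behaves under meaning extension. To standardize the classification, the study takes quantitative measures emphasized in corpus linguistics, namely word frequency, number of senses, and number of objects, as variables for verb type. Principal Component Analysis (PCA) is then used to determine the weight of each variable, and the Multinomial Logistic Model (MNLM) is used to discriminate between verb types.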
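In addition, the study uses the Academia Sinica Balanced Corpus to build a distributional model and applies Latent Semantic Analysis (LSA) to convert verb meanings into high-dimensional vectors. In this vector-based model, the distribution of each word in the corpus becomes a point in a high-dimensional space. Distance measures and cluster analysis are used to examine the similarity between words and the latent semantic relations between verb meanings and other words. The study further accounts for the distances between words of different verb types and for the semantic relatedness within Chinese resultative verb compounds.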
This thesis studies semantic specificity in Chinese using corpus-based statistical and computational methods. The analysis begins with single verbs and then runs preliminary tests on Chinese resultative verb compounds. The verbs studied are the one hundred and fifty head verbs collected in the M3 project. As a prerequisite, these verbs were tagged as generic or specific according to three criteria proposed in the literature: the specification of agent or instrument, the limitation of objects and their types, and the confinement of the action denotation to physical action only. The next step is to measure semantic specificity with quantitative data. To characterize verb use statistically, the study counts each verb's frequency, its number of senses, and the range of its co-occurring objects. Two major analyses, Principal Component Analysis (PCA) and the Multinomial Logistic Model, are adopted to assess the predictive power of the variables and to predict the probability of the different verb categories.
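The two analyses named above can be illustrated with a minimal sketch. It is not the thesis's implementation: the six rows of data, the 0/1 labels, and the helper names (`standardize`, `pca`, `fit_softmax`, `predict_proba`) are all hypothetical, and the multinomial logistic model is fitted here by plain gradient descent purely for illustration.

```python
import numpy as np

# Hypothetical toy data (NOT from the thesis): one row per verb,
# columns = [corpus frequency, number of senses, number of distinct objects].
X = np.array([
    [5200.0, 12, 340],
    [4100.0,  9, 280],
    [3900.0, 10, 310],
    [ 150.0,  2,  15],
    [  90.0,  1,   8],
    [ 210.0,  3,  22],
])
# Hypothetical hand-assigned labels: 0 = generic, 1 = specific.
y = np.array([0, 0, 0, 1, 1, 1])

def standardize(X):
    """Scale each variable to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca(Z):
    """Return explained-variance ratios and loadings (one row per component)."""
    _, S, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
    var = S ** 2 / (len(Z) - 1)
    return var / var.sum(), Vt

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(Z, y, n_classes, lr=0.5, steps=2000):
    """Fit multinomial logistic regression weights by gradient descent."""
    A = np.hstack([np.ones((len(Z), 1)), Z])   # prepend an intercept column
    W = np.zeros((A.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                   # one-hot targets
    for _ in range(steps):
        W -= lr * A.T @ (softmax(A @ W) - Y) / len(Z)
    return W

def predict_proba(Z, W):
    """Class probabilities for each row of Z under weights W."""
    A = np.hstack([np.ones((len(Z), 1)), Z])
    return softmax(A @ W)

Z = standardize(X)
ratios, loadings = pca(Z)
W = fit_softmax(Z, y, n_classes=2)
probs = predict_proba(Z, W)

print("explained variance ratios:", ratios.round(3))
print("first component loadings:", loadings[0].round(3))
print("predicted classes:", probs.argmax(axis=1))
```

The loadings on the first component indicate how strongly each variable contributes to the main axis of variation, and the fitted probabilities give a per-verb estimate of category membership. With only two categories the softmax model reduces to ordinary logistic regression; the same code accepts more classes through `n_classes`.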
In addition, the vector-based model of Latent Semantic Analysis (LSA) is applied to support the concept of semantic specificity. A distributional model based on the Academia Sinica Balanced Corpus (ASBC) is built with LSA to investigate how the semantic space varies with semantic specificity. Semantic similarity between words is calculated by measuring the distance between their vectors. The word-space model is used to measure the semantic load of single verbs and to explore the semantic information in Chinese resultative verb compounds (RVCs).
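A rough sketch of the LSA step described above follows: a word-by-context count matrix is reduced with a truncated singular value decomposition, and similarity is then read off as the cosine between the reduced word vectors. The counts, the English word labels, and the choice of `k=2` are hypothetical and are not drawn from ASBC or from the thesis.

```python
import numpy as np

# Hypothetical toy co-occurrence counts (NOT from ASBC):
# rows = words, columns = contexts (e.g. documents or neighbouring words).
words = ["cut", "slice", "chop", "read", "study"]
counts = np.array([
    [4, 3, 5, 0, 0, 1],
    [3, 2, 4, 0, 0, 0],
    [5, 3, 4, 0, 1, 0],
    [0, 0, 0, 4, 5, 3],
    [0, 1, 0, 3, 4, 4],
], dtype=float)

def lsa_vectors(counts, k):
    """Project each word onto the top-k latent dimensions via truncated SVD."""
    U, S, _ = np.linalg.svd(counts, full_matrices=False)
    return U[:, :k] * S[:k]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vecs = lsa_vectors(counts, k=2)
index = {w: i for i, w in enumerate(words)}

def similarity(a, b):
    """Cosine similarity between two words in the reduced space."""
    return cosine(vecs[index[a]], vecs[index[b]])

print("cut ~ slice:", round(similarity("cut", "slice"), 3))
print("cut ~ read: ", round(similarity("cut", "read"), 3))
```

In this toy space, words that share contexts end up close together and words with disjoint contexts end up far apart. A real model would use weighted counts (for example log-entropy or tf-idf) and a few hundred dimensions rather than two.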