Graduate student: 潘靜芬 (Ching-Fen Pan)
Thesis title: 漢語動詞語意特指之量度:語料庫為本的計量研究
Measuring the Semantic Specificity in Mandarin Verbs: A Corpus-based Quantitative Survey
Advisor: 謝舒凱 (Hsieh, Shu-Kai)
Degree: Master
Department: Department of English
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 152
Chinese keywords: 語意特指, 漢語動詞, 計量研究, 潛在語意分析
English keywords: semantic specificity, verbs in Mandarin, quantitative study, latent semantic analysis
Thesis type: Academic thesis
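  • This study investigates how semantic specificity in Mandarin verbs can be measured. To place the semantic content of Chinese verbs on a scale, we first manually tagged one hundred and fifty basic verbs as one of two types: generic verbs or specific verbs. Drawing on several accounts of the components of verb meaning in the literature, we propose three criteria for this judgment: whether the verb implies an agent or instrument, whether it constrains the type of patient, and whether it undergoes semantic shift. To standardize the type judgments, the study takes three quantitative measures emphasized in corpus linguistics, namely word frequency, number of senses, and number of objects, as variables of verb type. Principal Component Analysis (PCA) is then used to determine the weight of each variable, and a Multinomial Logistic Model (MNLM) is used to discriminate between verb types.
    In addition, the study uses the Academia Sinica Balanced Corpus to build a distributional model and applies Latent Semantic Analysis to convert verb meanings into high-dimensional vectors. In this vector-based model, the distribution of each word in the corpus becomes a point in a high-dimensional space. Using distance measures and cluster analysis, we examine the similarity between words and the latent semantic relations between verb meaning and the lexicon. The study further accounts for the differences in inter-word distance across verb types and for the semantic relatedness of Chinese resultative verb compounds.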

    The purpose of this thesis is to study semantic specificity in Chinese using corpus-based statistical and computational methods. The analysis begins with single verbs and then runs preliminary tests on Chinese resultative verb compounds. The verbs studied are the one hundred and fifty head verbs collected in the M3 project. As a prerequisite, these verbs were tagged as generic or specific according to three criteria proposed in the literature: the specification of agent or instrument, the limitation on objects and their types, and the confinement of the denoted action to physical action only. The next step is to measure semantic specificity with quantitative data. To characterize verb use statistically, the study counts each verb's frequency, its number of senses, and the range of its co-occurring objects. Two major analyses, Principal Component Analysis (PCA) and a Multinomial Logistic Model, are adopted to assess the predictive power of the variables and to predict the probability of the different verb categories.
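As an illustration only, the PCA step described above can be sketched in a few lines of standard-library Python. This is not the thesis's code: the verb names and counts below are invented placeholders, the helper names are hypothetical, and power iteration on the correlation matrix is a stand-in chosen to keep the sketch dependency-free. It standardizes the three variables (frequency, number of senses, number of object types), extracts the first principal component, and reports each variable's loading.

```python
import math

def standardize(rows):
    """Z-score every column using the sample standard deviation."""
    n, cols = len(rows), list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, mv)), v

# Invented toy values, NOT data from the thesis:
# (corpus frequency, number of senses, number of distinct object types)
toy_verbs = {
    "verb_a": (5200, 12, 340),
    "verb_b": (4100, 9, 280),
    "verb_c": (900, 3, 60),
    "verb_d": (650, 2, 45),
    "verb_e": (300, 1, 20),
    "verb_f": (2500, 6, 150),
}

z = standardize([list(map(float, v)) for v in toy_verbs.values()])
corr = correlation_matrix(z)
eigenvalue, loadings = first_component(corr)
# The trace of a correlation matrix equals the number of variables.
explained = eigenvalue / len(corr)
scores = {name: sum(l * x for l, x in zip(loadings, row))
          for name, row in zip(toy_verbs, z)}

print("loadings:", [round(l, 3) for l in loadings])
print("share of variance on PC1:", round(explained, 3))
```

The loadings indicate how strongly each variable contributes to the first component, and the per-verb scores could then serve as a single predictor in a logistic model of verb type.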
    In addition, the vector-based model of Latent Semantic Analysis (LSA) is applied to support the concept of semantic specificity. A distributional model based on the Academia Sinica Balanced Corpus (ASBC) is built with LSA to investigate how the semantic space varies with semantic specificity. Semantic similarity between words is calculated by measuring the distance between their vectors. The word-space model is used to measure the semantic load of single verbs and to explore the semantic information in Chinese resultative verb compounds (RVCs).
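The word-space idea can be sketched as follows, again only as an illustration under stated assumptions. The co-occurrence counts are invented, the English labels are placeholders rather than verbs from the thesis, and the helper names are hypothetical. The sketch obtains rank-k LSA word vectors from the eigenpairs of the word-by-word matrix A·Aᵀ, which is equivalent to scaling the left singular vectors of A by its singular values, and then compares words by cosine similarity. The thesis itself may have used different tooling and weighting.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpairs(sym, k, iterations=1000):
    """Largest k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _component in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # fixed, non-uniform start vector
        for _step in range(iterations):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(work, v)))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return pairs

def lsa_word_vectors(counts, k):
    """Rows of U_k * Sigma_k for a word-by-context count matrix."""
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in counts]
            for r1 in counts]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
            for i in range(len(counts))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Invented word-by-context counts, NOT taken from the Sinica Corpus.
toy_counts = {
    "cut":   [4, 3, 0, 0, 1],
    "slice": [3, 4, 0, 1, 0],
    "eat":   [0, 0, 5, 3, 1],
    "drink": [0, 1, 3, 5, 0],
}

words = list(toy_counts)
vectors = dict(zip(words, lsa_word_vectors([toy_counts[w] for w in words], k=2)))

for other in words[1:]:
    print(words[0], other, round(cosine(vectors[words[0]], vectors[other]), 3))
```

Words that share contexts end up close together in the reduced space, so the pairwise similarities could feed directly into a cluster analysis.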

    摘要 (Chinese Abstract)
    Abstract
    Acknowledgements
    1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Research Questions
      1.4 Organization of the Thesis
    2 Literature Review
      2.1 Components of Verb Meaning
      2.2 Representation and Classification
        2.2.1 Light versus Heavy Verbs
        2.2.2 General versus Specific Verbs
      2.3 Resultative Verb Compounds
        2.3.1 Aspectual Interface Hypothesis
        2.3.2 Theta-Role Assignment
        2.3.3 Classification of Resultative Verb Compounds
      2.4 Psychological Evidence on Verbal Semantic Classification
        2.4.1 Perceptual and Functional Information
        2.4.2 Syntactic Structure Differences
        2.4.3 Semantic Specificity
      2.5 Access to Word Senses in Corpus-Based Linguistics
        2.5.1 Distributional Semantics
        2.5.2 Syntactic Behavior
        2.5.3 Co-Occurrence Word
        2.5.4 Definition and Gloss
      2.6 Computational Model of Word Space
        2.6.1 Vector Spaces
        2.6.2 Latent Semantic Analysis with Singular Value Decomposition
        2.6.3 Modification and Application
    3 Methodology
      3.1 Data Collection
      3.2 Criteria for Choosing the Bare Verb Form
      3.3 Assumption of the Distinction
      3.4 Manual Tagging
      3.5 Tagging Result and Comparison
      3.6 Extraction on Frequency, Sense Number, Object Number, and Object List/Type
    4 Semantic Manifestation in Quantitative Data
      4.1 General Discussion of These Three Properties
        4.1.1 The Comparison between Manual Tagging Result and the Verb List
        4.1.2 The Correlation Coefficients
        4.1.3 The Distribution of Verbs in Figures
        4.1.4 The Distribution of Each Variant in Plots
        4.1.5 Principal Component Analysis
      4.2 Prediction on Verb Types
        4.2.1 Multinomial Regression
        4.2.2 Generalized Linear Model
        4.2.3 Binary Logistic Regression
      4.3 Summary
    5 Word Space and Semantic Specificity in Mandarin
      5.1 The Variation of Semantic Space between Two Verb Types (G/S) in LSA
        5.1.1 Distributional Model Based on Sinica Corpus
        5.1.2 Semantic Clustering
        5.1.3 Distance Variation in Small-G/S-Clusters
      5.2 Meaning Exploring on Chinese Resultative Verb Compounds (RVCs)
        5.2.1 Compositionality and Lexical Inference in Distributional Model
        5.2.2 The RVC Structure in the Data
        5.2.3 The Model Based on Verb-Event Co-Occurrence Matrix
        5.2.4 Semantic Assessment
      5.3 Summary
    6 Conclusion
      6.1 Summary of the Thesis
      6.2 Contributions
      6.3 Limitations of the Present Study and Suggestions for Future Research
    References
    Appendix
      A Verb List
      B Small-G-Clusters
      C Small-S/U-Clusters
      D Programming Code

