National Taiwan Normal University Master's and Doctoral Theses Full-Text System


Graduate student:	林淑晏 Shu-Yen Lin
Thesis title:	A Computational Study of the Basic Level Nouns in English (以計算語言學方法研究英文的認知基本名詞)
Advisors:	畢永峨 Biq, Yung-O; 謝舒凱 Hsieh, Shu-Kai
Degree:	Doctor
Department:	Department of English
Year of publication:	2010
Academic year of graduation:	98 (ROC calendar)
Language:	English
Number of pages:	311
Keywords (Chinese):	原型理論、認知語言學、計算語言學、認知基本層名詞、英語詞網、英語詞彙計畫
Keywords (English):	prototype theory, cognitive linguistics, computational linguistics, basic level nouns, WordNet, English Lexicon Project
Type of work:	Academic thesis

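This thesis addresses a long-standing issue in Prototype Theory, a well-known theory in cognitive science: the literature on cognitive categorization has mostly relied on a small number of classic examples such as 'armchair', 'chair', and 'furniture' (Rosch et al. 1976; Taylor 2003; Ungerer & Schmid 1996, 2006). To the author's knowledge, no study has yet attempted to analyze the cognitive levels (superordinate level, basic level, subordinate level) of all the words in any language. Drawing on large electronic databases (WordNet, CELEX, BNC, CHILDES, ELP), this thesis carries out a comprehensive study of all English nouns and provides strong empirical support for the theory of cognitive categorization proposed by Rosch et al. (1976, 1978). The author designs an algorithm that finds the cognitive levels of the English nouns in WordNet: it compares each noun with the other nouns in its hierarchical chain in terms of their ability to form compounds, and thereby automatically detects each noun's cognitive level.
The English nouns extracted by this method show clear numerical patterns in lexical, semantic, and morphological respects that match the predictions derived from the differences in cognitive salience among the three cognitive levels. In particular, a multiple regression analysis of lexical decision latencies shows that the cognitive levels found by the proposed algorithm are highly correlated with lexical decision performance. This numerical evidence strongly supports both the validity of the proposed algorithm and the credibility of Prototype Theory.
Analysis of first language acquisition data leads to the same conclusion. Young children learn basic level words much faster and in much greater numbers than words at the other two cognitive levels. Superordinate words are especially challenging for young children, but once acquired they become words that children use frequently.
The results of this thesis show that cognitive science and computer science can be closely linked and advance side by side.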

As a celebrated theory in cognitive linguistics, Prototype Theory faces a long-standing issue: studies of cognitive categorization have often relied on just a few typical cases, exemplified by ‘armchair’ - ‘chair’ - ‘furniture’ and the like (Rosch et al. 1976; Taylor 2003; Ungerer & Schmid 1996, 2006). To my knowledge, there have so far been no attempts to pin down the cognitive levels of all the lexical words in any language. This study supports the cognitive categorization proposed by Rosch et al. (1976, 1978) with a general study of all lexical nouns in English, based on large electronic databases (WordNet, CELEX, BNC, CHILDES, ELP). An algorithm is proposed that automatically identifies the cognitive level of each noun in WordNet by comparing its ability to form critical compounds with that of the other words in its hierarchical chains.
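The compound-based idea in the paragraph above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the toy hyponym table, the suffix test in `is_headed_by`, and the rule that the noun with the highest ratio in a chain counts as the basic level are all assumptions made for illustration. The thesis defines its own compound ratio formula and its own notion of critical compounds, neither of which is reproduced in this record.

```python
# Toy hyponym table (hypothetical data, not extracted from WordNet).
HYPONYMS = {
    "furniture": ["chair", "table", "bed"],
    "chair": ["armchair", "rocking chair", "highchair", "stool"],
    "armchair": [],
}


def is_headed_by(compound, head):
    """Crude head test (assumption): the compound is longer and ends in `head`."""
    return compound != head and compound.endswith(head)


def compound_ratio(noun, hyponyms=HYPONYMS):
    """Share of a noun's direct hyponyms that are compounds headed by it."""
    children = hyponyms.get(noun, [])
    if not children:
        return 0.0
    hits = sum(is_headed_by(child, noun) for child in children)
    return hits / len(children)


def label_chain(chain, hyponyms=HYPONYMS):
    """Label a top-down hypernym chain.

    The noun with the highest compound ratio is labeled 'basic'; nouns above
    it are 'superordinate' and nouns below it are 'subordinate'. Ties go to
    the noun that comes first in the chain.
    """
    ratios = [compound_ratio(noun, hyponyms) for noun in chain]
    basic_index = max(range(len(chain)), key=ratios.__getitem__)
    labels = {}
    for i, noun in enumerate(chain):
        if i < basic_index:
            labels[noun] = "superordinate"
        elif i == basic_index:
            labels[noun] = "basic"
        else:
            labels[noun] = "subordinate"
    return labels


print(label_chain(["furniture", "chair", "armchair"]))
```

In this toy chain, 'chair' is the only noun whose hyponyms are mostly compounds built on it, so it is the one labeled basic. A real implementation would need proper head identification rather than a suffix match, which is the subject of the thesis's data preparation chapter.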
The nouns we extract show distinctive numerical features in lexical, semantic, and morphological respects, in accordance with the predictions derived from the differences in cognitive salience among the three levels. In particular, multiple regression analysis shows that lexical decision latencies are highly correlated with the cognitive levels assigned by our algorithm. This empirical evidence provides strong support for both the validity of the level assignment and the soundness of Prototype Theory.
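As a rough illustration of what a multiple regression with a cognitive-level predictor involves, the sketch below fits ordinary least squares in plain Python. Everything in it is an assumption: the two predictors (word length and a 0/1 basic-level indicator), the synthetic latencies generated from a known linear rule, and the helper names `solve` and `fit_ols`. It uses no ELP data and omits the principal component analysis and cubic splines listed in the thesis's table of contents.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic predictors: (word length, 1 if basic level else 0).
rows = [(3, 1), (4, 1), (5, 0), (6, 0), (7, 1), (8, 0)]
# Synthetic latencies built from a known rule, so the fit can be checked.
y = [600 + 20 * length - 50 * basic for length, basic in rows]

coefs = fit_ols(rows, y)
print(coefs)  # intercept, length effect, basic-level effect
```

Because the latencies are generated without noise, the fit should recover the generating coefficients up to rounding; with real latencies, correlated predictors would call for the collinearity diagnostics the thesis discusses.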
First language acquisition data also support this conclusion. Young children acquire basic level words significantly faster and in strikingly larger numbers than words at the other two levels. Superordinate level words are particularly challenging for young learners, but once acquired they are used very frequently.
The thesis demonstrates that cognitive science and computer science can go hand in hand.

Chinese Abstract
Abstract
Acknowledgements
List of Tables
List of Figures

Introduction

Prototype Theory and Its Implications in Other Disciplines of Linguistics
1  Classical Categorization Theories and the Early Reactions
1.1  Classical Approach to Categorization
1.2  Family Resemblance
1.3  The ‘Cup’ vs. ‘Bowl’ Experiments
1.4  Structuralism and Color Terms
1.5  Basic Color Terms
2  Prototype Theory
2.1  Prototypical Effects
2.2  Basic Level Terms
3  Implication of Cognitive Categorization in Other Disciplines of Linguistics

Data Preparation and Preprocessing
1  WordNet
1.1  Design and Content of WordNet
1.2  Nominal Hierarchical Chains in WordNet
1.3  Nominal Compounds in WordNet
1.3.1  Identifying the Heads and Modifiers of Bi-Component Spaced/Hyphenated Compounds in WordNet
1.3.2  Identifying the Heads and Modifiers of Multiple-Component Spaced/Hyphenated Compounds in WordNet
1.3.3  Identifying the Compounded Heads and Modifiers of Multiple-Component Spaced/Hyphenated Compounds in WordNet
2  CELEX
2.1  Nominal Compounds in CELEX
2.2  Extraction of Compounds and Derivational Words from CELEX
3  British National Corpus
3.1  Noun Tagging in the BNC
3.2  Noun Lemmatization and the BNC
3.3  Spelling Conventions and the BNC
3.4  Frequencies of Nominal Compounds in the BNC
3.5  Contextual Diversity of the Nouns in the BNC
3.6  Verb Frequencies in the BNC
3.7  Adjective Frequencies in the BNC
4  Child Language Data Exchange System (CHILDES)
5  English Lexicon Project (ELP)
5.1  Drawbacks of the Design of Typical Lexical Processing Studies
5.2  Design and Content of the ELP

Identification of the Cognitive Hierarchical Levels of English Nouns
1  Previous Algorithm to Identify Cognitive Levels of Nouns
1.1  Experiment 1 of Our Previous Work (Lin et al. 2009)
1.2  Experiment 2 of Our Previous Work (Lin et al. 2009)
1.3  Previous Algorithm for Identifying Basic Level Words
2  Why the Previous Algorithm Should Be Modified
2.1  What Variables Should Be Included in the Algorithm
2.2  How Can the Compound Ratio Threshold Be Pinpointed
2.3  The Relativity between Hyponymous Compounds and Hyponyms
2.4  How to Tag the Sense Number of a Word
2.5  Why Not Take Account of Hierarchical Chains
3  A New Algorithm for Identifying the Cognitive Levels of the Nouns in WordNet
3.1  Do We Need More Variables in the Algorithm?
3.2  Fuzziness as a Categorization Principle in Folk Taxonomy
3.3  An Advanced Algorithm for Identifying the Cognitive Levels of the Nouns in WordNet
3.3.1  Compound Formation Is the Most Reliable Formal Index of Cognitive Levels
3.3.2  Critical Compounds
3.3.3  The Formula Which Calculates the Compound Ratio
3.3.4  The Hierarchical Chains Play a Significant Role

Results and Assessment of the Cognitive Level Assignment
1  Statistical Analysis of the Nouns at the Three Cognitive Levels
1.1  Lexical Characteristics of the Three Cognitive Levels
1.1.1  Synsets, Hyponyms, Word Length, and Word Length Difference
1.1.2  Word Classes and Derivational Words
1.2  Morphological Characteristics of the Three Cognitive Level Words
1.3  Word Frequencies and Parts of Speech
1.3.1  Frequencies of Noun-Noun Compounds
1.3.2  Frequencies in Various Parts of Speech
2  Cognitive Level and Experimental Behavior
2.1  Multiple Regression Model of Lexical Decision Latency in the ELP
2.1.1  Collinearity and Principal Component Analysis
2.1.2  Nonlinearity and Cubic Splines

General Discussion
1  Language Acquisition and Cognitive Categorization
2  Future Research

Summary

References

Appendix A:  In-plural-form nominal entries in WordNet and their singular counterparts
Appendix B:  Equivalent Spellings in WordNet
Appendix C:  Semi-automatically extracted equivalent spellings in WordNet
Appendix D:  Basic level words identified in this study ordered by contextual diversity
Appendix E:  Superordinates identified in this study ordered by contextual diversity

Adelman, J. S., Brown, G. D. A., and Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science, 17(9), 814-823.
Andrews, S. (1997). The effect of orthographic similarity on lexical retrieval: resolving neighborhood conflicts. Psychonomic Bulletin & Review, 4, 439-461.
Aristotle (1933). Metaphysics. trans. H. Tredennick. London: Heinemann.
Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., and Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133(2), 283-316.
—— Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., and Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445-459.
—— Cortese, M., and Pilotti, M. (1999). Visual lexical decision latencies for 2906 words. Available: <http://www.artsci.wustl.edu/~dbalota/lexical_decision.html/>
Baayen, R. H. (2008). Analyzing Linguistic Data. A Practical Introduction to Statistics Using R. Cambridge University Press.
—— Feldman, L. B., and Schreuder, R. (2006). Morphological influences on the recognition of monosyllabic monomorphemic words. Journal of Memory and Language, 55, 290-313.
—— Piepenbrock, R., and Gulikers, L. (1995). The CELEX lexical database (CD-ROM). Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. Wiley, New York.
Berlin, B. (1972). Speculations on the growth of ethnobotanical nomenclature. Language in Society, 1, 51-86.
—— and Kay, P. (1969). Basic Color Terms: Their Universality and Evolution. Berkeley: University of California Press.
Bloomfield, L. (1933). Language. London: George Allen & Unwin.
British National Corpus (BNC). http://www.natcorp.ox.ac.uk/
Brown, R. (1958). How shall a thing be called? Psychological Review 65 (1), 14-21.
—— (1965). Social Psychology. New York: The Free Press.
—— and Lenneberg, E. H. (1954). A study in language and cognition. Journal of Abnormal and Social Psychology, 49, 454-462.
Brysbaert, M., and New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977-990.
Bulmer, R. (1967). Why is the cassowary not a bird? A problem of zoological taxonomy among the Karam of the New Guinea Highlands. Man: the Journal of the Royal Anthropological Institute, 2, 5-25.
—— and Tyler, M. J. (1968). Karam classification of frogs. The Journal of the Polynesian Society, 77, 333-385.
Cannon, G. (1987). Historical Change and English Word-formation. New York: Lang.
Carroll, J. B. (1956). Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf. MIT Press.
Child Study Committee of the International Kindergarten Union. (M. D. Horn, chairman.) (1928). A Study of the Vocabulary of Children before Entering the First Grade. Baltimore, Williams and Wilkins.
Coltheart, M., Patterson, K., & Marshall, J. C. (1980). Deep dyslexia. London: Routledge & Kegan Paul.
—— Davelaar, E., Jonasson, J., & Besner, D. (1977). Access to the internal lexicon. In S. Dornic (Ed.), Attention and Performance VI (pp. 535-555). Hillsdale, NJ: Erlbaum.
—— Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204-256.
Costello, F. J. (1996). Noun-noun Conceptual Combination: the Polysemy of Compound Phrases. PhD Thesis, University of Dublin, Ireland.
Costello, F. J., Veale, T. & Dunne, S. (2006). Using WordNet to automatically deduce relations between words in noun-noun compounds. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 160-167.
Cruse, D. A. (1977). The pragmatics of lexical specificity. Journal of Linguistics 13, 153-164.
Deane, P. D. (1992). Grammar in Mind and Brain: Explorations in Cognitive Syntax. Berlin: Mouton de Gruyter.
Dirven, R. and Taylor, J. (1988). The conceptualization of vertical space in English: The case of ‘tall’. In Rudzka-Ostyn, 379-402.
—— and Verspoor, M. (2004). Cognitive Exploration of Language and Linguistics. Amsterdam: John Benjamins.
Downing, P. (1977). On the creation and use of English compound nouns. Language, 53, 810-842.
English Lexicon Project (ELP). http://elexicon.wustl.edu/default.asp
Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Forster, K. I. (2000). The potential for experimenter bias effects in word recognition experiments. Memory & Cognition, 28, 1109–1115.
Garside, R. G., Leech, G. N., and Sampson, G. R. (Eds.) (1987). The Computational Analysis of English: A Corpus-based Approach. London: Longman.
Geeraerts, D., Grondelaers, S., & Bakema, P. (1994). The Structure of Lexical Variation: Meaning, Naming, and Context. Oxford University Press.
—— and Cuyckens, H. (2007). The Oxford Handbook of Cognitive Linguistics (Oxford Handbooks). Oxford University Press.
Grainger, J., & Jacobs, A. M. (1996). Orthographic processing in visual word recognition: A multiple read-out model. Psychological Review, 103, 518-565.
Harrell, F. (2001). Regression modeling strategies. Berlin: Springer.
Harris, R. (1983). F. de Saussure: Course in General Linguistics. London: Duckworth. English translation of Saussure (1964).
Heider, E. R. (= Rosch) (1971). “Focal” color areas and the development of color names. Developmental Psychology, 4: 447-455.
—— (1972). Universals in color naming and memory. Journal of Experimental Psychology, 93: 10-20.
Hoenkamp, E. C. M. (2005). Why information retrieval needs cognitive science: A call to arms. Proceedings of the 27th Annual Conference of the Cognitive Science Society.
Jared, D., McRae, K., & Seidenberg, M. S. (1990). The basis of consistency effects in word naming. Journal of Memory & Language, 29, 687-715.
Jurafsky, D. & Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. London: Pearson Prentice Hall.
Katz, L., and Feldman, L. B. (1983). Relation between pronunciation and recognition of printed words in deep and shallow orthographies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 157–166.
Keane, M., & Costello, F. (1997). Where do "soccer moms" come from? Cognitive constraints on noun-noun compounding in English. In Proceedings of Mind II: Computational Models of Creative Cognition, Dublin, Ireland.
Khmaladze, E. V. (1987). The Statistical Analysis of Large Number of Rare Events. Technical report MS-R8804, Department of Mathematical Statistics, CWI, Amsterdam, Netherlands.
Kristiansen, G., Achard, M., Dirven, R., and Ibáñez, F. (Eds.) (2006). Cognitive Linguistics: Current Applications and Future Perspectives. Berlin: Mouton de Gruyter.
Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Labov, W. (1973). The boundaries of words and their meanings. In Bailey and Shuy (Eds.), New Ways of Analyzing Variation in English, 340-373.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to latent semantic analysis. Discourse Processes, 25, 259-284.
Leech, G., Rayson, P., & Wilson, A. (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman.
Lenneberg, E. H. (1967). Biological Foundations of Language. New York: Wiley.
Lipka, L. (1980). Methodology and representation in the study of lexical fields. In Kastovsky, D. (Ed.) Perspektiven der lexikalischen Semantik: Beiträge zum Wuppertaler Semantikkolloquium vom 2-3.12.1977, 93-144. Bonn: Bouvier.
Lin, S.-Y., Su, C.-C., Lai, Y.-D., Yang, L.-C., and Hsieh, S.-K. (2009). Assessing text readability using hierarchical lexical relations retrieved from WordNet. Computational Linguistics & Chinese Language Processing, 14(1), 45-83.
Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28, 203-208.
Lupker, S. J. (1979). The semantic nature of response competition in the picture-word interference task. Memory & Cognition, 7, 485–495.
Lyons, J. (1968). Introduction to Theoretical Linguistics. Cambridge: Cambridge University Press.
MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19–40.
Marshall, I. (1983). Choice of Grammatical Word-class without Global Syntactic Analysis: Tagging Words in the LOB Corpus. Computers and the Humanities, 17, 139-50.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
Yap, M. J., and Balota, D. A. (2009). Visual word recognition of multisyllabic words. Journal of Memory and Language, 60(4), 502-529.
Mervis, C. B., and Crisafi, M. A. (1982). Order of acquisition of subordinate-, basic- and superordinate-level categories. Child Development, 53, 258-266.
Miller, G., Fellbaum, C., Kegl, J., and Miller, K. (1988). WordNet: an electronic lexical reference system based on theories of lexical memory. Revue québécoise de linguistique, 17(2), 181-212.
Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. Journal of Experimental Psychology: General, 106, 226-254.
NLTK (Natural Language Toolkit) http://www.nltk.org/Home
Pulman, S. G. (1983). Word Meaning and Belief. London: Croom Helm.
Perfetti, C. A. (1994). Psycholinguistics and reading ability. In M. A. Gernsbacher (Ed.), Handbook of Psycholinguistics (pp. 849–894). San Diego, CA: Academic Press.
Petersen, S. E., Fox, P. T., Posner, M. I., Mintun, M., & Raichle, M. E. (1989). Positron emission tomographic studies of the processing of single words. Journal of Cognitive Neuroscience, 1, 153-170.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., and Patterson, K. E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.
R (The R Project for Statistical Computing). http://www.r-project.org/
Razmerita, L., Angehrn, A., & Maedche, A. (2003). Ontology-Based User Modeling for Knowledge Management Systems. In Lecture Notes in Computer Science, 213–17.
Rifkin, A. (1985). Evidence for basic level in event taxonomies. Memory & Cognition, 13, 538-556.
Rosaldo, M. Z. (1972). Metaphors and folk classification. Southwestern Journal of Anthropology, 28, 83-99.
Rosch, E. (1973b). On the internal structure of perceptual and semantic categories. In Moore (ed.), 111-144.
—— (1975b). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104: 192-233.
—— (1976). Structural bases of typicality effects. Journal of Experimental Psychology: Human Perception and Performance, 2: 491-502.
—— (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and Categorization, Social Science Research Council (U.S.).
—— Mervis, C., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Saussure, F. de (1964). Cours de linguistique générale, 3rd ed. C. Bally and A. Sechehaye (Eds.). Paris: Payot. 1st ed. 1916.
Schmid, H.-J. (2007). Measuring the relative entrenchment and salience of categories in lexical taxonomies. In Geeraerts & Cuyckens (Eds.) The Oxford Handbook of Cognitive Linguistics (Oxford Handbooks). Oxford University Press.
—— (1996b). Review of: Geeraerts, Dirk, Stefan Grondelaers and Peter Bakema 1994, The structure of lexical variation. Meaning, naming, and context, Berlin: Mouton de Gruyter. Lexicology 2/1, 78-84.
—— (1996a). Basic level categories as basic cognitive and linguistics building blocks. In Weigand, E. & Hundsnurscher, F. (Eds.) Lexical Structures and Language Use 1: 285-295. Tübingen: Max Niemeyer.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523–568.
Smith, M. E. (1926). An investigation of the development of the sentence and the extent of vocabulary in young children. University of Iowa Studies in Child Welfare. Iowa City, Iowa.
Soylu, A., & De Causmaecker, P. (2009). Merging model driven and ontology driven system development approaches pervasive computing perspective. In Proc 24th Intl Symposium on Computer and Information Sciences, pp. 730–735.
Spieler, D. H., & Balota, D. A. (1997). Bringing computational models of word naming down to the item level. Psychological Science, 8, 411–416.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.
Taylor, J. R. (2003). Linguistic Categorization. Oxford University Press.
Underwood, B. J. (1961). Ten years of massed practice on distributed practice. Psychological Review 68 (4), 229-247.
Ungerer, F. (2001). Basicness and conceptual hierarchies in foreign language learning: a corpus-based study. In M. Pütz, S. Niemeier, & R. Dirven. (Eds.) Applied Cognitive Linguistics: Language Pedagogy, 201-224.
—— and Schmid, H.-J. (1998). Englische Komposita und Kategorisierung. Rostocker Beiträge zur Sprachwissenschaft 5: 77-98.
—— and Schmid, H.-J. (1996). An Introduction to Cognitive Linguistics. London/New York, Longman.
—— and Schmid, H.-J. (2006). An Introduction to Cognitive Linguistics (2nd ed.). London/New York, Longman.
Verschueren, J. (1985). What people say they do with words. In Prolegomena to an Empirical-Conceptual Approach to Linguistic Action. Norwood, NJ: Ablex Publishing Corporation.
Wittgenstein, L. (1978). Philosophical Investigations. Translated by G. E. M. Anscombe. Oxford: Basil Blackwell.
Wood, S. N. (2006). Generalized additive models: An introduction with R. New York: Chapman & Hall/CRC.
WordNet, version 3.0. (2006). Princeton, N.J.: Princeton University. Retrieved from World Wide Web: http://wordnet.princeton.edu/wordnet/download/#nix.
Yudelson, M., Gavrilova, T., & Brusilovsky, P. (2005). Towards User Modeling Meta-ontology. Lecture Notes in Computer Science, 3538: 448.
Zorzi, M., Houghton, G., & Butterworth, B. (1998). Two routes or one in reading aloud? A connectionist dual-process model. Journal of Experimental Psychology: Human Perception & Performance, 24, 1131-1161.
