
Graduate student: 鄭郇 (Cheng, Hsun)
Thesis title: Detecting DIF of a Pictorial Vocabulary Size Test: A Social-Cultural Inquiry
Chinese title: 圖形化單字量測驗之差別試題功能分析:一個社會文化探究
Advisor: 曾文鐽 (Tseng, Wen-Ta)
Degree: Master
Department: Department of English
Year of publication: 2015
Academic year of graduation: 103 (Republic of China calendar)
Language: English
Number of pages: 173
Keywords: vocabulary size, fairness, cultural capital, differential item functioning (DIF)
Thesis type: Academic thesis
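    This study investigates differential item functioning (DIF) associated with regional differences in cultural capital on the Pictorial Vocabulary Size Test among Taiwanese junior high school students. It further analyzes the causes behind the item options that produced significant group differences. The aim is to ensure the fairness of the Pictorial Vocabulary Size Test and to give in-service junior high school English teachers a suitable test of students' knowledge of the 1,200 basic words.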
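    Participants were 1,393 junior high school students from urban and rural areas. After two administrations of the Pictorial Vocabulary Size Test, the results were analyzed with three statistical packages (DIFAS, Winsteps, and RUMM), and the items that all three flagged for large regional DIF were selected. Six test-takers were then interviewed about what they were thinking when they answered the DIF items. Finally, the items showing regional DIF were analyzed and their possible causes discussed.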
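Winsteps is a Rasch-model program. One common Rasch-based way to size DIF is to calibrate an item separately in each group and compare the two difficulty estimates. Winsteps reports a similar "DIF contrast." The sketch below is a minimal illustration of that idea. The urban/rural group labels, the example estimates, and the simplified 0.43/0.64-logit A/B/C thresholds are illustrative assumptions. They are not the settings reported in the thesis, and the code does not reproduce either program's exact output.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GroupCalibration:
    """Rasch difficulty (logits) and its standard error for one item in one group."""
    difficulty: float
    se: float


def dif_contrast(urban: GroupCalibration, rural: GroupCalibration) -> Tuple[float, float]:
    """Return (contrast, t).

    contrast = urban difficulty - rural difficulty, so a positive value means
    the item was harder for the (hypothetical) urban group.
    t divides the contrast by the joint standard error of the two estimates.
    """
    contrast = urban.difficulty - rural.difficulty
    joint_se = math.sqrt(urban.se ** 2 + rural.se ** 2)
    return contrast, contrast / joint_se


def classify(contrast: float, t: float, t_crit: float = 1.96) -> str:
    """Simplified ETS-style label on the logit scale (assumed thresholds)."""
    size = abs(contrast)
    if size < 0.43 or abs(t) < t_crit:
        return "A (negligible)"
    if size < 0.64:
        return "B (moderate)"
    return "C (large)"


if __name__ == "__main__":
    # Hypothetical estimates for a single picture item.
    c, t = dif_contrast(GroupCalibration(0.20, 0.10), GroupCalibration(-0.55, 0.12))
    print(f"contrast={c:.2f} logits, t={t:.2f}, class={classify(c, t)}")
```

Requiring both a size threshold and a significance check keeps large but imprecise contrasts from small groups out of the flagged set.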
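    The regional DIF analysis of the Pictorial Vocabulary Size Test showed that 5.6% of the items displayed DIF. There were few DIF items, but regional DIF appeared to be related to differences in cultural capital and to item characteristics. Items whose pictures were specifically tied to test-takers' life experiences and cultural capital showed a higher proportion of DIF.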
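    The researcher recommends that writers of pictorial vocabulary size test items avoid specific life situations wherever possible. Pictures in test items should therefore not depict objects, places, landscapes, or life experiences specific to particular regions, so as to maintain item quality and test fairness.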

    Keywords: vocabulary size, fairness, cultural capital, differential item functioning

    The present study investigates differential item functioning (DIF) related to cultural capital on the Pictorial Vocabulary Size Test (PVST) among junior high school students. It also identifies the reasons that account for such DIF and seeks to ensure the fairness of the PVST. It is hoped that the study will provide junior high school teachers with an appropriate vocabulary size test.
    Participants were 1,393 junior high school students from urban and rural areas of Taiwan. DIF analyses were conducted with three statistical programs (DIFAS, Winsteps, and RUMM) to identify items showing large and moderate DIF. Six participants were then interviewed about their thinking processes during the test. The study also discusses possible causes of DIF.
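DIFAS offers the Mantel-Haenszel (MH) procedure among its DIF statistics. The sketch below is a minimal version of that procedure. It treats urban examinees as the reference group and rural examinees as the focal group, which is an illustrative assumption. It pools 2x2 tables across total-score strata into the MH common odds ratio, converts the ratio to the ETS delta scale (MH D-DIF = -2.35 ln alpha), and assigns a magnitude-only A/B/C label. The full ETS rules also require significance tests, and those are omitted here. The counts are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One score stratum: (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
Stratum = Tuple[int, int, int, int]


def mh_odds_ratio(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel common odds ratio pooled over score strata."""
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these counts")
    return num / den


def mh_d_dif(strata: Sequence[Stratum]) -> float:
    """ETS delta metric; negative values mean the item favours the reference group."""
    return -2.35 * math.log(mh_odds_ratio(strata))


def ets_label(d_dif: float) -> str:
    """Magnitude-only A/B/C label; significance tests are deliberately omitted."""
    size = abs(d_dif)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


if __name__ == "__main__":
    # Hypothetical counts at three total-score levels (urban = reference, rural = focal).
    strata = [(30, 20, 18, 32), (45, 15, 33, 27), (55, 5, 50, 10)]
    d = mh_d_dif(strata)
    print(f"alpha_MH={mh_odds_ratio(strata):.2f}, MH D-DIF={d:.2f}, class={ets_label(d)}")
```

Matching examinees on total score before comparing groups is what separates DIF from a simple difference in overall ability between urban and rural students.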
    The analyses of district DIF on the PVST indicated that the proportion of items displaying district DIF was low, at about 5.6% on average. District DIF appeared to be associated with differences in cultural capital and with item characteristics. Items whose pictures related to particular life experiences and to specific cultural capital groups were more likely to display district DIF.
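The thesis reports DIF items identified in common by the three programs (see Appendix 4, Method Agreement of DIF Detections). The sketch below shows one way to combine such results and express them as a share of the item pool. It counts how many programs flag each item and keeps the items flagged by at least a chosen number of programs. The item IDs, flags, and pool size are hypothetical. The abstract does not spell out the exact combination rule behind the 5.6% figure, so this is only an assumption-laden illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def items_flagged_by(flags: Dict[str, Iterable[str]], min_programs: int) -> Set[str]:
    """Items flagged by at least `min_programs` of the programs in `flags`."""
    # set(items) guards against an item being listed twice by one program.
    counts = Counter(item for items in flags.values() for item in set(items))
    return {item for item, k in counts.items() if k >= min_programs}


def percent_of_pool(items: Set[str], pool_size: int) -> float:
    """Share of the whole item pool, in percent."""
    return 100.0 * len(items) / pool_size


if __name__ == "__main__":
    # Hypothetical large/moderate DIF flags; item IDs are made up.
    flags = {
        "DIFAS": {"i03", "i17", "i42"},
        "Winsteps": {"i03", "i17", "i50"},
        "RUMM": {"i03", "i17", "i42", "i50"},
    }
    common = items_flagged_by(flags, min_programs=len(flags))
    print(sorted(common), f"{percent_of_pool(common, pool_size=40):.1f}%")
```

Setting `min_programs` to the number of programs gives strict agreement (the intersection of the three flag sets). Lower values give a more lenient majority rule.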
    PVST item writers are therefore advised to avoid designing items tied to particular life settings. Pictures in the test should avoid region-specific objects, places, scenes, and life experiences, so as to maintain the quality of the items and ensure the fairness of the test.

    Keywords: vocabulary size, fairness, cultural capital, DIF

    TABLE OF CONTENTS
    CHINESE ABSTRACT i
    ENGLISH ABSTRACT ii
    ACKNOWLEDGEMENT iii
    TABLE OF CONTENTS iv
    LIST OF TABLES vi
    LIST OF FIGURES vii
    CHAPTER ONE INTRODUCTION 1
        Background and Motivation 1
        Purpose of the Study 7
        Research Questions 8
        Significance of the Present Study 8
        Organization of the Thesis 9
    CHAPTER TWO LITERATURE REVIEW 10
        Aspects of Vocabulary 10
        The Importance of Vocabulary Knowledge in Language Learning 10
        Ways of Measuring Vocabulary 17
        Fairness in Language Testing 22
        Reliability and Validity 23
        Differential Item Functioning (DIF) and Bias 29
        Social Impact of Fairness in Language Learning and Testing 32
        Urban and Rural Differences in Language Learning and Testing 36
        Item Response Theory 39
        1PL, 2PL, and 3PL IRT Models 42
    CHAPTER THREE METHODOLOGY 49
        Participants 49
        Instruments 50
        The Pictorial Vocabulary Size Test 51
        Follow-up Interviews 52
        Procedures 53
        The Pictorial Vocabulary Size Test 53
        Follow-up Interviews 55
        Data Analysis 56
    CHAPTER FOUR RESULTS AND DISCUSSION 59
        DIF Analysis 59
        Synthesis Discussion 73
    CHAPTER FIVE CONCLUSION 118
        Summary of Major Findings 118
        Pedagogical Implications 121
        Limitations 123
        Suggestions for Future Research 124
    References 125
    Appendix 1: DIF Detections by Software DIFAS 151
    Appendix 2: DIF Detections by Software Winsteps 156
    Appendix 3: DIF Detections by Software RUMM 164
    Appendix 4: Method Agreement of DIF Detections 169

