
Graduate student: 林小慧
Lin, Hsiao-Hui
Thesis title: 科學建構反應評量之發展與研究
The Development of Scientific Constructed-Response Assessments
Advisor: 林世華
Lin, Sieh-Hwa
Degree: Doctoral
Department: Department of Educational Psychology and Counseling
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 163
Chinese keywords: 科學建構反應評量、評分者一致性、RSM 與 PCM 模式比較、多面向 Rasch 測量模式、驗證性因素分析
English keywords: Scientific Constructed-Response Assessments, rater consistency, model comparison of RSM and PCM, many-facet Rasch measurement, confirmatory factor analysis
DOI URL: https://doi.org/10.6345/NTNU202203854
Thesis type: Academic thesis
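  • This study develops the Scientific Constructed-Response Assessment (SCRA) together with a scoring rubric for rating scientific ability. The full assessment comprises four sub-assessments, "memory and understanding of scientific knowledge," "application and analysis of scientific procedures," "argumentation and expression of scientific logic," and "evaluation and creation in problem solving," for a total of 32 constructed-response items. Empirical data were analyzed through item analysis and tests of construct validity and reliability in order to examine the reliability and validity of the assessment. The results show that intra-rater consistency values were all > .9, indicating that each rater's scoring was highly stable. Inter-rater agreement, measured by Kendall's coefficient of concordance (W), was > .9 with p < .05, a significant association indicating that the raters' scores were highly consistent. In the many-facet Rasch measurement (MFRM) analysis, the chi-square test of rater severity was not significant (χ² = 5.01, df = 3, p = .171) and the separation reliability was .57, indicating consistency across raters, in line with the classical test theory results. The chi-square test comparing the RSM and the PCM, however, was significant, indicating that rating thresholds differed across raters, which falls short of the ideal. When the deviances estimated under the RSM and the PCM were converted to BIC, the RSM fit better, suggesting that raters shared the same rating thresholds. Given these mixed results, further data should be collected to confirm the consistency of raters' thresholds. In addition, the internal consistency of each test booklet was > .8 and α for the full assessment was above .90, so Cronbach's α for the SCRA fell in a good range. Finally, the CFA results show that the empirical data support the theoretical model of the SCRA. The four first-order latent factors, "memory and understanding of scientific knowledge," "application and analysis of scientific procedures," "argumentation and expression of scientific logic," and "evaluation and creation in problem solving," were explained by the second-order factor "scientific ability" in proportions of .92, .56, .46, and .46, respectively.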
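The model comparison in this abstract rests on two standard calculations: a chi-square (likelihood-ratio) test on the difference in deviance between nested models, and a BIC computed as deviance plus a penalty of k·ln(N). The following Python sketch illustrates both under stated assumptions. It first recomputes the p-value for the reported rater-severity test (χ² = 5.01, df = 3), then compares two models using deviances, parameter counts, and a sample size that are hypothetical, since the abstract does not report them. The function names are illustrative and are not taken from the thesis or from any Rasch software.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite sum.
    terms = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def bic(deviance: float, n_params: int, n_obs: int) -> float:
    """BIC = deviance + n_params * ln(n_obs); lower values indicate better fit."""
    return deviance + n_params * math.log(n_obs)


# Rater-severity test as reported in the abstract.
p_severity = chi2_sf(5.01, 3)
print(f"rater severity: p = {p_severity:.3f}")  # prints: rater severity: p = 0.171

# Hypothetical values, for illustration only (not from the thesis).
dev_rsm, k_rsm = 12030.0, 40   # RSM: raters share one set of thresholds
dev_pcm, k_pcm = 12000.0, 49   # PCM: thresholds vary, so more parameters
n_obs = 500                    # assumed sample size for the BIC penalty

lr_stat = dev_rsm - dev_pcm                 # likelihood-ratio chi-square
lr_p = chi2_sf(lr_stat, k_pcm - k_rsm)      # df = difference in parameters
bic_rsm = bic(dev_rsm, k_rsm, n_obs)
bic_pcm = bic(dev_pcm, k_pcm, n_obs)
print(f"LR test p = {lr_p:.4f}; BIC RSM = {bic_rsm:.1f}, BIC PCM = {bic_pcm:.1f}")
```

With these made-up numbers the likelihood-ratio test is significant while BIC still prefers the simpler RSM, the same pattern the abstract describes. Which N enters the BIC penalty (persons or total ratings) is a modelling choice and is treated here as an assumption.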

    This study develops the Scientific Constructed-Response Assessments (SCRA) and the accompanying Rubric of Scientific Constructed-Response Assessments (RSCRA), which is designed to rate scientific ability. The assessment covers a single science unit, optics, and consists of 32 open-ended items in four subscales: memory and understanding of scientific knowledge, application and analysis of scientific procedures, argumentation and expression of scientific logic, and evaluation and creation in problem solving. Intra-rater Cronbach's α was greater than .9, indicating good intra-rater consistency. Kendall's coefficient of concordance for inter-rater reliability was greater than .9 (p < .05), indicating good inter-rater consistency. In the many-facet Rasch measurement (MFRM) analysis, the chi-square test of rater severity was not significant, which indicates consistent severity across raters and agrees with the classical test theory results. The chi-square test comparing the rating scale model (RSM) with the partial credit model (PCM) was significant, which points to differences in raters' thresholds. However, the Bayesian information criterion (BIC) favored the RSM, so raters' thresholds can be regarded as the same. Further data should therefore be collected to confirm the consistency of raters' thresholds. Cronbach's α for the items was greater than .8, within the acceptable range. Finally, second-order confirmatory factor analysis showed acceptable fit for the SCRA model: the second-order factor accounted for .92, .56, .46, and .46 of the variance in the four first-order subscale factors.
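The two classical reliability indices reported in this abstract, Cronbach's α and Kendall's coefficient of concordance (W), can each be computed in a few lines. The sketch below is a minimal illustration in Python on toy data. The scores are invented, the function names are hypothetical, and the W formula omits the correction for tied ranks, so this is not a reproduction of the thesis analysis.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][p] is the score of person p on item i."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def average_ranks(scores: list[float]) -> list[float]:
    """Rank scores from 1 (lowest); tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kendalls_w(ratings: list[list[float]]) -> float:
    """Kendall's W (no tie correction); ratings[r][s] is rater r's score for subject s."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(col) for col in zip(*ranked)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Toy data: three raters scoring five responses (invented values).
toy_ratings = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [2, 1, 3, 4, 5],
]
w = kendalls_w(toy_ratings)
alpha = cronbach_alpha(toy_ratings)  # raters treated as "items" here
print(f"W = {w:.3f}, alpha = {alpha:.3f}")  # prints: W = 0.956, alpha = 0.977
```

Both indices approach 1 as the rows agree more closely; the toy raters disagree only on the order of the first two responses, which is why both values stay high.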

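    Chinese Abstract
    English Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1  Introduction
        Section 1  Research Motivation
        Section 2  Research Purpose
        Section 3  Research Questions
        Section 4  Research Hypotheses
        Section 5  Definition of Terms
        Section 6  Research Limitations
    Chapter 2  Literature Review
        Section 1  Foundational Research on Scientific Constructed Response
        Section 2  Foundational Research on Scientific Constructed-Response Assessment
        Section 3  The Many-Facet Rasch Measurement Model
    Chapter 3  Research Methods
        Section 1  Research Framework
        Section 2  Participants
        Section 3  Research Instruments
        Section 4  Research Procedure
    Chapter 4  Results
        Section 1  Pilot Test Analysis Results
        Section 2  Formal Assessment Analysis Results
    Chapter 5  Discussion
        Section 1  General Discussion
        Section 2  Research Applications
        Section 3  Research Recommendations
    References
    Appendix 1  Scientific Constructed-Response Assessment
    Appendix 2  Scoring Examples for the Scientific Constructed-Response Assessment
        Advanced example
        Proficient example
        Basic example
        Needs-improvement example
    Appendix 3  Scoring Rubric for Scientific Constructed Response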

