臺灣博碩士論文加值系統 (Taiwan National Digital Library of Theses and Dissertations)
Detailed Record (詳目顯示)
Author: 陳怡琴 (Yi-chin Chen)
Title: 以Rasch模式探討性別及學習機會對數學分數實作評量「差異試題功能」(DIF)之影響
Title (English): Investigating Gender and Opportunity to Learn DIF of A Mathematics Performance Assessment Using Rasch Model
Advisor: 張麗麗 (Li-ly Chang)
Degree: Master's
Institution: 國立屏東教育大學 (National Pingtung University of Education)
Department: 教育心理與輔導學系 (Department of Educational Psychology and Counseling)
Discipline: Social and behavioral sciences
Field: Psychology
Document type: Academic thesis
Graduation academic year: 97 (ROC calendar)
Language: Chinese
Pages: 201
Chinese keywords: Rasch模式、數學分數實作評量、數學分數日誌、試題構念效度、計分規準構念、後果效度、性別差異試題功能、學習機會差異試題功能
English keywords: mathematics performance assessment; Rasch model; validity; gender DIF; opportunity to learn DIF
Record statistics:
  • Cited by: 9
  • Views: 1020
  • Downloads: 247
  • Bookmarked: 4
Investigating Gender and Opportunity-to-Learn DIF of a Mathematics Fraction Performance Assessment Using the Rasch Model
Abstract
This study used the Rasch model to examine gender and opportunity-to-learn (OTL) differential item functioning (DIF) in a researcher-developed mathematics fraction performance assessment. Before the DIF analyses, the researcher collected validity evidence, particularly for the internal structure of the test and the structure of the scoring rubrics. Gender DIF was examined with a confirmatory approach: hypotheses were first derived from a substantive review of the literature and then tested. OTL DIF was examined with a quasi-experimental design. The participants were 69 sixth graders in three classes at the researcher's school (the experimental group), who received six months of performance assessment and journal-writing instruction on fraction concepts, and 82 sixth graders in three classes at two schools in similar districts (the control group), who received traditional instruction and assessment. Both groups took the mathematics fraction performance pre- and post-tests, and structured interviews about the tests were conducted with a subset of students.
Regarding content-related validity evidence, the researcher constructed the items from a two-way table of specifications, asked one measurement expert and one mathematics expert to judge the fit between the table and the items, and revised and pilot-tested the items accordingly. The person-item map further showed that the items were consistent with the intended cognitive levels and with the developmental progression of fraction concepts. Overall, the test content was relevant to, and representative of, the content it was intended to measure.
Regarding substantive validity evidence, structured interviews were used to examine whether students actually engaged in mathematical problem-solving processes. The results showed that the performance items did capture students' problem-solving processes and that students indeed engaged in them.
Regarding internal-structure validity evidence, Rasch fit indices indicated that the test largely measured a unidimensional construct. In addition, the person-item map showed good correspondence between person abilities and item difficulties, so the test effectively differentiated examinee ability and item difficulty.
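For readers unfamiliar with the fit indices mentioned above, the following is a minimal Python sketch, with made-up data, of how the standard Rasch infit/outfit mean-square statistics are computed for a dichotomous item. It is illustrative only and not the thesis's analysis code; values near 1.0 indicate responses consistent with a unidimensional Rasch scale.

```python
# A minimal sketch (not the thesis's code) of Rasch infit/outfit
# mean-square fit statistics for one dichotomous item.
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fit_mean_squares(responses, thetas, b):
    """Return (outfit, infit) mean squares for one item.

    outfit = unweighted average of squared standardized residuals;
    infit  = information-weighted mean square.
    """
    outfit_sum = 0.0
    resid_sq_sum = 0.0
    info_sum = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        info = p * (1.0 - p)        # binomial variance (information)
        r2 = (x - p) ** 2           # squared raw residual
        outfit_sum += r2 / info     # squared standardized residual
        resid_sq_sum += r2
        info_sum += info
    return outfit_sum / len(responses), resid_sq_sum / info_sum

# Hypothetical mini data set: one item (difficulty b = 0.5), four students.
outfit, infit = fit_mean_squares([1, 0, 1, 1], [1.2, -0.3, 0.5, 0.9], 0.5)
```

In practice such statistics are reported by standard Rasch software over all persons and items; the sketch only makes the arithmetic behind them concrete.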
Regarding construct evidence for the scoring rubrics, the researcher examined the rubric-development process, the rater training, and, with a polytomous Rasch model, the ordering of the rating-scale categories. The results showed that, after the primary and auxiliary constructs were clarified and the rubrics revised, the rubrics adequately reflected a unidimensional construct and effectively differentiated examinees of different ability levels. Rigorous rater training also ensured that the two raters applied the rubrics accurately and consistently.
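As context for the category-ordering check, one common polytomous Rasch model is the partial credit model (cf. Wright & Masters, 1982, in the references). In standard notation (not taken from the thesis), the probability that person $n$ with ability $\theta_n$ obtains score $x$ on item $i$ with step difficulties $\delta_{ik}$ is:

```latex
P(X_{ni} = x) \;=\;
\frac{\exp \sum_{k=0}^{x} (\theta_n - \delta_{ik})}
     {\sum_{h=0}^{m_i} \exp \sum_{k=0}^{h} (\theta_n - \delta_{ik})},
\qquad x = 0, 1, \dots, m_i,
```

with the convention $\sum_{k=0}^{0}(\theta_n - \delta_{i0}) \equiv 0$. Ordered step difficulties $\delta_{i1} < \delta_{i2} < \dots$ indicate that the rubric's score levels form the intended hierarchy.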
Regarding gender DIF, examined with a two-way analysis of variance of Rasch residuals, the communication item favored girls and showed DIF on the post-test. Although the other items did not show the hypothesized gender DIF, their ICC plots displayed common patterns:
1. Graphical items favored boys (on both the pre- and post-test).
2. Algorithmic items favored girls (on both the pre- and post-test).
3. Multiplicative-relationship items favored boys (pre-test).
4. Items requiring substantial reading comprehension favored girls (pre-test).
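The residual-based screening described above can be sketched in Python. This is an illustrative, hypothetical example (made-up data and simplified logic), not the thesis's analysis: it computes standardized Rasch residuals and compares their group means for one item, which is the quantity that enters the two-way (group × ability strata) analysis of variance.

```python
# Illustrative sketch (not the thesis's code): standardized Rasch residuals
# and their per-group means for one item. A mean residual far from 0 in one
# group but not the other flags potential DIF for follow-up testing.
import math

def rasch_p(theta, b):
    """Dichotomous Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def std_residual(x, theta, b):
    """Standardized residual z = (x - p) / sqrt(p (1 - p))."""
    p = rasch_p(theta, b)
    return (x - p) / math.sqrt(p * (1.0 - p))

def group_mean_residual(responses, thetas, b):
    """Mean standardized residual for one item within one group."""
    zs = [std_residual(x, t, b) for x, t in zip(responses, thetas)]
    return sum(zs) / len(zs)

# Hypothetical data for one item of difficulty b = 0.5: boys respond about
# as the model predicts; girls outperform the model prediction on this item.
b = 0.5
boys_x, boys_theta = [1, 0, 1, 0], [1.0, -0.5, 0.8, 0.2]
girls_x, girls_theta = [1, 1, 1, 0], [0.6, 0.3, 1.1, -0.2]
print(group_mean_residual(boys_x, boys_theta, b))
print(group_mean_residual(girls_x, girls_theta, b))
```

A full analysis would stratify examinees by ability and test the group and group-by-strata effects on these residuals with ANOVA, as in Andrich and Hagquist (2004) in the references.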
Regarding OTL DIF, also examined with a two-way analysis of variance of Rasch residuals, five of the nine post-test performance items showed the expected OTL DIF:
1. Four items favored the experimental group; they measured mathematical communication (2 items), logical/proportional reasoning (1 item), and graphical solution strategies (1 item).
2. One item favored the control group; it measured routine, complex computation.
An analysis of the results suggested four reasons why some items failed to show gender or OTL DIF: (1) item difficulty was too high; (2) the traits measured by the performance items were complex, so item characteristics were easily confounded; (3) some items resembled textbook problems, turning intended non-routine problems into routine ones; and (4) some performance items were not complex enough, so students drew diagrams of their computations rather than using diagrams to solve the problems.
The researcher also found that whether the assessment experiment (i.e., the OTL manipulation) weakened or strengthened gender DIF could not be determined and requires further research.
Finally, based on these results, the researcher offers discussion and suggestions on instruction, assessment, and future research.
Investigating Gender and Opportunity to Learn DIF of A Mathematics Performance Assessment Using Rasch Model
Abstract
The main purpose of this study was to investigate gender and OTL (opportunity to learn) DIF of a mathematics performance assessment (PA) using the Rasch model. Validity evidence for the internal structure, including the structure of the scoring rubrics, was collected before the DIF investigations. A confirmatory approach was adopted for the gender DIF study (i.e., gender DIF hypotheses were formulated from the literature and then tested). A quasi-experimental study was conducted to examine the OTL DIF. A total of 151 sixth graders participated in the six-month experiment: three classes (n = 69) received mathematics journal writing and mathematics PA, while the other three (n = 82) received traditional "drill-and-practice" mathematics instruction and assessment. All students took mathematics pre- and post-tests (including multiple-choice and performance items); in addition, a small number of students in the experimental group received structured interviews about their problem-solving processes.

The results of this study are summarized as follows:
(1) Content-related validity evidence: Expert judgments and empirical evidence (a Rasch person-item map validating the hierarchy of cognitive levels specified in the two-way table of test specifications) supported the adequacy and representativeness of the mathematics fraction test.
(2) Substantive validity evidence: Data collected through structured interviews confirmed that the students engaged in the processes of problem solving.
(3) Internal-structure validity evidence: Rasch fit indices indicated that the overall test, as well as the individual items, conformed to a unidimensional scale.
(4) Structure of the scoring rubrics: The levels of the scoring rubrics were hierarchically ordered after the primary and auxiliary dimensions of the test items were clarified and the rubrics revised. Moreover, rater consistency was high.
(5) Gender DIF: Although few items showed significant gender DIF, the findings from the two-way residual analysis (Rasch analysis) were generally consistent with the literature (i.e., graphical and logical items favored boys, while computation and verbal/communication items favored girls). The effect of OTL on gender DIF remained unclear and needs further research.
(6) OTL DIF: Of the nine PA items, five displayed OTL DIF as expected. Possible reasons for the unexpected non-DIF items include: item difficulty was too high; the auxiliary dimension the PA items were intended to measure was confounded with other traits; the similarity between test items and textbook examples turned non-routine items into routine ones; and students' graphical representations may have been derived from routine computations rather than from using graphical representation to solve the PA problems.

Based on the results of this study, discussion and suggestions regarding the implementation of PA and DIF studies are provided.
Table of Contents
Acknowledgments…………………………………………………………………………Ⅰ
Chinese Abstract…………………………………………………………………………Ⅱ
English Abstract…………………………………………………………………………Ⅳ
Table of Contents………………………………………………………………………Ⅵ
List of Tables……………………………………………………………………………Ⅷ
List of Figures……………………………………………………………………………Ⅹ
List of Appendices………………………………………………………………………XI
Chapter 1 Introduction…………………………………………………………………1
    1.1 Research background and motivation……………………………………… 1
    1.2 Research purposes and questions…………………………………………… 9
    1.3 Definitions of terms………………………………………………………… 10
    1.4 Limitations…………………………………………………………………… 12

Chapter 2 Literature Review…………………………………………………………… 14
    2.1 Background and characteristics of performance assessment……………14
    2.2 Applications of mathematics journals……………………………………20
    2.3 Overview of DIF………………………………………………………………23
    2.4 DIF in performance assessment…………………………………………… 30
    2.5 Empirical Rasch studies of gender and instruction……………………37

Chapter 3 Method………………………………………………………………………… 40
    3.1 Research hypotheses………………………………………………………… 40
    3.2 Participants…………………………………………………………………… 41
    3.3 Research design……………………………………………………………… 43
    3.4 Instruments…………………………………………………………………… 45
    3.5 Procedures……………………………………………………………………… 60
    3.6 Data analysis………………………………………………………………… 62

Chapter 4 Results and Discussion…………………………………………………… 69
    4.1 Construct structure of the mathematics performance assessment…… 69
    4.2 Construct structure of the scoring rubrics…………………………… 80
    4.3 Gender DIF of the mathematics performance assessment……………… 92
    4.4 OTL DIF of the mathematics performance assessment……………………114

Chapter 5 Conclusions and Suggestions……………………………………………125
    5.1 Conclusions……………………………………………………………………125
    5.2 Discussion……………………………………………………………………128
    5.3 Suggestions……………………………………………………………………134
References…………………………………………………………………………………136
Appendices…………………………………………………………………………………145
References
I. English
Andrich, D. (1988). Rasch models for measurement. Newbury Park, CA: SAGE.
Andrich, D. & Hagquist, C. (2004). Detecting of differential item functioning using analysis of variance. Paper presented at the Second International Conference on Measurement in Health, Education, psychology and Marketing: Developments with Rasch Models. Murdoch University, Perth.
American Educational Research Association, American Psychological Association & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Angoff, W. H.(1982). Use of difficulty and discrimination indices for detecting item bias. In R. A. Berk (Ed.), Handbook of methods for detecting item bias. (pp. 96-116). Baltimore: Johns Hopkins University Press.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Hillsdale, NJ: Lawrence Erlbaum Associates.
Benhow, C. P., & Stanley, J. C. (1982). Consequences in high school and college of sex differences in mathematical reasoning ability: A longitudinal perspective. American Educational Research Journal, 19, 598-622.
Baker, D., & Jocobs, K. (1999). Winners and loser in single-sex science and mathematics classrooms. Paper presented at the Annual Meeting of the National Association for Research in Science Teaching. (Boston, MA, March, 28-31, 1999)
Bond, L. (1995). Unintended consequences of performance assessment: Issues of bias and fairness. Educational Measurement: Issues and practice, 14(4), 21-24.
Bond, L., Moss, P., & Carr, P. (1996). Fairness in large-scale performance assessment. In G. W. Phillips. (Ed.), Technical Issues in Large-Scale Performance Assessment. (ERIC Document Reproduction Service No. ED399300).
Burton, N. (1995, April). How have the changes in the SAT affected women s mathematics performance? Paper presented at the Annual Meeting of the American Research Association, San Francisco.
Bell, P. F., & Lentz Jr, F. E. (1992). Effects of curriculum-test overlap on standardized achievement test scores: Identifying systematic confounds in educational decision making. School Psychology Review, 21(4), 644-656.
Bode, R. K. (2004). Partial credit model and pivot anchoring. In E. V. Smith Jr., & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, model and applications(pp. 279-295). Maple Grove: Minnesota.
Brookhart, S. M. (1997). Effects of the classroom assessment environment on mathematics and science achievement. Journal of Educational Research, 90(6), 323-330.
Bond, T., & Fox, C. (2001) Applying the Rasch model: Fundamental measurement in the human sciences, Mahwah NJ: Lawrence Erlbaum Associates.
Bagley, T., & Gallenberger, C. (1992). Assessing students’ dispositions: Using journals to improve students’ performance. Mathematics Teacher, 85(8), 660-663.
Clauser, B. E. (2000). Recurrent issues and recent advances in scoring performance assessments. Applied Psychological Measurement, 24(4), 310-324
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: SAGE.
Cai, J., Jakabcsin, M., & Lane, S. (1996). Assessing students' mathematical communication. School Science & Mathematics, 96(5), 238-246.
Chipman, S. F, (1992). Content effects on word problem performance: A possible source of test bias? American Educational Research Journal, 28 (4), 897-915.
Cheong, Y. F. (2006). Analysis of school context effects on differential item functioning using hierarchical generalized linear models. International Journal of Testing, 6(1), 57-79.
Doolittle, A. E. (1984). Interpretation of Differential Item Performance Accompanied by Gender Differences in Academic Background. Paper presented at the Annual Meeting of the American Educational Research Association. (68th, New Orleans, LA, April 23-27, 1984).
Doolittle, A. E. (1985). Understanding differential item performance as a consequence of gender differences in academic background. Paper presented at the Annual Meeting of the American Educational Research Association . Chicago: IL.
Doolittle, A. E., & Cleary, T. A. (1987). Gender-based differential item performance in mathematics achievement items. Journal of Educational Measurement, 24, 157-166.
Delors, J. et al.(1996). Learning: The Treasure Within. Retrieved March 28, 2008, from http://www.unesco.org/delors/treasure.htm
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and Standardization. In P.W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Erlbaum.
Du, Yi., Wright, B. D., & Brown W. L.(1996). Differential facet functioning detection in direct writing assessment. Paper presented at the Annual Meeting of the American Educational Research Association. (New York, NY, April 8-12,1996).
Embretson, S. E., Reise, S. P. (2000). Item response theory for psychologists. New Jersey: Lawrence Erlbaum.
Fredua-Kwarteng, E. (2005). A perspective on gender disparity in mathematics education. (ERIC Document Reproduction Service No. ED 493750).
Fisher, Jr., W. P. (1992). Reliability statistics. Rasch measurement transactions, 6, 238.
Gallagher, A. M., & Lisi, R. (1994). Gender differences in Scholastic Aptitude Test: Mathematics problem solving among high-ability students. Journal of Educational Psychology, 86(2), 204-211.
Green, J. (2002). Replacing lectures by text-based flexible learning: Students' performance and perceptions. Journal of Biological Education, 36(4), 176-182.
Gender, M., & George, E. Jr. (1999). Gender difference in performance on multiple-choice and constructed response mathematics items. Applied Measurement in Education, 12(1), 29-51.
Gierl, M. J., Bisanz, J., Bisanz, G., & Boughton, K. (2003). Identifying content and cognition skills that produce gender differences in mathematics: A demonstration of the DIF analysis paradigm. Journal of Educational Measurement, 40, 281-306.
Gordon, C. J., & Macinnis, D. (1993). Using journal as a window on students’ thinking in mathematics. Language Arts, 70(1), 37-43.
Gao, X. (1996). Sampling variability and generalizability of work keys listening and writing scores (ACT research report series 96-1). Iowa City, Iowa: ACT.
Gao, X., Shavelson, R. J., & Baxter, G. P. (1994). Generalizability of large-scale performance assessments in science: Promises and problems. Applied Measurement in Education, 7(4) 323-342.
Harris, A. M., & Carlton, S. T. (1992). Characteristics associated with differential item functioning on the Scholastic Aptitude Test: Gender and majority/minority group comparisons. Princeton, NJ: Educational Testing Service. (ERIC Document Reproduction Service No. ED 385574)
Harris, A. M., & Carlton, S. T. (1993). Patterns of gender differences on mathematics items on the Scholastic Aptitude Test. Applied Measurement in Education, 6, 137-152.
Hagquist, C., & Andrich, D. (2003). Is the sense of coherence-instrument applicable on adolescents? A latent trait analysis using Rasch-modelling. Personality and Individual Differences, 36(2004), 955-968.
Huggins, E. M.; & Celio, M. B. (2002). Closing the achievement gap in Washington state: Holding schools accountable for equity. (ERIC Document Reproduction Service No.ED 474 880)
Hunter, J. E. (1975, December). A critical analysis of the use of item means and item-test correlations to determine the presence or absence of content bias in achievement test items. Paper presented at the National Institute of Educational Conference on Test Bias, Annapolis, MD.
Herman, L. J., Aschbacher, R. P., & Winters, L. (1992). Linking assessment and instruction. A Practical Guide to Alternative Assessment, 2, 12-22. ASCD: Virginia.
Hamilton, L. S. (2000). Detecting gender-based differential item functioning on a constructed- response science test. Measurement in Education. 12(3), 211-235.
Holweger, N., & Tayler, G. (1998). Differential item functioning by gender on a large-scale science performance assessment: A comparison across grade levels. (ERIC Document Reproduction Service No. ED 423 282)
Jackson, C., & Braswell, J. (1992, April). An analysis of factors causing differential item functioning on SAT-Mathematics items. Paper presented at the Annual Meeting of the American Research Association, San Francisco.
Jovanovic, J., Solano-Flores, G., & Shavelson, R. J. (1994). Performance-based assessment will gender difference science achievement be eliminated? Education and Urban Society, 26(4), 352-366.

Jurdak, M., & Zein, R. A. (1998). The effect of journal writing on achievement in and attitudes toward mathematics. School Science and Mathematics, 98(8), 412-419.
Kim, Dong-il. (1995). Application of confirmatory factor analysis to the validity study of a performance assessment: A multitrait-multimethod structure and its invariance across gender and grade. draft. Paper presented at the Annual Meeting of the American Educational Research Association. San Francisco, CA, April 18-22, 1995. (ERIC Document Reproduction Service No. ED 388674)
Khattri, N.; Reeve, A. L.; & Kane, M. B. (1998). Characteristics of performance assessment. Principles and practices of performance assessment, 2, 15-61.Mahwah, NJ: Lawrence Erlbaum Associate, Publishers.
Lord, F. M. (1977). A study of item bias using item characteristic curve theory. In Y. H. poortinga (Ed.), Basic problems in cross-cultural psychology (pp. 19-29). Amsterdam: Swets & Zeitlinger.
Leinhardt, G., & Seewald, A. W. (1981). Overlap: What's tested, what's taught? Journal of Educational Measurement, 18(2), 85-94.
Linacre, J. M. (2004). Optimizing rating scale category effectiveness. In E. V. Smith Jr., & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, model and applications (pp. 258-278). Maple Grove: Minnesota.
Lehman, J. D. (1986). Opportunity to learn and differential item functioning. Unpublished doctoral dissertation, University of California, Los Angeles.
Linn, R. L., Banker E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectation and validity criteria. Educational Researcher, 20(8), 15-21.
Linn, R. L, Burton, E., Destefano, L. & Hanson, M., (1996). Generalizability of new standards project 1993 pilot study tasks in mathematics. Applied Measurement in Education, 9(3), 201-214.
Lane, S. (1994). Detecting of gender-related differential item functioning in a mathematics performance assessment. Applied Measurement in Education, 9(2), 175-199.
Lane, S. (1995). Gender-related differential item functioning on a middle-school mathematics performance assessment. New York, N. Y; Ford Foundation. (ERIC Document Reproduction Service No. ED 392 821)
Lane, S., Liu, M., Ankenmann, R. D., & Stone, C. A. (1995). Examination of the assumptions and properties of the graded item response model: An example using a mathematics performance assessment. Applied Measurement in Education, 8(4), 313-40. (ERIC Document Reproduction Service No. EJ517182)
Lane, S., Liu, M., Ankenmann, R. D., & Stone, C. A. (1996). Generalizability and validity of a mathematics performance assessment. Journal of Educational Measurement, 33(1), 71-92.
Lane, S., Wang, N., & Magone, M. (1996). Gender-related differential item functioning on a middle-school mathematics performance assessment. Educational measurement:issues and practice, 15(4), 21-27.

Linn, R. L., & Gronlund, N. E. (2000). Measurement and Assessment in teaching( ed.). New Jersey: Prentice-Hall.
Langenfeld, T. E., (1997). Test fairness: Internal and external investigations of gender bias in mathematics testing. Educational measurement: Issues and practice, 16(1), 20-25.
Methen, B. (1988). Instructional sensitivity in mathematics achievement test items: application of a new IRT-based detection technique. Paper presented at the Annual Meeting of the American Research Association, (New Orleans, LA. April 5-9, 1988.)
Mcbee, M. M., & Barnes, L. B. (1998). The generalizability of a performance assessment measuring achievement in eighth-grade mathematics. Applied Measurement in Education, 11(2), 179-194.
Miao, C. & Kramer, G. A.(1992). Detecting differential item functioning using the Rasch model with equivalent-group cross-validation. Paper presented at the Annual Meeting of the American Research Association, San Francisco. (ERIC Document Reproduction Service No. ED 350 318)
Maccoby, E. & Jacklin, C. (1974). Psychology of sex differences. Palo Alto, CA: Stanford University Press.
Mulligan, J., Looveer, J., & Busatto, S. (2006). The development of a Numeracy Achievement Scale to assess progress from kindergarten through year 6. Paper presented at the ACSPRI Science Methodology Conference, Sydney.
Mok, M. M C. (2004). Validation of scores from self-learning scales for primary students using true-score and Rasch measurement methods. Journal of Applied Measurement, 5(3), 258-286.
Miller, M. D., & Linn, R. (1989). Invariance of item characteristic functions with variations in instructional coverage. Journal of Educational Measurement, 25(3), 205-219.
Miller, M. D., & Linn, R. (2000) Validation of performance-based assessment. Applied Psychological Measurement. 24(4), 367-378. .
Mangione, M. (1995). Understanding the critics of the educational technology: Gender inequities and computers 1983-1993. (ERIC Document Reproduction Service No. ED 383311)
Metcalf, L. A. (2002). Curriculum-sensitive assessment: A psychometric study of tracking as a distributor of opportunity to learn high school mathematics. Unpublished Doctoral dissertation, University of Illinois: Urbana-Champaign.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement, 13-104. New York: Macmillan.
Messick, S. (1994a). Validity of psychological assessment: Validation of inferences from persons’ responses and performance as scientific inquiry into score meaning. (ETS RR-94-95). Princeton, NJ: Educational Testing Service. (ERIC Document Reproduction Service No. ED 380 504)
Messick, S. (1994b). Alternative modes of assessment, uniform standard of validity. (ETS RR-94-60). Princeton, NJ: Educational Testing Service. (ERIC Document Reproduction Service No. ED 380 496)

Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational measurement: Issues and Practice, 14(4), 5-8.
Messick, S. (1996). Validity of performance-based assessment. In G. W. Phillips(ED.), Technical issues in large-scale performance assessment(pp. 1-18). (ERIC Document Reproduction Service No. ED 399 300)
National Council of Teachers of Mathematics (NCTM) (2000). Principles and standards for school mathematics. Available online: http://standard.nctm.org.
Ornstein, A. C. (1983). How good are teachers in effecting student outcomes?NASSP Bulletin, 67(459), 73-80.
O’Neil, A. C., & Mcpeek, W.M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer(Eds.), Differential item functioning Hilldale, NJ: Lawrence Erlbaum Associates.
Prowker, A., & Camilli, G. (2007). Looking beyond the overall scores of NAEP assessments: Applications of generalized linear mixed modeling for exploring value-added item difficulty effects. Journal of Educational Measurement, 44(1), 69-87.
Polya, G. (1945). How to solve it. Princeton, NJ: Princeton University Press.
Pomplum, M. & Capps, L.(1999). Gender differences for constructed-response mathematics items. Educational and Psychological Measurement, 59(4), 597-614.
Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational measurement: Issues and Practice, 19(3), 5-15.
Phillips, S. E., & Mehrens, W. A. (1988). Effects of curricular differences on achievement test data at item and objective levels. Applied Measurement in Education, 1(1), 33-51.
Ryan, K. E., & Fan, M. (1996). Examining gender DIF on a multiple-choice test of mathematics: A confirmatory approach. Educational measurement: Issues and practice, 15(4), 21-27.
Roussos, L. & Stout, W. (1996a). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenzel type I error performance. Journal of Educational Measurement, 33(2), 215-230.
Roussos, L. & Stout, W. (1996b). A multidimensinality-based DIF analysis paradigm. Applied Psychological Measurement, 20(4), 355-371.
Shepard, L. A. (1990). Inflated test score gains: Is the problem old norms or teaching the test? Educational Measurement: Issues and Practice, 9(3), 15-22.
Shepard, L. A.; Flexer, E. H., Hiebert, E. H., Marion, S. F., Mayfield, V., & Weston, T. J. (1996). Effect of introducing classroom performance assessment on student learning. Educational measurement: Issues and Practice, 15(3), 7-18.
Switzer, D. M. (1993). Differential item functioning and opportunity to learn: Adjusting the Mantel-Haenszel chi-square procedure. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.


Smith, Everett V. Jr. (2004). Evidence for the reliability of measures and validity of measure interpretation: A rasch measurement perspective. In E. V. Smith Jr., & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, model and applications(pp. 93-122). Maple Grove: Minnesota.
Smith, Everett V. Jr. (2004). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residual. In E. V. Smith Jr., & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, model and applications(pp. 575-600). Maple Grove: Minnesota.
Schumacker, R. E. (2004). Rasch measurement: The dichotomous model. In E. V. Smith Jr., & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, model and applications (pp. 226-257). Maple Grove: Minnesota.
Stiggins, R. J. (1995). Sound performance assessment in the guidance context. Eric Digest. (Eric Document Reproduction Service No. ED 388 889)
Shavelson,R. J., Baxter,G P., & Gao, X. (1993). Sampling variability of performance assessment. Journal of Educational Measurement, 30(3), 215-232.
Shavelson,R. J., Baxter,G P., & Pine, J. (1991). Performance assessment: Political rhetoric and measurement reality. Educational Research, 21(4), 22-27.
Smith, R. M. (1992). Assessing unidimensionality for the Rasch rating scale model. Paper presented at the annual meeting of the American Educational Research Association, San Francisco,CA.
Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3 , 25-40.
Smith, R. M. (2004). Fit analysis in latent trait measurement models. In E. V. Smith Jr., & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, model and applications(pp. 73-92). Maple Grove: Minnesota.
Smith, R. M. (2004). Detecting item bias with the Rasch model. In E. V. Smith Jr., & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, model and applications(pp. 391-418). Maple Grove: Minnesota.
Schwars, R., Rich, C., Arenson, E., Podrabsky, T., & Cook, G. (2002). An analysis of item differential functioning based on calculator type. Paper presented at the Annual Meeting of the American Educational Research Association (68th, New Orleans, LA, April 23-27, 1984). (ERIC Document Reproduction Service No. ED 463326)
Sheally, R. & Stout, W. (1993). A model-based standardization approach that separates true Bias/DIF from group ability differences and detects test Bias/DIF as well as item Bias/DIF. Psychometrika, 58(2), 159-194.
Scottish Qualifications Authority (2003). Key competencies: Some international comparisons. Retrieved March 28, 2008, from www.sqa.org.uk/files_ccc/Key_Competencies.pdf
Wright, B. D., & Master, G. N. (1982). Rating scale analysis. Chicago: MESA Press.
Wright B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA Press.
Wright, B. D., (1996). Reliability and separation. Rasch measurement transactions, 9, 472.

Wright, B. D., & Mok, M. C. (2004). An overview of the family of Rasch measurement models. In E. V. Smith Jr., & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, model and applications(pp. 1-24). Maple Grove: Minnesota.
Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75, 200-214.
Walsh, M., Hickey, C., & Duffy, J. (1999). Influence of Item content and stereotype situation on gender differences in a mathematical problem solving. Sex Roles: A Journal of Research, 41 (3/4)219-240.
Williams, J. E. (1994). Anxiety measurement: Construct validity and test performance. Measurement & Evaluation in Counseling & Development, 27(1), 302-308.
Wang, J., & Goldschmdt, P. (1999). Opportunity to learn, language proficiency, and immigrant status: Determinants of middle school students’ mathematics achievement and growth. The Journal of Educational Research, 93, 101-111.
Wang, J., & Goldschmdt, P. (2003). Importance of middle school mathematics on high school students' mathematics achievement. The Journal of Educational Research, 97(1), 1-18.
Wang, N., & Lane, S., (1994). Detection of gender-based differential item functioning in a mathematics performance assessment. New York, N. Y; Ford Foundation. (ERIC Document Reproduction Service No. ED 377 247)
Watson, J., & Kelly, B. (2005). Cognition and instruction: Reasoning about bias in sampling. Mathematics Education Research Journal, 17(1), 24-57.
Wiersma, W. (1983). Assessment of teacher performance: Constructs of teacher competencies based on factor analysis of observation data. Paper presented at the Annual Meeting of the American Educational Research Association, 67th, Montreal, Quebec, April 11-15, 1983. (ERIC Document Reproduction Service No. ED230586)
Welch, W. W., & Anderson, R. E. (1994). The performance of performance assessment in a large-scale study of computer education. Journal of Educational Computing Research, 11(2), 107-119. (ERIC Document Reproduction Service No. EJ495232)
Zwick, R. (1993). Assessing differential item functioning in performance tests. (ETS RR-93-14). Princeton, NJ: Educational Testing Service. (ERIC Document Reproduction Service No. ED 386 493)
II. Chinese
王秀琲,(民92)。實作評量在國小數學科之應用-以五年級學童分數為例。台中教育大學研究所教育測驗統計研究所碩士論文,未出版,台中市。
王文中(民85)。幾個有關Rasch測量模式的爭議。教育與心理研究,19,1-26。
王文中(民93)。Rasch測量理論與在教育和心理之應用。2004年教育與心理測驗學術研討會。1-47。
吳毓瑩(民85)。評量的蛻變與突破─從哲學思潮與效度理論談起。教育資料與研究,13,2-15。
吳欣黛(民87)。實作評量在效度上的真實性與直接性。國立台北師範學院國民教育研究所碩士論文,未出版,台北市。
林碧珍,(民91)。協助教師撰寫數學日誌以促進反思能力之協同行動研究,新竹師院學報,15期,149-180。
林敬修,(民92)。影響國小數學科實作評量信度相關因素之類推性理論分析。屏東師範學院教育心理與輔導研究所碩士論文,未出版,屏東市。
林奕宏、林世華,(民93)。國小高年級數學科成就測驗中與性別有關的DIF現象。臺東大學教育學報。15:1,67-95。
桂怡芬(民85)。自然科實作評量的效度探討。國立台北師範學院國民教育研究所碩士論文,未出版,台北市。
許芳菊(2006)。全球化下的關鍵能力。天下雜誌2006年教育專刊,22-27。
張麗麗(民86)。城鄉、性別與教學差異對試題偏差指標的影響與四種試題偏差探查方法的比較。行政院國家科學委員會專題研究計畫成果報告。
張麗麗、陳庸(民88)。MH分析「第一語言」(L1)及性別對多元計分語文試題「差異試題功能」(DIF)探查上之影響。行政院國家科學委員會專題研究計畫成果報告。
張麗麗(民91)。從分數的意義談實作評量效度的建立。教育研究月刊,98,37-51。
張麗麗(民93)。評量改革的應許之地,虛幻或真實?─談實作評量之作業與表現規準。教育研究月刊,93,76-86。
張麗麗(2003)。作業取樣對數學實作評量分數類推之影響。教育研究資訊,11(6),65-100。
張銘秋,(民95)。實作評量無關構念變異之探討─以國小數學科為例。臺南大學測驗統計研究所碩士論文,未出版,臺南市。
張鈿富、吳京玲、陳清溪與羅婉綺(民96)。歐盟教育政策的趨勢與啟示【電子版】。教育研究與發展期刊,3(3),99-126。2008年3月28日,取自http://tw.wrs.yahoo.com/_ylt=A8tUxzPYDe1HZ3IA6yFr1gt.;_ylu=X3oDMTFiaTY0bjY3BHNlYwNzcgRwb3MDMQRjb2xvA3RwZQR2dGlkA1RXQzAzNV8xMzYEbANXUzE-/SIG=132mia762/EXP=1206804312/**http://journal .naer.edu.tw/UploadFilePath/dissertation/0vol010_05vol010_05.pdf
陳濱興,(民90)。國小數學解題實作評量與後設認知之相關研究。臺中教育大學教育測驗統計研究所碩士論文,未出版,台中市。
黃財尉、李信宏,(民88)。國中數學成就測驗性別DIF之探討:Poly-sibtest 的應用與分析。中國測驗學會測驗年刊,88,p45-60。
詹元智,(民91)。國小數學科實作評量之效度探討。屏東教育大學教育心理與輔導研究所碩士論文,未出版,屏東市。
蔡正濱(民95)。國小數學科實作評量評分者一致性相關因素探討。國立屏東教育大學教育與心理輔導研究所碩士論文(未出版)。
蔡嘉宮(民95)。TIMSS數學實作評量試題在台灣試行施測結果之分析比較。臺中教育大學教育測驗統計研究所碩士論文,未出版,臺中市。
盧雪梅,(民88)。差別試題功能(DIF)的檢定方法。台北市立師範學院學報,30期,149-166。
薛麗卿(民87)。數學寫作活動對國小學生解題能力及數學態度之影響。台灣師範大學教育心理與輔導研究所碩士論文,未出版,台北市。
魏宗明(民86)。國小實施數學寫作活動之研究。嘉義師範學院國民教育研究所碩士論文,未出版,嘉義市。