Author: 顏雅雲
Author (English): Ya-Yun Yen
Title: 團體能力分配不同對MIMIC法進行二分題差異試題功能檢驗之影響
Title (English): The Influence of Impact on DIF Detection with the MIMIC Method for Dichotomous Items
Advisor: 蘇雅蕙
Advisor (English): Ya-Hui Su
Committee members: 陳淑英, 施慶麟
Committee members (English): Shu-Ying Chen, Ching-Lin Shih
Oral defense date: 2013-06-21
Degree: Master's
Institution: National Chung Cheng University
Department: Institute of Psychology
Discipline: Social and Behavioral Sciences
Field: Psychology
Document type: Academic thesis
Publication year: 2013
Graduation academic year: 101 (2012-2013)
Language: Chinese
Pages: 77
Keywords (Chinese): 差異試題功能, 試題反應理論, 多指標多因子檢驗方法, 團體能力分配差異, MIMIC法
Keywords (English): MIMIC, differential item functioning, item response theory, multiple indicators-multiple causes, different ability distributions
Differential item functioning (DIF) refers to the situation in which examinees from different groups but with the same ability level perform differently on the same item. Most previous DIF studies manipulated only differences in the means of the groups' ability distributions, overlooking the fact that in real settings the variances of the groups' ability distributions may also differ. This study therefore examined the performance of the MIMIC method for DIF detection when the groups' ability distributions differ. We propose the M-STPA method, which first uses the iterative MIMIC (M-IT) method to identify a set of anchor items and then tests the remaining items one at a time; this differs from the MIMIC method with a pure anchor (M-PA), which tests all studied items simultaneously. Because M-STPA tests items for DIF one at a time, it was expected to outperform M-PA when the groups' ability distributions differ. M-STPA was also compared with the standard MIMIC (M-ST) method and the MIMIC method with scale purification (M-SP) used in previous research. The results showed that: (1) when the two groups had equal ability variances (variance of 1) but different means (mean difference of 1), the Type I error rate of M-ST was severely inflated once tests contained more than 10% DIF items, and that of M-SP was severely inflated once tests contained more than 30% DIF items, whereas M-PA and M-STPA maintained well-controlled Type I error rates even when tests contained as many as 40% DIF items; (2) when the two groups had equal ability means but different variances (variance differences of 0.5 and -0.5), all four methods yielded severely inflated Type I error rates; (3) when the two groups differed in both mean and variance: (i) when the mean difference was 1 (the focal group's mean being lower than the reference group's) and the variance difference was 0.5 (the focal group's variance being smaller than the reference group's), M-STPA still controlled the Type I error rate even when tests contained more than 40% DIF items; (ii) when the mean difference was 1 and the variance difference was -0.5, all four methods yielded inflated Type I error rates, and the results of M-STPA were clearly affected by test length.
Differential item functioning (DIF) occurs when subgroups of test takers have equal trait levels but differ in their probabilities of a correct response. Many simulation studies have examined the performance of methods for flagging DIF items. However, among these studies, little attention has been paid to how a difference in ability variance between two groups affects DIF detection methods. Thus, the aim of this study is to examine how different combinations of ability variance and mean between the reference and focal groups affect four multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), the MIMIC method with a pure anchor (M-PA), and the standard MIMIC method with a pure anchor (M-STPA). In a series of simulations, it appeared that (1) under a mean difference in ability, all four methods yielded a well-controlled Type I error rate when tests did not contain any DIF items; M-ST and M-SP began to yield an inflated Type I error rate and deflated power when tests contained 20% and 40% DIF items, respectively, whereas M-PA and M-STPA maintained the expected Type I error rate and high power even when tests contained as many as 40% DIF items; (2) a difference in ability variance inflated the Type I error rates of all the DIF detection methods; (3) when both a mean difference and a variance difference in ability existed: (i) M-STPA maintained the expected Type I error rate when the focal group had the smaller ability variance; (ii) all the DIF detection methods yielded an inflated Type I error rate when the focal group had the larger ability variance. Test length appeared to have an effect on M-STPA.
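The simulation design described above can be illustrated with a minimal data-generation sketch: dichotomous responses under a 2PL model for a reference group with ability N(0, 1) and a focal group whose mean and variance differ, with uniform DIF injected into one item. All parameter values and names here (item parameter ranges, DIF size, sample sizes) are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dif_data(n_per_group=1000, n_items=20,
                      focal_mean=-1.0, focal_var=0.5,
                      dif_item=0, dif_size=0.5):
    """Generate 2PL dichotomous responses for a reference group
    (ability ~ N(0, 1)) and a focal group with a different ability
    mean and variance; one item carries uniform DIF."""
    # Item parameters (assumed illustrative ranges)
    a = rng.uniform(0.8, 2.0, n_items)    # discriminations
    b = rng.uniform(-2.0, 2.0, n_items)   # difficulties

    theta_ref = rng.normal(0.0, 1.0, n_per_group)
    theta_foc = rng.normal(focal_mean, np.sqrt(focal_var), n_per_group)

    def respond(theta, focal):
        b_eff = b.copy()
        if focal:
            # Uniform DIF: the studied item is harder for the focal group
            b_eff[dif_item] += dif_size
        p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b_eff)))
        return (rng.random((len(theta), n_items)) < p).astype(int)

    return respond(theta_ref, False), respond(theta_foc, True)

ref, foc = simulate_dif_data()
print(ref.shape, foc.shape)  # (1000, 20) (1000, 20)
```

Varying `focal_mean` reproduces the mean-difference (impact) conditions, and varying `focal_var` reproduces the variance-difference conditions whose combinations the study crosses.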
Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Section 1: Research Background and Motivation
Section 2: Research Purpose and Questions
Chapter 2: Literature Review
Section 1: Differential Item Functioning
Section 2: Effects of Group Ability Distributions on DIF Detection
Section 3: Development of MIMIC-Based DIF Detection Methods
Section 4: Direction of the Present Study
Chapter 3: Method
Section 1: Data Generation
Section 2: Study Design
Section 3: Analysis Procedure
Chapter 4: Results and Discussion
Section 1: Equal Group Ability Means, Equal Variances
Section 2: Equal Group Ability Means, Different Variances
Section 3: Different Group Ability Means, Equal Variances
Section 4: Different Group Ability Means, Different Variances
Section 5: Group Ability Distributions and the MIMIC Methods
Chapter 5: Conclusions and Suggestions
Section 1: Conclusions
Section 2: Contributions of This Study
Section 3: Suggestions for Future Research
References

ACT. (1992). High school profile report. Iowa City: ACT.
Bielinski, J., & Davison, M. L. (1998). Gender differences by item difficulty interactions in multiple-choice mathematics items. American Educational Research Journal, 35, 455-476.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-424). Reading, MA: Addison-Wesley.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152.
Clauser, B., Mazor, K., & Hambleton, R. K. (1993). The effects of purification of the matching criterion on the identification of DIF using the Mantel-Haenszel procedure. Applied Measurement in Education, 6, 269-279.
Cohen, A. S., Kim, S. H., & Wollack, J. A. (1996). An investigation of the likelihood ratio test for detection of differential item functioning. Applied Psychological Measurement, 20, 15-26.
Feingold, A. (1992). Sex differences in variability in intellectual abilities: A new look at an old controversy. Review of Educational Research, 62, 61-84.
Feldt, L. S., Forsyth, R. A., Ansley, T. N., & Alnot, S. D. (1993). Iowa Tests of Educational Development. Iowa City: The University of Iowa.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.
Fleishman, J. A., Spector, W. D., & Altman, B. M. (2002). Impact of differential item functioning on age and gender differences in functional disability. Journal of Gerontology: Social Sciences, 57B(5), S275-S283.
French, B. F., & Maller, S. J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67, 373-393.
Glöckner-Rist, A., & Hoijtink, H. (2003). The best of both worlds: Factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10, 544-565.
Hedges, L. V., & Nowell, A. (1995). Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science, 269, 41-45.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.
Holland, P. W., & Wainer, H. (Eds.) (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum.
Hoover, H. D., Hieronymus, A. N., Frisbie, D. A., & Dunbar, S. B. (1993). Iowa Tests of Basic Skills. Chicago: The Riverside Publishing Company.
Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631-639.
Kim, S.-H., & Cohen, A. S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29, 551-566.
Lautenschlager, G. J., Flaherty, V. L., & Park, D. (1994). IRT differential item functioning: An examination of ability scale purifications. Educational and Psychological Measurement, 54, 21-31.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
McDonald, R. P. (1967). Nonlinear factor analysis. Psychometric Monographs, 15, 1-167.
Monahan, P. O., & Ankenmann, R. D. (2005). Effect of unequal variances in proficiency distributions on Type I error of the Mantel-Haenszel chi-square test for differential item functioning. Journal of Educational Measurement, 42, 101-131.
Muthén, B. O. (1988). Some uses of structural equation modeling in validity studies: Extending IRT to external variables. In H. Wainer & H. Braun (Eds.), Test validity (pp. 213-238). Hillsdale, NJ: Lawrence Erlbaum.
Muthén, B. O., Kao, C. F., & Burstein, L. (1991). Instructionally sensitive psychometrics: Application of a new IRT-based detection technique to mathematics achievement test items. Journal of Educational Measurement, 28, 1-22.
Muthén, L. K., & Muthén, B. O. (1998-2007). Mplus user's guide (5th ed.). Los Angeles, CA: Muthén & Muthén.
Navas-Ara, M. J., & Gómez-Benito, J. (2002). Effects of ability scale purification on identification of DIF. European Journal of Psychological Assessment, 18, 9-15.
Nowell, A., & Hedges, L. V. (1998). Trends in gender differences in academic achievement from 1960 to 1994: An analysis of differences in mean, variance and extreme scores. Sex Roles, 39, 21-43.
Oort, F. J. (1998). Simulation study of item bias detection with restricted factor analysis. Structural Equation Modeling, 5, 107-124.
Park, D. G., & Lautenschlager, G. J. (1990). Improving IRT item bias detection with iterative linking and ability scale purification. Applied Psychological Measurement, 14, 163-173.
Pei, L. K., & Li, J. (2010). Effects of unequal ability variances on the performance of logistic regression, Mantel-Haenszel, SIBTEST, and IRT likelihood ratio for DIF detection. Applied Psychological Measurement, 34, 453-456.
Shealy, R. T., & Stout, W. F. (1993). A model based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DIF as well as item bias/DIF. Psychometrika, 58, 159-194.
Shih, C. L., & Wang, W. C. (2009). Differential item functioning detection using the multiple indicators, multiple causes method with a pure short anchor. Applied Psychological Measurement, 33, 184-199.
Strand, S., Deary, I. J., & Smith, P. (2006). Sex differences in cognitive ability test scores: A UK national picture. British Journal of Educational Psychology, 76, 463-480.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393-408.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group difference in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Lawrence Erlbaum.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Hillsdale, NJ: Lawrence Erlbaum.
Wainer, H., Sireci, S., & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197-219.
Wang, W.-C. (2004). Effects of anchor item methods on the detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221-261.
Wang, W.-C. (2008). Assessment of differential item functioning. Journal of Applied Measurement, 9, 387-408.
Wang, W.-C., & Shih, C.-L. (2010). MIMIC methods for assessing differential item functioning in polytomous items. Applied Psychological Measurement, 34, 166-180.
Wang, W.-C., Shih, C.-L., & Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69, 713-731.
Wang, W.-C., & Su, Y.-H. (2004a). Effects of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17, 113-144.
Wang, W.-C., & Su, Y.-H. (2004b). Factors influencing the Mantel and generalized Mantel-Haenszel methods for the assessment of differential item functioning in polytomous items. Applied Psychological Measurement, 28, 450-480.
Wang, W.-C., & Yeh, Y.-L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.
Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA.

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊
 
系統版面圖檔 系統版面圖檔