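Graduate student: 黃瓅瑩
Graduate student (English name): Li-ying Huang
Thesis title: HGLM分析DIF之比較與應用
Thesis title (English): The Comparison Study of Differential Item Functioning Analysis Using Hierarchical Generalized Linear Model
Advisor: 鄒慧英
Degree: Master's
Institution: National University of Tainan
Department: Graduate Institute of Measurement and Statistics
Discipline: Education
Field: Educational Testing and Assessment
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar)
Language: Chinese
Number of pages: 81
Keywords (Chinese): SIBTEST, DIF, HGLM, MH, LR
Keywords (English): SIBTEST, LR, HGLM, MH, DIF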
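This study compares HGLM, MH, LR, and SIBTEST in DIF analysis. It uses empirical and simulated data to examine the consistency of the four methods, and uses the simulated data to compute each method's Type I error rate, Type II error rate, and overall accuracy. The distinguishing feature of HGLM-based DIF analysis is that explanatory variables can be added to account for the DIF items detected.
The empirical data were analyzed for DIF with three methods: HGLM, LR, and MH. For the DIF items detected, individual-level variables (individual socioeconomic status, item position) and organizational-level variables (school, county/city, geographic region) were added to the two-level and three-level HGLM models to see how well each variable explains the DIF items. DIF differences between male and female students across mathematics content areas were also analyzed. Next, simulated data generated from the empirical data were used to compare the accuracy of HGLM, MH, LR, and SIBTEST under different test lengths, sample sizes, DIF ratios, and item parameters, and to assess how these conditions affect DIF analysis.
The main findings are as follows:
1. The empirical results show good agreement among HGLM, MH, and LR in both significance test results and DIF effect-size classification.
2. Every explanatory variable added to the HGLM analysis reduced the DIF ratio, most markedly at the senior high school level. Among the three organizational-level variables, school explained DIF most strongly.
3. No mathematics content area was found to particularly favor male or female students, but the content area with the highest DIF ratio differed across grades.
4. In the simulated data, HGLM and LR were the most accurate. Sample size and DIF strength significantly affected detection rates; among the item parameters, the a parameter affected DIF detection more than the b parameter.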
The purpose of this study is to compare four DIF methods: HGLM, MH, LR, and SIBTEST. Empirical and simulated data were used to examine the consistency of the four methods. First, three DIF methods (HGLM, LR, and MH) were used to analyze empirical data from a large-scale mathematics assessment and to compare their consistency. Several variables, namely SES, location of item block, school, county-city, and region, were added to the HGLM to see whether they could help explain the DIF items. Second, given the item parameters calibrated from the empirical data, simulated data were generated under different test lengths, sample sizes, DIF ratios, item parameters, and DIF strengths. The accuracy rate, Type I error rate, and Type II error rate were calculated for the four methods to evaluate how well each detects DIF items and how the different conditions influence DIF analysis.
The main findings are the following:
1. The empirical results showed that the HGLM, MH, and LR methods were highly consistent in both detecting DIF items and classifying them.
2. All variables adopted in this study (SES, location of item block, school, county-city, and region) reduced the number of DIF items, especially for senior high school students. Among the three organizational variables, school had the greatest explanatory power.
3. None of the five mathematics content areas was found to particularly favor male or female students, but the content area with the most DIF items differed across grades.
4. The simulation study showed that HGLM and LR had the highest accuracy. Sample size and DIF strength affected the detection of DIF. The a parameter also had more impact than the b parameter on the detection of DIF.
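The three evaluation measures named in the abstract can be illustrated with a short sketch. The abstract does not give the thesis's exact formulas, so the definitions here are assumptions: Type I error rate is the share of non-DIF items wrongly flagged, Type II error rate is the share of true DIF items missed, and accuracy is the share of items classified correctly. The function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DetectionSummary:
    type_i_error: float   # non-DIF items wrongly flagged / all non-DIF items
    type_ii_error: float  # DIF items missed / all DIF items
    accuracy: float       # correctly classified items / all items


def summarize_detection(true_dif: Sequence[bool],
                        flagged: Sequence[bool]) -> DetectionSummary:
    """Compare simulated (known) DIF status with a method's flags.

    Assumed definitions, not taken from the thesis itself.
    """
    if len(true_dif) != len(flagged) or len(true_dif) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n_items = len(true_dif)
    n_dif = sum(1 for t in true_dif if t)
    n_clean = n_items - n_dif

    false_pos = sum(1 for t, f in zip(true_dif, flagged) if not t and f)
    false_neg = sum(1 for t, f in zip(true_dif, flagged) if t and not f)

    # A rate is undefined when its denominator is zero.
    type_i = false_pos / n_clean if n_clean else float("nan")
    type_ii = false_neg / n_dif if n_dif else float("nan")
    accuracy = (n_items - false_pos - false_neg) / n_items
    return DetectionSummary(type_i, type_ii, accuracy)


# Hypothetical 10-item test: items 0 and 1 truly have DIF,
# and a method flags items 0 and 5.
truth = [True, True] + [False] * 8
flags = [True, False, False, False, False, True,
         False, False, False, False]
summary = summarize_detection(truth, flags)
print(summary)
```

In a simulation study of this kind, such a summary would typically be averaged over many replications per condition (test length, sample size, DIF ratio) before methods are compared.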
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures

Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Purposes and Research Questions
Section 3 Definition of Terms

Chapter 2 Literature Review
Section 1 Origins of DIF Research and Development of Statistical Methods
Section 2 Traditional DIF Detection Methods and Their Comparison
Section 3 Development of HGLM-Based DIF Analysis and Related Research
Section 4 Accuracy of DIF Analysis Methods
Section 5 Research on Gender DIF in Mathematics

Chapter 3 Research Design and Methods
Section 1 Research Framework
Section 2 Participants
Section 3 Instruments
Section 4 Procedure
Section 5 Data Analysis

Chapter 4 Results and Discussion
Section 1 Results of the Empirical Data Analysis
Section 2 Results of the Simulated Data Analysis

Chapter 5 Conclusions and Recommendations
Section 1 Conclusions
Section 2 Limitations
Section 3 Recommendations
References