王文中、張智宏(1998)。Rasch模式概率比法的差異試題功能分析。中華心理學刊,40(1),15-32。
吳裕益(2006)。階層線性模式。未出版。
林坤昌(1999)。DIF檢定方法之探討與比較。國立台中師範學院國民教育研究所碩士論文。
林奕宏、林世華(2004)。國小高年級數學科成就測驗中與性別有關的DIF現象。台東大學教育學報,15,67-96。
洪碧霞、吳裕益、左太政、鄒慧英、林娟如、林素微(2006a)。臺灣學生學習成就評量資料庫之建置-2006年國小六年級學生數學成就之現況調查研究期末報告。未出版。
洪碧霞、吳裕益、左太政、鄒慧英、林娟如、林素微(2006b)。臺灣學生學習成就評量資料庫之建置-2006年國中二年級學生數學成就之現況調查研究期末報告。未出版。
洪碧霞、吳裕益、左太政、鄒慧英、林娟如、林素微(2006c)。臺灣學生學習成就評量資料庫之建置-2006年高中二年級學生數學成就之現況調查研究期末報告。未出版。
Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67-91.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Becker, B. J. (1990). Item characteristics and gender differences on the SAT-M for mathematically able youths. American Educational Research Journal, 27, 65-87.
Ben-Shakhar, G., & Sinai, Y. (1991). Gender differences in multiple-choice test: The role of differential guessing tendencies. Journal of Educational Measurement, 28, 23-35.
Berberoglu, G. (1995). Differential item functioning (DIF) analysis of computation, word problem and geometry questions across gender and SES group. Studies in Educational Evaluation, 21, 439-456.
Birenbaum, M., & Feldman, R. A. (1998). Relationships between learning patterns and attitudes towards two assessment formats. Educational Research, 40(1), 90-98.
Bolt, D. M. (2000). A SIBTEST approach to testing DIF hypotheses using experimentally designed test items. Journal of Educational Measurement, 37(4), 307-327.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
Camilli, G. (1993). The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 397-413). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Chu, K., & Kamata, A. (2003). Test equating with the presence of DIF. Paper presented at the annual meeting of American Educational Research Association, Chicago.
Cole, N. S. (1993). History and development of DIF. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 25-29). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Cole, N. S., & Moss, P. A. (1989). Bias in test use. In R. L. Linn (Ed.), Educational measurement (pp. 201–219). Washington, DC: American Council on Education.
Cole, N. S., & Zieky, M. J. (2001). The new faces of fairness. Journal of Educational Measurement, 38, 369-382.
Doolittle, A. E. (1989). Gender differences in performance on mathematics achievement items. Applied Measurement in Education, 2(2), 161-177.
Doolittle, A. E., & Cleary, T. A. (1987). Gender-based differential item performance in mathematics achievement items. Journal of Educational Measurement, 24(2), 157-166.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368.
Engelhard, G., Hansche, L., & Rutledge, K. E. (1990). Accuracy of bias review judges in identifying differential item functioning on teacher certification test. Applied Measurement in Education, 3, 347-360.
Ethington, A. (1990). Gender differences in mathematics: An international perspective. Journal for Research in Mathematics Education, 21(1), 74-80.
Feingold, A. (1995). The additive effects of differences in central tendency and variability are important in comparisons between groups. American Psychologist, 50, 5-13.
Fidalgo, A. M., Ferreres, D., & Muñiz, J. (2004). Liberal and conservative differential item functioning detection using Mantel-Haenszel and SIBTEST: Implications for Type I and Type II Error Rates. The Journal of Experimental Education, 73(1), 23-39.
Fidalgo, A. M., Mellenbergh, G. J., & Muñiz, J. (2000). Effect of amount of DIF, test length, and purification type on robustness and power of Mantel-Haenszel. Methods of Psychological Research Online, 5(3), 43-53.
Finch, H. (2004). The MIMIC model as a method for detecting DIF: comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278-295.
Friedman, L. (1989). Mathematics and the gender gap: A meta-analysis of recent studies on sex differences in mathematical tasks. Review of Educational Research, 59, 185-213.
Goldstein, H. (1987). Multilevel models in educational and social research. London: Griffin.
Hanna, G. (1989). Mathematics achievement of girls and boys in grade eight: Results from twenty countries. Educational Studies in Mathematics, 20, 225-232.
Harris, A., & Carlton, S. (1993). Patterns of gender differences on mathematics items on the Scholastic Aptitude Test. Applied Measurement in Education, 6, 137-151.
Hedges, L. V., & Friedman, L. (1993). Gender differences in variability in intellectual abilities: A reanalysis of Feingold's results. Review of Educational Research, 63, 94-105.
Hidalgo, M. D., & López-Pina, J. A. (2004). Differential item functioning detection and effect size: A comparison between logistic regression and Mantel-Haenszel procedures. Educational and Psychological Measurement, 64, 903-915.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.
Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107(2), 139-155.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329-349.
Kamata, A. (2001). Item analysis by hierarchical generalized linear model. Journal of Educational Measurement, 38(1), 79-93.
Kamata, A. (2002). Procedure to perform item response analysis by hierarchical generalized linear model. Paper presented at the annual meeting of American Educational Research Association, New Orleans.
Kamata, A., Genc, E., & Bilir, K. (2005). Random-effect differential item functioning across group units by the hierarchical generalized linear model. Paper presented at the annual meeting of American Educational Research Association, Montreal, Canada.
Kamata, A., & Vaughn, B. K. (2004). An introduction to differential item functioning analysis. Learning Disabilities: A Contemporary Journal, 2, 49-69.
Kim, W. (2003). Development of a differential item functioning (DIF) procedure using the hierarchical generalized linear model: A comparison study with logistic regression procedure. Unpublished doctoral dissertation, University of Pennsylvania.
Lane, S., Wang, N., & Magone, M. (1996). Gender-related differential item functioning on a middle-school mathematics performance assessment. Educational Measurement: Issues and Practice, 15(4), 21-27.
Langenfeld, T. E. (1997). Test fairness: Internal and external investigations of gender bias in mathematics testing. Educational Measurement: Issues and Practice, 16(1), 20-26.
Li, H. H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61(4), 647-677.
Luppescu, S. (2002, April). DIF detection in HLM. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA. ED 479334.
McAllister, P. H. (1993). Testing, DIF, and public policy. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 389–396). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7(2), 105-118.
Mendes-Barnett, S., & Ercikan, K. (2006). Examining sources of gender DIF in mathematics assessments using a confirmatory multidimensional model approach. Applied Measurement in Education, 19, 289-304.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement. Washington, DC: American Council on Education.
Miller, D. M., & Oshima, T. C. (1992). Effect of sample size, number of biased items, and magnitude of bias on a two-stage item bias estimation method. Applied Psychological Measurement, 16, 381-388.
Monahan, P. O., & Ankenmann, R. D. (2005). Effect of unequal variances in proficiency distributions on type-I error of the Mantel-Haenszel Chi-square test for differential item functioning. Journal of Educational Measurement, 42(2), 101-131.
Monahan, P. O., McHorney, C. A., Stump, T. E., & Perkins, A. J. (2007). Odds ratio, delta, ETS classification, and standardization measure of DIF magnitude for binary logistic regression. Journal of Educational and Behavioral Statistics, 32(1), 92-109.
Narayanan, P., & Swaminathan, H. (1994). Performance of the Mantel-Haenszel and simultaneous item bias procedures for detecting differential item functioning. Applied Psychological Measurement, 18, 315-328.
Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257-274.
O'Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255-276). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Pommerich, M., Spray, J. A., & Parshall, C. G. (1995). An analytical evaluation of two common-odds ratio as population indicators of DIF. ACT Research Report Series 95-1. Iowa City, IA: ACT.
Potenza, M. T., & Dorans, N. J. (1995). DIF assessment for polytomously scored items: A framework for classification and evaluation. Applied Psychological Measurement, 19, 23-37.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned area between two item response functions. Applied Psychological Measurement, 14, 197-207.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Newbury Park, CA: Sage.
Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105-116.
Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel Type I error performance. Journal of Educational Measurement, 33, 215-230.
Ryan, K. E., & Chiu, S. (2001). An examination of item context effect, DIF, and gender DIF. Applied Measurement in Education, 14, 73-90.
Ryan, K. E., & Fan, M. (1996). Examining gender DIF on a multiple-choice test of mathematics: A confirmatory approach. Educational Measurement: Issues and Practice, 15(4), 15-20.
Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational Measurement, 16, 143-152.
Schrader, S. V., & Ansley, T. (2006). Sex differences in the tendency to omit items on multiple-choice test: 1980-2000. Applied Measurement in Education, 19(1), 41-65.
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194.
Shen, L. (1999). A multilevel assessment of differential item functioning. Paper presented at the annual meeting of American Educational Research Association, Montreal, Canada.
Shepard, L. A., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6, 317-375.
Shepard, L. A., Camilli, G., & Williams, D. M. (1984). Accounting for statistical artifacts in item bias research. Journal of Educational Statistics, 9, 93-128.
Shepard, L. A., Camilli, G., & Williams, D. M. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22, 77-105.
Swafford, J. (1980). Sex differences in first-year algebra. Journal for Research in Mathematics Education, 11, 335-346.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.
Swanson, D. B., Clauser, B. E., Case, S. M., Nungester, R. J., & Featherman, C. (2002). Analysis of differential item functioning using hierarchical logistic regression models. Journal of Educational and Behavioral Statistics, 27(1), 53-75.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Lawrence Erlbaum.
Uttaro, T., & Millsap, R. E. (1994). Factors influencing the Mantel-Haenszel procedure in the detection of differential item functioning. Applied Psychological Measurement, 18, 15-25.
Wang, W. C., & Su, Y. H. (2004). Effect of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17(2), 113-144.
Whitmore, M. L., & Schumacker, R. E. (1999). A comparison of logistic regression and analysis of variance differential item functioning detection method. Educational and Psychological Measurement, 59, 910-927.
Zhang, Y., & Zhang, L. (2002). Modeling school and district effect in math achievement of Delaware students measured by DSTP: A preliminary application of hierarchical linear modeling in accountability study. ED 468692.
Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337-347). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Zieky, M. (2003). DIF primer. Paper presented at the Educational Testing Service.
Zieky, M. (2006). Fairness review in assessment. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 359-376). Mahwah, NJ: Lawrence Erlbaum Associates.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zumbo, B. D. (2001). Investigating DIF by statistical modeling of the probability of endorsing an item: logistic regression and extensions thereof. Paper presented at the National Council for Measurement in Education meeting.
Zwick, R. (1990). When do item response function and Mantel-Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185-197.
Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the NAEP history assessment. Journal of Educational Measurement, 26, 55-66.