Chinese-language references
王寶墉 (1995). 現代測驗理論 [Modern test theory]. Taipei: 心理出版社 [Psychological Publishing Co.].
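余民寧 (1992a). 試題反應理論的介紹(一):測驗理論的發展趨勢 [An introduction to item response theory (I): Trends in the development of test theory]. 研習資訊, 8(6), 13-18.
余民寧 (1992b). 試題反應理論的介紹(二):測驗理論的發展趨勢 [An introduction to item response theory (II): Trends in the development of test theory]. 研習資訊, 9(1), 5-9.
余民寧 (1992c). 試題反應理論的介紹(三):試題反應模式及其特性 [An introduction to item response theory (III): Item response models and their properties]. 研習資訊, 9(2), 6-10.
余民寧 (1993a). 試題反應理論的介紹(九):測驗分數的等化(上) [An introduction to item response theory (IX): Equating test scores, part 1]. 研習資訊, 10(2), 6-11.
余民寧 (1993b). 試題反應理論的介紹(十):測驗分數的等化(下) [An introduction to item response theory (X): Equating test scores, part 2]. 研習資訊, 10(3), 11-16.
黃志傑 (1994). 定錨試題分佈對測驗等化之影響 [The effect of anchor-item distribution on test equating]. Unpublished master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung Teachers College.
English-language references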
Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, 69-81.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Baker, F. B. (1992). Equating tests under the graded response model. Applied Psychological Measurement, 16, 87-96.
Baker, F. B. (1993). EQUATE 2.0: A computer program for the characteristic curve method of IRT equating. Applied Psychological Measurement, 17, 20.
Baker, F. B. (1997). Empirical sampling distributions of equating coefficients for graded and nominal response instruments. Applied Psychological Measurement, 21, 157-172.
Baker, F. B., & Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28, 147-162.
Cohen, A. S., & Kim, S. H. (1998). An investigation of linking methods under the graded response model. Applied Psychological Measurement, 22(2), 116-130.
Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26(1), 3-24.
Kim, S. H., & Cohen, A. S. (1995). A minimum χ² method for equating tests under the graded response model. Applied Psychological Measurement, 19, 167-176.
Kim, S. H., & Cohen, A. S. (1998). A comparison of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22(2), 131-143.
Kim, S. H., & Cohen, A. S. (2002). A comparison of linking and concurrent calibration under the graded response model. Applied Psychological Measurement, 26(1), 25-40.
Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York: Springer-Verlag.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York: Springer-Verlag.
Linn, R. L., Levine, M. V., Hastings, C. N., & Wardrop, J. L. (1981). Item bias in a test of reading comprehension. Applied Psychological Measurement, 5, 159-173.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 169-194.
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139-160.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Reise, S. P., & Yu, J. (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27, 133-144.
Samejima, F. (1969). Estimation of a latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 17.
Samejima, F. (1972). A general model for free response data. Psychometrika Monograph Supplement, 18.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
Thissen, D. (1991). MULTILOG user’s guide: Multiple, categorical item analysis and test scoring using item response theory [Computer program]. Chicago: Scientific Software International.
Vale, C. D. (1986). Linking item parameters onto a common scale. Applied Psychological Measurement, 10, 333-344.