Chinese-language references
王敏嫻(2011)。不同水平等化設計於可能值方法之探討。未出版之碩士論文,臺中教育大學教育測驗統計研究所,臺中市。
王暄博(2006)。BIB與NEAT設計之水平及垂直等化效果比較。未出版之碩士論文,臺中教育大學教育測驗統計研究所,臺中市。
余民寧(2009)。試題反應理論(IRT)及其應用(一版)。臺北市:心理出版社股份有限公司。
洪碧霞、林素微、林娟如(2006)。認知複雜度分析架構對TASA-MAT六年級線上測驗試題難度的解釋力。教育研究與發展期刊,2(4),69-86。
郭伯臣、曾建銘、吳慧珉主編(2012)。大型標準化測驗建置流程應用於TASA之研究。新北市:國家教育研究院。
曾玉琳、王暄博、郭伯臣、許天維(2006)。不同BIB設計對測驗等化的影響。測驗統計年刊,13(2),209-229。臺中市:國立臺中教育大學。
English-language references
Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 47-76.
Allen, N. L., Carlson, J. E., Johnson, E. G., & Mislevy, R. J. (1999). The NAEP 1998 technical report. Educational Testing Service.
Allen, N. L., Donoghue, J. R., & Schoeps, T. L. (2001). The NAEP 1998 technical report. Washington, DC: National Center for Education Statistics.
Andrew, R. W., & Terry, L. S. (2001). The NAEP 1998 Technical Report (NCES 2001-509). National Assessment Governing Board, U.S. Department of Education.
Baker, F. B., & Kim, S. H. (2004). Item Response Theory: Parameter Estimation Techniques. Basel, N. Y.: Marcel Dekker, Inc.
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444.
Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. New York: Chapman & Hall. (Distributed by Halsted Press, New York)
Dorans, N. J., & Holland, P. W. (2000). Linking Scores from Multiple Instruments.
Foy, P., Galia, J., & Li, L. (2008). Scaling the data from the TIMSS 2007 Mathematics and Science assessments. In J. F. Olson, M. O. Martin, & I. V. S. Mullis (Eds.), TIMSS 2007 Technical Report. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Glas, C. A. W., & Geerlings, H. (2009). Psychometric aspects of pupil monitoring systems. Studies in Educational Evaluation, 35, 83–88.
Graham, J. R., Christine, Y. O’S., Alka, A., & Ebru, E. (2008). TIMSS 2007 Technical Report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Klein, L. W., & Jarjoura, D. (1985). The importance of content representation for common-item equating with non-random groups. Journal of Educational Measurement, 22, 197-206.
Kolen, M. J., & Brennan, R. L. (1995). Test Equating: Methods and Practices. New York: Springer-Verlag.
Lee, J., Grigg, W., & Dion, G. (2007). The Nation’s Report Card: Mathematics 2007. National Center for Education Statistics, Institute of Education Sciences, U. S. Department of Education, Washington, D. C.
Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Lawrence Erlbaum.
Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233-245.
Lord, F. M. (1984). Maximum likelihood and Bayesian parameter estimation in item response theory (Research Report No. RR-84-30-ONR). Princeton, NJ: Educational Testing Service.
Martin, M. O., Mullis, I. V. S., & Kennedy, A. M. (2007). PIRLS 2006 Technical Report. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49, 359-381.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177-196.
Mislevy, R. J., & Sheehan, K. M. (1989). Information matrices in latent-variable models. Journal of Educational Statistics, 14, 335-350.
Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29, 133-161.
Mullis, I. V. S., Martin, M. O., & Foy, P. (with Olson, J. F., Preuschoff, C., Erberber, E., Arora, A., & Galia, J.). (2008). TIMSS 2007 International Mathematics Report: Findings from IEA’s Trends in International Mathematics and Science Study at the Fourth and Eighth Grades. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Nancy, L. A., James, E. C., & John, R. D. (2001). The NAEP 1998 Technical Report (NCES 2001-509). National Assessment Governing Board, U.S. Department of Education.
Nemhauser, G. L., & Wolsey, L. A. (1999). Integer and Combinatorial Optimization. New York: John Wiley.
OECD (2005). PISA 2003 Technical Report. OECD, Paris.
OECD (2009). PISA 2006 Technical Report. OECD, Paris.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1993). Scaling, Norming, and Equating. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 221-262). New York: Macmillan.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.
Tianyou, W. (2005). An Alternative Continuization Method to the Kernel Method in von Davier, Holland and Thayer's (2004) Test Equating Framework.
van der Linden, W. J., Veldkamp, B. P., & Carlson, J. E. (2004). Optimizing Balanced Incomplete Block Designs for Educational Assessments. Applied Psychological Measurement, 28, 317-331.
von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.
von Davier, M., Gonzalez, E., & Mislevy, R. J. (2009). What are plausible values and why are they useful? IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 9-36.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response models. Psychometrika, 54, 427-450.
Wu, M. (2005). The role of plausible values in large-scale surveys. Studies in Educational Evaluation, 31(2-3), 114-128.
Yates, F. (1936). A new method of arranging variety trials involving a large number of varieties. Journal of Agricultural Science, 26, 424-455.