Chinese-language references
王寶墉(1995)。現代測驗理論。台北:心理出版社。
余民寧(2002)。教育測驗與評量─成就測驗與教學評量。台北:心理出版社。
林佳樺(2009)。高階層試題反應理論及其成效探討。未出版之碩士論文。國立台中教育大學教育測驗統計研究所碩士論文。
林原宏(2006)。數學試題的局部獨立性與題組反應模式:兼論其在數學考卷的評析與檢驗。數學考卷編製暨評析研討會。台中市:國立台中教育大學。
許思雯(2008)。題組測驗在三種IRT計分模式能力估計精確性之比較。未出版之碩士論文。國立臺南大學測驗統計研究所碩士論文。
張勝凱(2010)。使用HIRT模式建立國小六年級學童數學推理能力測驗。未出版之碩士論文。國立台中教育大學教育測驗統計研究所碩士論文。
郭生玉(1989)。心理與教育測驗。台北:精華書局。
English-language references
Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67-91.
Allen, S., & Sudweeks, R. R. (2001). Identifying and managing local item dependence in context-dependent item sets. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153-168.
Chen, W. H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265-289.
Crehan, K. D., Sireci, S. G., Haladyna, T. M., & Henderson, P. A. (1993, April). A comparison of testlet reliability for polytomous scoring methods. Paper presented at the Annual Meeting of the American Educational Research Association, Atlanta, GA.
Cureton, E. E. (1965). Reliability and validity: Basic assumptions and experimental designs. Educational and Psychological Measurement, 25(2), 327-346.
de la Torre, J., & Douglas, J. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333-353.
de la Torre, J., & Hong, Y. (2010). Parameter estimation with small sample size: A higher-order IRT model approach. Applied Psychological Measurement, 34(4), 267-285.
de la Torre, J., & Patz, R. (2005). Making the most of what we have: A practical application of multidimensional item response theory in test scoring. Journal of Educational and Behavioral Statistics, 30(3), 295-311.
de la Torre, J., & Song, H. (2009). Simultaneous estimation of overall and domain abilities: A higher-order IRT model approach. Applied Psychological Measurement, 33(8), 620-639.
DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43(2), 145-168.
Douglas, J., Kim, H. R., Habing, B., & Gao, F. (1998). Investigating local dependence with conditional covariance functions. Journal of Educational and Behavioral Statistics, 23(2), 129-151.
Ebel, R. L. (1951). Writing the test item. In E. F. Lindquist (Ed.), Educational measurement (pp. 185-249). Washington, DC: American Council on Education.
Ferrara, S., Huynh, H., & Baghi, H. (1997). Contextual characteristics of locally dependent open-ended item clusters in a large-scale performance assessment. Applied Measurement in Education, 10(2), 123-144.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721-741.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996). Markov chain Monte Carlo in practice (pp. 1-17). London: Chapman & Hall.
Glas, C. A. W., Wainer, H., & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 271-287). Dordrecht, Netherlands: Kluwer.
Haladyna, T. M. (1992). Context-dependent item sets. Educational Measurement: Issues and Practice, 11(4), 21-25.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97-109.
Hoffman, B. C. (1962). The tyranny of testing. New York: Crowell-Collier.
Keller, L. A., Swaminathan, H., & Sireci, S. G. (2003). Evaluating scoring procedures for context-dependent item sets. Applied Measurement in Education, 16(3), 207-222.
Lee, G., Brennan, R. L., & Frisbie, D. A. (2000). Incorporating the testlet concept in test score analyses. Educational Measurement: Issues and Practice, 19(4), 9-15.
Lee, Y. W. (2004). Examining passage-related local item dependence (LID) and measurement construct using Q3 statistics in an EFL reading comprehension test. Language Testing, 21(1), 75-100.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51(2), 177-195.
Mullis, I. V. S., Martin, M. O., Ruddock, G. J., O’Sullivan, C. Y., & Preuschoff, C. (2009). TIMSS 2011 Assessment Frameworks. Chestnut Hill, MA: Boston College.
Mullis, I. V. S., Martin, M. O., Kennedy, A. M., & Foy, P. (2007). PIRLS 2006 International Report: IEA’s Progress in International Reading Literacy Study in Primary Schools in 40 Countries. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
OECD (2005). PISA 2003 Technical Report. Paris: OECD.
Patz, R. J., & Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146-178.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.
Reese, L. M. (1995). The impact of local dependencies on some LSAT outcomes. Paper presented at the Law School Admission Council, Newtown, PA.
Rosenbaum, P. R. (1988). Item bundles. Psychometrika, 53(3), 349-359.
Sheng, Y. (2007). Comparing multiunidimensional and unidimensional item response theory models. Educational and Psychological Measurement, 67(6), 899-919.
Spiegelhalter, D. J., Thomas, A., Best, N., & Lunn, D. (2003). WinBUGS Manual Version 1.4. (MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK, http://www.mrc-bsu.cam.ac.uk/bugs)
Swaminathan, H., & Gifford, J. A. (1986). Bayesian estimation in the three parameter logistic models. Psychometrika, 51(4), 589-601.
Tang, K. L., Way, W. D., & Carey, P. A. (1993). The effect of small calibration sample sizes on TOEFL IRT-based equating (TOEFL Technical Report TR-7). Princeton, NJ: Educational Testing Service.
Thissen, D., Steinberg, L., & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26(3), 247-260.
Tierney, L. (1994). Markov chains for exploring posterior distributions. The Annals of Statistics, 22(4), 1701-1762.
Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8(2), 157-186.
Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245-269). Dordrecht, Netherlands: Kluwer.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.
Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24(3), 185-201.
Wainer, H., & Lewis, C. (1990). Toward a psychometrics for testlets. Journal of Educational Measurement, 27(1), 1-14.
Wainer, H., & Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57(5), 741-758.
Wainer, H., Sireci, S. G., & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28(3), 197-219.
Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15(1), 22-29.
Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37(3), 203-220.
Wainer, H., Vevea, J. L., Camacho, F., Reeve, B., Rosa, K., Nelson, L., Swygert, K. A., & Thissen, D. (2001). Augmented scores: “Borrowing strength” to compute scores based on small number of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343-387). Mahwah, NJ: Lawrence Erlbaum.
Wang, X., Bradlow, E. T., & Wainer, H. (2005). User’s guide for SCORIGHT (version 3.0): A computer program for scoring tests built of testlets including a module for covariate analysis (ETS Technical Report RR-04-49). Princeton, NJ: Educational Testing Service.
Wang, W. C., & Wilson, M. (2005). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126-149.
Wesman, A. G. (1971). Writing the test item. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 81-129). Washington, DC: American Council on Education.
Wilson, M., & Adams, R. J. (1995). Rasch models for item bundles. Psychometrika, 60(2), 181-198.
Xie, Y. (2001). Dimensionality, dependence, or both? An application of the item bundle models to multidimensional data. Paper presented at the Annual Meeting of the American Educational Research Association, Seattle, WA.
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8(2), 125-145.
Yen, W. M. (1987). A Bayesian / IRT index of objective performance. Paper presented at the annual meeting of the Psychometric Society, Montreal, Quebec, Canada, June 1-19.
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30(3), 187-213.
Zenisky, A. L., Hambleton, R. K., & Sireci, S. G. (1999). Effects of item local dependence on the validity of IRT item, test, and ability statistics. Washington, DC: Association of American Medical Colleges.
Zenisky, A. L., Hambleton, R. K., & Sireci, S. G. (2002). Identification and evaluation of local item dependencies in the Medical College Admission Test. Journal of Educational Measurement, 39(4), 291-309.
Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG. Scientific Software International.