References
壹、 Chinese-Language References
余民寧(2011)。教育測驗與評量:成就測驗與教學評量(第三版)。臺北市:心理。
吳芝儀(2011)。以人為主體之社會科學研究倫理議題。人文社會科學研究,5(4),19-39。
張苙雲(2004)。台灣教育長期追蹤資料庫:第一波(2001)國中學生問卷資料(公共版)(C00124_2)【原始數據】。取自中央研究院人文社會科學研究中心調查研究專題中心學術調查研究資料庫https://srda.sinica.edu.tw。
貳、 English-Language References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.
Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541-561.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: Application of a mixture Rasch model with ordinal constraints. Journal of Educational Measurement, 39(4), 331-348.
Cao, J., & Stokes, S. L. (2008). Bayesian IRT guessing models for partial guessing behaviors. Psychometrika, 73(2), 209-230.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4), 327-335.
Cole, J. S., & Osterlind, S. J. (2008). Investigating differences between low- and high-stakes test performance on a general education exam. Journal of General Education, 57(2), 119-130.
Eklöf, H. (2006). Development and validation of scores from an instrument measuring student test-taking motivation. Educational and Psychological Measurement, 66(4), 643-656.
Finney, S. J., Sundre, D. L., Swain, M. S., & Williams, L. M. (2016). The validity of value-added estimates from low-stakes testing contexts: The impact of change in test-taking motivation and test consequences. Educational Assessment, 21(1), 60-87.
Fox, J.-P., & Glas, C. A. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66(2), 271-288.
Fox, J.-P., & Glas, C. A. (2003). Bayesian modeling of measurement error in predictor variables using item response theory. Psychometrika, 68(2), 169-191.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721-741.
Goegebeur, Y., De Boeck, P., Wollack, J. A., & Cohen, A. S. (2008). A speeded item response model with gradual process change. Psychometrika, 73(1), 65-87.
Gulliksen, H. (1950). The reliability of speeded tests. Psychometrika, 15(3), 259-269.
Huffman, L., Adamopoulos, A., Murdock, G., Cole, A., & McDermid, R. (2011). Strategies to motivate students for program assessment. Educational Assessment, 16(2), 90-103.
Huang, H.-Y. (2017). Mixture IRT model with a higher-order structure for latent traits. Educational and Psychological Measurement, 77(2), 275-304.
Huang, H.-Y., & Wang, W.-C. (2013). Higher order testlet response models for hierarchical latent traits and testlet-based items. Educational and Psychological Measurement, 73(3), 491-511.
Jin, K.-Y., & Wang, W.-C. (2013). Generalized IRT models for extreme response style. Educational and Psychological Measurement, 74(1), 116-138.
Jin, K.-Y., & Wang, W.-C. (2014). Item response theory models for performance decline during testing. Journal of Educational Measurement, 51(2), 178-200.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121-1134.
Lord, F. M. (1953). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13(4), 517-549.
Lu, Y., & Sireci, S. G. (2007). Validity issues in test speededness. Educational Measurement: Issues and Practice, 26(4), 29-37.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174.
Mislevy, R. J. (1995). What can we learn from international assessments? Educational Evaluation and Policy Analysis, 17(4), 419-437.
Mullis, I. V. S., Martin, M. O., & Diaconu, D. (2004). Item analysis and review. In M. O. Martin, I. V. S. Mullis, & S. J. Chrostowski (Eds.), TIMSS 2003 technical report (pp. 225-252). Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
OECD. (2009). PISA 2006 technical report. Paris, France: OECD Publishing.
OECD. (2012). PISA 2009 technical report. Paris, France: OECD Publishing.
Oshima, T. (1994). The effect of speededness on parameter estimation in item response theory. Journal of Educational Measurement, 31(3), 200-219.
Patz, R. J., & Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146-178.
Pintrich, P. R. (1999). The role of motivation in promoting and sustaining self-regulated learning. International Journal of Educational Research, 31(6), 459-470.
Pintrich, P. R., Smith, D. A., Garcia, T., & McKeachie, W. J. (1993). Reliability and predictive validity of the Motivated Strategies for Learning Questionnaire (MSLQ). Educational and Psychological Measurement, 53(3), 801-813.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Institute of Educational Research. (Expanded edition, 1980. Chicago, IL: The University of Chicago Press.)
Schiel, J. (1996). Student effort and performance on a measure of postsecondary educational development (ACT Report No. 96-9). Iowa City, IA: American College Testing Program.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
Sessoms, J., & Finney, S. J. (2015). Measuring and modeling change in examinee effort on low-stakes tests across testing occasions. International Journal of Testing, 15(4), 356-388.
Silm, G., Must, O., & Täht, K. (2013). Test-taking effort as a predictor of performance in low-stakes tests. Trames: A Journal of the Humanities & Social Sciences, 17(4), 433–448.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298-321.
Sundre, D. L. (1999, April). Does examinee motivation moderate the relationship between test consequences and test performance? Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.
Sundre, D. L., & Kitsantas, A. (2004). An exploration of the psychology of the examinee: Can examinee self-regulation and test-taking motivation predict consequential and non-consequential test performance? Contemporary Educational Psychology, 29(1), 6-26.
Sundre, D. L., & Moore, D. L. (2002). The Student Opinion Scale: A measure of examinee motivation. Assessment Update, 14(1), 8-9.
Sundre, D. L., & Wise, S. L. (2003, April). Motivation filtering: An exploration of the impact of low examinee motivation on the psychometric quality of tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Swerdzewski, P. J., Harmes, J. C., & Finney, S. J. (2011). Two approaches for identifying low-motivated students in a low-stakes assessment context. Applied Measurement in Education, 24(2), 162-188.
Thelk, A. D., Sundre, D. L., Horst, S. J., & Finney, S. J. (2009). Motivation matters: Using the Student Opinion Scale to make valid inferences about student performance. The Journal of General Education, 58(3), 129-151.
Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37(3), 203-220.
Wise, S. L. (2006). An investigation of the differential effort received by items on a low-stakes computer-based test. Applied Measurement in Education, 19(2), 95-114.
Wise, S. L., Bhola, D. S., & Yang, S. T. (2006). Taking the time to improve the validity of low‐stakes tests: The effort‐monitoring CBT. Educational Measurement: Issues and Practice, 25(2), 21-30.
Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1-17.
Wise, S. L., & DeMars, C. E. (2006). An application of item response time: The effort‐moderated IRT model. Journal of Educational Measurement, 43(1), 19-38.
Wise, S. L., & Kingsbury, G. G. (2016). Modeling student test‐taking motivation in the context of an adaptive achievement test. Journal of Educational Measurement, 53(1), 86-105.
Wise, S. L., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163-183.
Wise, S. L., & Ma, L. (2012, April). Setting response time thresholds for a CAT item pool: The normative threshold method. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.
Wise, S. L., Pastor, D. A., & Kong, X. J. (2009). Correlates of rapid-guessing behavior in low-stakes testing: Implications for test development and measurement practice. Applied Measurement in Education, 22(2), 185-205.
Wolf, L. F., & Smith, J. K. (1995). The consequence of consequence: Motivation, anxiety, and test performance. Applied Measurement in Education, 8(3), 227-242.
Wolf, L. F., Smith, J. K., & Birnbaum, M. E. (1995). Consequence of performance, test, motivation, and mentally taxing items. Applied Measurement in Education, 8(4), 341-351.
Yamamoto, K. (1995). Estimating the effects of test length and test time on parameter estimation using the HYBRID model. ETS Research Report Series, 1995(1), 1-39.
Ziegler, M., MacCann, C., & Roberts, R. D. (2011). Faking in personality assessments: Where do we stand? In M. Ziegler, C. MacCann, & R. D. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 330-344). New York, NY: Oxford University Press.