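Graduate student: Ting, Tzu-Yun (丁姿云)
Thesis title: 在試題反應理論下的表現衰退模式―以多點計分為例
Thesis title (English): Polytomous Item Response Theory Models for Performance Decline
Advisor: Huang, Hung-Yu (黃宏宇)
Oral defense date: 2018-01-15
Degree: Master's
Institution: University of Taipei (臺北市立大學)
Department: Department of Psychology and Counseling
Discipline: Social and Behavioral Sciences
Field: Psychology
Thesis type: Academic thesis
Publication year: 2018
Graduation academic year: 106 (ROC calendar)
Language: Chinese
Pages: 81
Keywords (translated from Chinese): mixture item response model; performance decline model; polytomous scoring model; reading achievement test
Keywords (English): mixture item response model; performance decline model; polytomous item response theory models; Progress in International Reading Literacy Study (2011)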
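When examinees take a low-stakes test, insufficient motivation can easily lead to performance decline: they do not give full effort throughout the test, which calls the test results into question. To improve test validity, this study extends the item response theory performance decline model and proposes the rating scale model with gradual performance decline (RSM-PD), the partial credit model with gradual performance decline (PCM-PD), the generalized rating scale model with gradual performance decline (GRSM-PD), and the generalized partial credit model with gradual performance decline (GPCM-PD). Parameter recovery for the four models is examined through simulations that manipulate test length, number of examinees, response scale, full-effort rate, and the decline parameter. The Taiwanese sample from PIRLS (2011) serves as the empirical data source to verify how the models apply in a real setting. Bayesian estimation is used for parameter estimation in both the simulation and the empirical study.
The simulation results show that, overall, parameter recovery was good and was affected most by the number of examinees and the number of items: recovery improved as both increased. In comparisons among models, the simpler the model, the more precise the parameter estimates; when the wrong model was used, bias increased markedly. The empirical study shows that, under identical conditions, the performance decline models all fit better than the models without decline, and that parameter recovery was good, indicating precise estimation. Finally, suggestions based on the findings are offered for future research.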
When low-stakes tests are administered, examinees are likely to lose interest in the test items, and performance decline can be expected. In this study, the author develops item response theory (IRT) models for examinees' performance decline so that inferences about test takers' performance can be validated. The efficiency of four novel IRT models, namely the rating scale model with gradual performance decline (RSM-PD), the partial credit model with gradual performance decline (PCM-PD), the generalized rating scale model with gradual performance decline (GRSM-PD), and the generalized partial credit model with gradual performance decline (GPCM-PD), was examined via simulation studies that manipulated several independent factors. Bayesian methods were used to calibrate the simulated and empirical data.
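As a rough illustration of the polytomous base models named above, the sketch below computes category probabilities under the generalized partial credit model; setting the discrimination to 1 gives the partial credit model, and writing each step difficulty as an item location plus a shared threshold gives the rating scale variants. The record does not state how the thesis parameterizes gradual decline, so `declined_theta` is a purely hypothetical stand-in (a linear drop in ability after an onset item), and all function and parameter names are assumptions made for this example.

```python
import math


def gpcm_probs(theta, a, step_difficulties):
    """Category probabilities for one item under the generalized partial credit model.

    theta: examinee ability; a: item discrimination (a = 1 gives the PCM);
    step_difficulties: one value per step, so there are len(steps) + 1 categories.
    """
    # Cumulative logits: category 0 is fixed at 0, category k sums the first k steps.
    logits = [0.0]
    for delta in step_difficulties:
        logits.append(logits[-1] + a * (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, a, step_difficulties):
    """Expected item score, i.e. the probability-weighted category index."""
    probs = gpcm_probs(theta, a, step_difficulties)
    return sum(k * p for k, p in enumerate(probs))


def declined_theta(theta, position, onset, rate):
    """HYPOTHETICAL decline rule, not the thesis's specification:
    ability drops by `rate` for every item position past `onset`."""
    return theta - rate * max(0, position - onset)


if __name__ == "__main__":
    steps = [-1.0, 0.0, 1.0]  # illustrative step difficulties for a 4-category item
    for pos in (10, 30):
        th = declined_theta(0.5, pos, onset=20, rate=0.05)
        print(pos, round(expected_score(th, 1.2, steps), 3))
```

Under this toy rule, an item placed after the onset position yields a lower expected score for the same examinee, which is the kind of pattern a model that ignores decline would misattribute to item difficulty.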
The results show that the model parameters can be recovered satisfactorily and that the quality of parameter recovery improves as test length and sample size increase. Fitting a misspecified model that ignores possible performance decline results in biased parameter estimates. Data from Taiwan in the Progress in International Reading Literacy Study (2011) were fit to the developed models to serve as an empirical example. Finally, the author provides several suggestions for future study.
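The abstract does not say which criteria were used to judge parameter recovery. Bias and root mean square error across simulation replications are common choices, and the minimal sketch below assumes them; the function name and the example values are hypothetical and are not results from the thesis.

```python
import math


def bias_and_rmse(true_value, estimates):
    """Summarize recovery of one parameter across simulation replications.

    true_value: the generating value used in the simulation;
    estimates: one estimate of that parameter per replication.
    """
    errors = [est - true_value for est in estimates]
    n = len(errors)
    bias = sum(errors) / n  # mean signed error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean square error
    return bias, rmse


if __name__ == "__main__":
    # Made-up estimates of a parameter whose generating value is 1.0.
    print(bias_and_rmse(1.0, [0.9, 1.1, 1.05, 0.95]))
```

Bias near zero with a small RMSE indicates good recovery, whereas a misspecified model would typically show bias that does not shrink as the sample size grows.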
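Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
Section 1. Research Motivation
Section 2. Research Purposes
Section 3. Research Questions
Section 4. Definitions of Terms
Chapter 2. Literature Review
Section 1. Performance Decline in Testing
Section 2. Validity Issues in Testing
Section 3. Applications of Item Response Theory
Section 4. Applications of Item Response Theory Decline Models
Chapter 3. Research Methods
Section 1. Model Development
Section 2. Research Framework
Section 3. Research Design
Section 4. Data Analysis
Chapter 4. Results and Discussion
Section 1. Checking Model Convergence
Section 2. Simulation Study Results
Section 3. Comparison of Simulated Models
Section 4. Empirical Study Results
Chapter 5. Conclusions and Suggestions
Section 1. Research Conclusions
Section 2. Research Limitations and Suggestions
References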