National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: Lin, Ming-Guang (林明廣)
Title: Development and Application of Binary Item Response Theory Models for Performance Decline (二元計分試題反應理論模式在測驗表現衰退上的發展與應用)
Advisor: Huang, Hung-Yu (黃宏宇)
Oral defense date: 2019-06-26
Degree: Master's
Institution: University of Taipei (臺北市立大學)
Department: Department of Psychology and Counseling
Discipline: Social and Behavioral Sciences
Field: Psychology
Document type: Academic thesis
Year of publication: 2019
Graduation academic year: 107
Language: Chinese
Number of pages: 77
Keywords (Chinese): 表現衰退模式; 二元計分試題反應理論; 台灣教育長期追蹤資料庫; 基礎教育成就測驗
Keywords (English): performance decline; binary item response theory models; Taiwan Education Panel Survey; Cito-toets
Usage statistics:
  • Cited by: 0
  • Views: 177
  • Downloads: 2
  • Bookmarked: 0
Abstract (Chinese)
This study develops binary item response theory models for performance decline, which are used to detect examinees' performance-decline behavior on dichotomously scored items, and examines their validity. The thesis comprises two parts: a simulation study and an empirical study. In the simulation study, four independent variables (test length, number of examinees, full-effort rate, and the decline parameter) were manipulated to generate simulated data, and the models' accuracy in estimating item parameters, the full-effort rate, the decline parameter, and examinee ability parameters was examined. In the empirical study, two data sources were used: the 2001 assessment of senior high school, vocational school, and junior college students from the Taiwan Education Panel Survey (TEPS), and the Dutch primary school achievement test (Cito-toets) data used by Fox and Glas (2001). These served as the low-stakes and high-stakes empirical data sources, respectively, for examining the models' fit in practice.
The results show that (1) the models provide good estimates of item parameters; (2) the models provide good estimates of examinee ability parameters; and (3) the models fit well in practice. Finally, suggestions for future research and practical applications are offered, in the hope of improving measurement practice and providing a reference for subsequent researchers.
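The simulation design above (test length, number of examinees, full-effort rate, and a decline parameter) can be illustrated with a small data-generating sketch. The model below is a generic Rasch-type decline model in which non-effortful examinees lose a fixed amount of effective ability per item position; it illustrates the general idea rather than the thesis's exact parameterization, and all factor values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulation factors mirroring the study's design (values are illustrative):
n_persons, n_items = 500, 40     # number of examinees, test length
full_effort_rate = 0.7           # proportion of examinees giving full effort
decline_rate = 0.05              # hypothetical per-item ability decrement

theta = rng.normal(0, 1, n_persons)   # true abilities
b = rng.normal(0, 1, n_items)         # Rasch item difficulties

full_effort = rng.random(n_persons) < full_effort_rate
positions = np.arange(n_items)

# Effective ability: declining examinees lose decline_rate per item position.
eta = theta[:, None] - np.where(full_effort[:, None], 0.0,
                                decline_rate * positions[None, :])
prob = 1.0 / (1.0 + np.exp(-(eta - b[None, :])))
responses = (rng.random((n_persons, n_items)) < prob).astype(int)

# Declining examinees should score lower than effortful ones on late items.
late = positions >= n_items // 2
print(responses[full_effort][:, late].mean(),
      responses[~full_effort][:, late].mean())
```

Under this setup, the gap between the two groups' accuracy on the second half of the test is what a performance-decline model is designed to capture and separate from true ability.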
Abstract (English)
The purpose of this study is to develop binary item response theory models for performance decline, which can be used to detect test takers' performance decline on dichotomously scored items. The study comprises two parts: a simulation study and an empirical study. In the simulation study, four factors (sample size, test length, full-effort proportion, and decrement rate) were manipulated, and the accuracy of model parameter estimation was evaluated with the simulated data. Two empirical examples, the Taiwan Education Panel Survey (2001) and the Dutch primary school leaving test (Cito-toets), demonstrated the application of the new models.
The results show that the approach provides good estimates of item parameters, ability parameters, and the other model parameters. In addition, the new models were successfully applied in the empirical data analyses. Finally, the author offers suggestions for future research and practical applications.
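Parameter recovery in simulation studies of this kind is conventionally summarized by bias and root mean square error (RMSE) between the true generating values and the estimates. A minimal, generic sketch follows; the helper name and the numbers are illustrative, not taken from the thesis.

```python
import numpy as np

def bias_rmse(true, est):
    """Summarize parameter recovery: mean signed error and RMSE."""
    err = np.asarray(est) - np.asarray(true)
    return err.mean(), np.sqrt((err ** 2).mean())

# Hypothetical true vs. estimated item difficulties:
true_b = [-1.0, 0.0, 1.0]
est_b = [-0.9, 0.1, 1.2]
bias, rmse = bias_rmse(true_b, est_b)
print(bias, rmse)  # bias ≈ 0.133, RMSE ≈ 0.141
```

Bias near zero with small RMSE, computed per parameter type (item, ability, full-effort rate, decline), is the usual evidence that an estimation procedure recovers the generating model well.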
Table of Contents
Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  1.1 Research Motivation
  1.2 Research Purposes
  1.3 Research Questions
  1.4 Definitions of Terms
Chapter 2: Literature Review
  2.1 The Meaning of Performance Decline
  2.2 Item Response Theory
  2.3 Performance Decline Models
Chapter 3: Methods
  3.1 Model Development
  3.2 Research Framework
  3.3 Research Design
  3.4 Data Analysis
Chapter 4: Results
  4.1 Model Convergence Checks
  4.2 Simulation Results
  4.3 Model Comparison in the Simulation
  4.4 Empirical Results
Chapter 5: Conclusions and Suggestions
  5.1 Conclusions
  5.2 Research Contributions
  5.3 Research Ethics
  5.4 Limitations and Suggestions

References
References
I. Chinese-language sources
張苙雲(2004)。台灣教育長期追蹤資料庫:第一波(2001)國中學生問卷資料(公共版)(C00124_2)【原始數據】。取自中央研究院人文社會科學研究中心調查研究專題中心學術調查研究資料庫https://srda.sinica.edu.tw。
吳芝儀(2011)。以人為主體之社會科學研究倫理議題。人文社會科學研究,5(4),19-39。
余民寧(2011)。教育測驗與評量:成就測驗與教學評量(第三版)。臺北市:心理。
II. English-language sources
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.
Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541-561.
Birnbaum, A. (1968). Some latent trait models and their use in inferring a test-taker’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: Application of a mixture Rasch model with ordinal constraints. Journal of Educational Measurement, 39(4), 331-348.
Cao, J., & Stokes, S. L. (2008). Bayesian IRT guessing models for partial guessing behaviors. Psychometrika, 73(2), 209-230.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4), 327-335.
Cole, J. S., & Osterlind, S. J. (2008). Investigating differences between low- and high-stakes test performance on a general education exam. Journal of General Education, 57, 119-130.
Eklöf, H. (2006). Development and validation of scores from an instrument measuring student test-taking motivation. Educational and Psychological Measurement, 66(4), 643-656.
Finney, S. J., Sundre, D. L., Swain, M. S., & Williams, L. M. (2016). The validity of value-added estimates from low-stakes testing contexts: The impact of change in test-taking motivation and test consequences. Educational Assessment, 21(1), 60-87.
Fox, J.-P., & Glas, C. A. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66(2), 271-288.
Fox, J.-P., & Glas, C. A. (2003). Bayesian modeling of measurement error in predictor variables using item response theory. Psychometrika, 68(2), 169-191.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
Goegebeur, Y., De Boeck, P., Wollack, J. A., & Cohen, A. S. (2008). A speeded item response model with gradual process change. Psychometrika, 73(1), 65-87.
Gulliksen, H. (1950). The reliability of speeded tests. Psychometrika, 15(3), 259-269.
Huffman, L., Adamopoulos, A., Murdock, G., Cole, A., & McDermid, R. (2011). Strategies to motivate students for program assessment. Educational Assessment, 16(2), 90-103.
Huang, H.-Y. (2017). Mixture IRT model with a higher-order structure for latent traits. Educational and Psychological Measurement, 77(2), 275-304.
Huang, H.-Y., & Wang, W.-C. (2013). Higher order testlet response models for hierarchical latent traits and testlet-based items. Educational and Psychological Measurement, 73(3), 491-511.
Jin, K.-Y., & Wang, W.-C. (2013). Generalized IRT models for extreme response style. Educational and Psychological Measurement, 74(1), 116-138.
Jin, K. Y., & Wang, W. C. (2014). Item response theory models for performance decline during testing. Journal of Educational Measurement, 51(2), 178-200.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121-1134.
Lord, F. M. (1953). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13(4), 517-549.
Lu, Y., & Sireci, S. G. (2007). Validity issues in test speededness. Educational Measurement: Issues and Practice, 26(4), 29-37.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174.
Mislevy, R. J. (1995). What can we learn from international assessments? Educational Evaluation and Policy Analysis, 17(4), 419-437.
Mullis, I. V. S., Martin, M. O., & Diaconu, D. (2004). Item analysis and review. In M. O. Martin, I. V. S. Mullis, & S. J. Chrostowski (Eds.), TIMSS 2003 technical report (pp. 225-252). Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
OECD. (2009). PISA 2006 technical report. Paris, France: OECD Publishing.
OECD. (2012). PISA 2009 technical report. Paris, France: OECD Publishing.
Oshima, T. (1994). The effect of speededness on parameter estimation in item response theory. Journal of Educational Measurement, 31(3), 200-219.
Patz, R. J., & Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146-178.
Pintrich, P. R. (1999). The role of motivation in promoting and sustaining self-regulated learning. International Journal of Educational Research, 31(6), 459-470.
Pintrich, P. R., Smith, D. A., Garcia, T., & McKeachie, W. J. (1993). Reliability and predictive validity of the Motivated Strategies for Learning Questionnaire (MSLQ). Educational and Psychological Measurement, 53(3), 801-813.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Institute of Educational Research. (Expanded edition, 1980. Chicago, IL: The University of Chicago Press.)
Schiel, J. (1996). Student effort and performance on a measure of postsecondary educational development (ACT Report No. 96-9). Iowa City, IA: American College Testing Program.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
Sessoms, J., & Finney, S. J. (2015). Measuring and modeling change in examinee effort on low-stakes tests across testing occasions. International Journal of Testing, 15(4), 356-388.
Silm, G., Must, O., & Täht, K. (2013). Test-taking effort as a predictor of performance in low-stakes tests. Trames: A Journal of the Humanities & Social Sciences, 17(4), 433–448.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298-321.
Sundre, D. L. (1999, April). Does examinee motivation moderate the relationship between test consequences and test performance? Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.
Sundre, D. L., & Kitsantas, A. (2004). An exploration of the psychology of the examinee: Can examinee self-regulation and test-taking motivation predict consequential and non-consequential test performance? Contemporary Educational Psychology, 29(1), 6-26.
Sundre, D. L., & Moore, D. L. (2002). The Student Opinion Scale: A measure of examinee motivation. Assessment Update, 14(1), 8-9.
Sundre, D. L., & Wise, S. L. (2003, April). Motivation filtering: An exploration of the impact of low examinee motivation on the psychometric quality of tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Swerdzewski, P. J., Harmes, J. C., & Finney, S. J. (2011). Two approaches for identifying low-motivated students in a low-stakes assessment context. Applied Measurement in Education, 24(2), 162-188.
Thelk, A. D., Sundre, D. L., Horst, S. J., & Finney, S. J. (2009). Motivation matters: Using the student opinion scale to make valid inferences about student performance. The Journal of General Education, 129-151.
Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37(3), 203-220.
Wise, S. L. (2006). An investigation of the differential effort received by items on a low-stakes computer-based test. Applied Measurement in Education, 19(2), 95-114.
Wise, S. L., Bhola, D. S., & Yang, S. T. (2006). Taking the time to improve the validity of low‐stakes tests: The effort‐monitoring CBT. Educational Measurement: Issues and Practice, 25(2), 21-30.
Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1-17.
Wise, S. L., & DeMars, C. E. (2006). An application of item response time: The effort‐moderated IRT model. Journal of Educational Measurement, 43(1), 19-38.
Wise, S. L., & Kingsbury, G. G. (2016). Modeling student test‐taking motivation in the context of an adaptive achievement test. Journal of Educational Measurement, 53(1), 86-105.
Wise, S. L., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163-183.
Wise, S. L., & Ma, L. (2012, April). Setting response time thresholds for a CAT item pool: The normative threshold method. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.
Wise, S. L., Pastor, D. A., & Kong, X. J. (2009). Correlates of rapid-guessing behavior in low-stakes testing: Implications for test development and measurement practice. Applied Measurement in Education, 22(2), 185-205.
Wolf, L. F., & Smith, J. K. (1995). The consequence of consequence: Motivation, anxiety, and test performance. Applied Measurement in Education, 8(3), 227-242.
Wolf, L. F., Smith, J. K., & Birnbaum, M. E. (1995). Consequence of performance, test, motivation, and mentally taxing items. Applied Measurement in Education, 8(4), 341-351.
Yamamoto, K. (1995). Estimating the effects of test length and test time on parameter estimation using the HYBRID model. ETS Research Report Series, 1995(1), 1-39.
Ziegler, M., MacCann, C., & Roberts, R. D. (2011). Faking in personality assessments: Where do we stand? In M. Ziegler, C. MacCann, & R. D. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 330-344). New York, NY: Oxford University Press.