Author: 吳艷鴽 (Wu, Yen-Ju)
Title: 階層式題組反應理論模式之探討
Title (English): An exploration for higher-order item response theory model
Advisor: 郭伯臣 (Bor-Chen Kuo)
Committee members: 林素微 (Lin, Su-Wei); 王脩斐 (Wang, Hsiu-Fei)
Oral defense date: 2011-07-05
Degree: Master's
Institution: 國立臺中教育大學 (National Taichung University of Education)
Department: 教育測驗統計研究所 (Graduate Institute of Educational Measurement and Statistics)
Discipline: Education
Field: Educational measurement and evaluation
Document type: Academic thesis
Publication year: 2011
Graduation academic year: 99 (ROC calendar)
Language: Chinese
Pages: 190
Keywords (Chinese): 題組反應理論模式; 階層式試題反應理論; 階層式題組反應理論
Keywords (English): Testlet response theory model; Higher-order item response theory; Higher-order testlet response theory
Record statistics:
  • Times cited: 0
  • Views: 161
  • Downloads: 5
  • Bookmarked: 0
Hierarchical assessment frameworks have become the trend in many of today's large-scale standardized tests: with an appropriate estimation method, estimates of both the overall ability and the domain abilities can be obtained simultaneously, with higher estimation accuracy. To date, higher-order item response theory (HO-IRT) models have mainly been applied to independent items, while testlet items, which appear increasingly often in large-scale tests, have received little attention.
This study analyzed test data under a higher-order testlet assessment framework through simulation experiments, manipulating the number of domain scales, the total number of items, the number of testlets, testlet length, the testlet-effect variances, and the regression parameters. It examined the estimation accuracy of a full (simultaneous) estimation approach and a separate estimation approach for examinees' ability parameters, item parameters, testlet effects, and testlet-effect variances, in order to understand how the different estimation methods perform under a higher-order testlet framework and to provide a reference for analyzing test data of this kind.
The simulation results show that full estimation yields more complete and more accurate estimates. Ability-parameter estimation accuracy increases with the number of domain scales, the number of items, the number of testlets, and the regression parameters, and decreases as the testlet-effect variance grows; for the item parameters, the testlet effects, and the testlet-effect variances, no consistent pattern was found.

The hierarchical assessment framework is the current trend in many large-scale standardized tests: with a suitable estimation method, the overall ability and the domain abilities can be obtained simultaneously and estimated accurately. At present, higher-order item response theory (HO-IRT) models are mainly applied to independent items, with little attention to testlets, which are used more and more frequently in large-scale standardized tests.
The purpose of this research is to analyze testlet-based test data under a hierarchical assessment framework through simulation experiments. The goal is to assess the estimation accuracy of the ability parameters, the item parameters, the testlet effects, and the testlet-effect variances while controlling the number of domain scales, the number of items, the number of testlets, testlet length, the testlet-effect variances, and the regression parameters, and thereby to offer a reference for analyzing hierarchical test data.
The results show that full estimation obtains more complete and accurate estimates. As the number of domain scales, the number of items, the number of testlets, or the regression parameters increases, the accuracy of the ability estimates increases; as the testlet-effect variances increase, the accuracy of the ability estimates decreases. For the item parameters, the testlet effects, and the testlet-effect variances, the results show no consistent pattern.
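The higher-order testlet structure described in the abstract can be illustrated with a small data-generation sketch: domain abilities are derived from an overall ability through regression parameters, each testlet contributes a person-specific testlet effect that induces within-testlet dependence, and item responses follow a 2PL-style testlet model. This is only an illustrative sketch, not the thesis's actual simulation code; the design values (2 domains, 3 testlets of 4 items each, the regression parameters, and the testlet-effect variance) are assumed for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_persons, n_domains = 500, 2
n_testlets_per_domain, testlet_len = 3, 4      # assumed design values
lam = np.array([0.8, 0.6])                     # regression of domain on overall ability
sigma_gamma2 = 0.5                             # testlet-effect variance (assumed)

# Higher-order structure: theta_d = lam_d * theta + eps_d, Var(theta_d) = 1
theta = rng.normal(size=n_persons)
eps = rng.normal(scale=np.sqrt(1 - lam**2), size=(n_persons, n_domains))
theta_d = theta[:, None] * lam + eps

responses = []
for d in range(n_domains):
    for t in range(n_testlets_per_domain):
        gamma = rng.normal(scale=np.sqrt(sigma_gamma2), size=n_persons)  # testlet effect
        b = rng.normal(size=testlet_len)              # item difficulties
        a = rng.uniform(0.8, 1.5, size=testlet_len)   # item discriminations
        # 2PL testlet model: logit P(X=1) = a_i * (theta_d - b_i - gamma_t)
        logit = a * (theta_d[:, d:d + 1] - b - gamma[:, None])
        p = 1 / (1 + np.exp(-logit))
        responses.append((rng.random((n_persons, testlet_len)) < p).astype(int))

data = np.hstack(responses)   # persons x items dichotomous response matrix
print(data.shape)             # (500, 24)
```

A full estimation approach, as studied in the thesis, would fit the overall ability, domain abilities, item parameters, and testlet effects jointly from a matrix like `data`, rather than estimating each domain scale separately.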

Chapter 1 Introduction 1
Section 1 Research Motivation and Purpose 1
Section 2 Research Questions 3
Section 3 Definitions of Terms 3
Chapter 2 Literature Review 6
Section 1 Higher-Order Item Response Theory Models 6
Section 2 Testlets and the Concept of Local Item Dependence 11
Section 3 Testlet Response Theory 21
Section 4 Parameter Estimation 26
Chapter 3 Research Design and Methods 30
Section 1 Research Procedure 30
Section 2 Research Variables 31
Section 3 Simulation Procedure 40
Section 4 Evaluation Criteria 41
Section 5 Research Tools 43
Chapter 4 Results 44
Section 1 Comparison of Parameter Estimates Across Estimation Methods 44
Section 2 Effects of Manipulated Variables on Estimation Accuracy 63
Chapter 5 Conclusions and Suggestions 74
Section 1 Conclusions 74
Section 2 Suggestions 77
References 78
Chinese References 78
English References 78
Appendix 1 Estimation Results for Each Framework 85
