
National Digital Library of Theses and Dissertations in Taiwan

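Graduate student: Jeffrey Yeh (葉昶成)
Thesis title: An Investigation of the Estimation Performance of the Plausible Values Method under Different Vertical Equating Designs (不同垂直等化設計下可能值方法估計效果之探討)
Advisors: Bor-Chen Kuo (郭伯臣), Huey-Min Wu (吳慧珉)
Oral examination committee: Hwawei Ko (柯華葳), Chen Huei Liau (廖晨惠), Yu Mao Yang (楊裕貿)
Oral defense date: 2012-06-20
Degree: Master's
University: National Taichung University of Education
Department: Graduate Institute of Educational Measurement and Statistics
Discipline: Education
Field: Educational Testing and Assessment
Thesis type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar, i.e., 2011-2012)
Language: Chinese
Number of pages: 94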
Keywords: large-scale assessment; Taiwan Assessment of Student Achievement (TASA); plausible values; equating design; vertical equating
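Many international large-scale assessments use the plausible values method to estimate population ability parameters. Data in the form of plausible values also allow analysts to describe the statistical characteristics of the population. In addition, a large-scale assessment typically covers a range of cognitive dimensions and difficulty levels that no single examinee could complete in a short testing session, so the items are distributed across booklets according to an equating design that reduces examinee burden while still meeting the goals of the assessment.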
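The core idea of plausible values can be sketched in a few lines. This is a minimal illustration, not the procedure used in the thesis: it assumes a Rasch model with known item difficulties, a population distribution known to be N(0, 1) that serves as the prior, and a fixed quadrature grid. Operational assessments typically estimate the population model, often a latent regression on background variables, from the data instead. Items missing from a student's booklet are coded as `np.nan` and drop out of the likelihood. The function names (`posterior_on_grid`, `draw_plausible_values`, `population_stats`) are hypothetical.

```python
import numpy as np

# Quadrature grid for ability theta (an assumption; any fine grid would do).
GRID = np.linspace(-4.0, 4.0, 161)

def posterior_on_grid(x, b, prior_mean=0.0, prior_sd=1.0):
    """Normalized posterior weights over GRID for each student (row of x).

    x: (n, k) 0/1 responses, np.nan where the item was not in the student's booklet.
    b: (k,) Rasch item difficulties, assumed known.
    prior_mean: scalar or (n,) array; prior_sd: scalar.
    """
    seen = ~np.isnan(x)
    x0 = np.where(seen, x, 0.0)
    p = 1.0 / (1.0 + np.exp(-(GRID[:, None] - b[None, :])))    # (g, k)
    # Log-likelihood over administered items only: (n, g).
    loglik = x0 @ np.log(p).T + (seen * (1.0 - x0)) @ np.log(1.0 - p).T
    mu = np.reshape(prior_mean, (-1, 1))
    logpost = loglik - 0.5 * ((GRID[None, :] - mu) / prior_sd) ** 2
    w = np.exp(logpost - logpost.max(axis=1, keepdims=True))   # stable exponentiation
    return w / w.sum(axis=1, keepdims=True)

def draw_plausible_values(x, b, n_pv=5, seed=None, **prior):
    """Draw n_pv values per student from the grid posterior by inverse-CDF lookup."""
    rng = np.random.default_rng(seed)
    cdf = np.cumsum(posterior_on_grid(x, b, **prior), axis=1)  # (n, g)
    u = rng.random((x.shape[0], n_pv))
    idx = (cdf[:, None, :] < u[:, :, None]).sum(axis=2)        # first grid point with cdf >= u
    return GRID[np.minimum(idx, GRID.size - 1)]

def population_stats(pv):
    """Compute the mean and SD within each PV set, then average over the sets."""
    return pv.mean(axis=0).mean(), pv.std(axis=0, ddof=1).mean()

# Usage: 4000 students, 20 items; each student sees a random subset of about half the items.
rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, 4000)
b = np.linspace(-2.0, 2.0, 20)
x = (rng.random((4000, 20)) < 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))).astype(float)
x[rng.random(x.shape) < 0.5] = np.nan
pv = draw_plausible_values(x, b, n_pv=5, seed=2)
mean_hat, sd_hat = population_stats(pv)
```

Averaging the statistic over PV sets gives the point estimate; the between-set variation would feed into its standard error (not shown here).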

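This study uses two vertical equating designs, the non-equivalent groups with anchor test (NEAT) design and the balanced incomplete block (BIB) design. Under each design, individual abilities and the mean and standard deviation of population ability are estimated with the plausible values method, expected a posteriori (EAP) estimation incorporating background variables, EAP estimation, and maximum likelihood estimation (MLE). The main purpose is to examine how well the plausible values method and the other estimation methods recover population parameters.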
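To make the individual-level estimators named above concrete, here is a minimal sketch under assumptions that are not taken from the thesis: a Rasch model with known difficulties, complete response data, and a background variable reduced to a two-level group indicator whose group means and within-group SD are treated as known. In this simplified version, "EAP with background variables" just means EAP with a group-specific normal prior; operational conditioning models use a latent regression on many background variables. MLE is computed by Newton-Raphson and clipped to [-4, 4], because all-correct and all-wrong patterns have no finite MLE. All function names are hypothetical.

```python
import numpy as np

GRID = np.linspace(-4.0, 4.0, 161)  # quadrature grid for EAP (assumed)

def rasch_p(theta, b):
    """P(correct) for abilities theta (n,) and difficulties b (k,), shape (n, k)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def mle_theta(x, b, n_iter=30, bound=4.0):
    """Newton-Raphson MLE of theta, clipped because extreme scores diverge."""
    theta = np.zeros(x.shape[0])
    for _ in range(n_iter):
        p = rasch_p(theta, b)
        grad = (x - p).sum(axis=1)             # score function
        info = (p * (1.0 - p)).sum(axis=1)     # test information
        theta = np.clip(theta + grad / info, -bound, bound)
    return theta

def eap_theta(x, b, prior_mean, prior_sd):
    """Posterior mean over GRID with a normal prior; prior_mean may vary by student."""
    p = rasch_p(GRID, b)                                        # (g, k)
    loglik = x @ np.log(p).T + (1.0 - x) @ np.log(1.0 - p).T    # (n, g)
    mu = np.reshape(prior_mean, (-1, 1))
    logpost = loglik - 0.5 * ((GRID[None, :] - mu) / prior_sd) ** 2
    w = np.exp(logpost - logpost.max(axis=1, keepdims=True))
    return (w * GRID).sum(axis=1) / w.sum(axis=1)

def rmse(est, true):
    return float(np.sqrt(np.mean((est - true) ** 2)))

# Usage: two background groups with different ability means, 15 items.
rng = np.random.default_rng(0)
n, group_means, within_sd = 2000, np.array([-1.0, 1.0]), 0.6
group = rng.integers(0, 2, n)
theta = rng.normal(group_means[group], within_sd)
b = np.linspace(-2.0, 2.0, 15)
x = (rng.random((n, 15)) < rasch_p(theta, b)).astype(float)

total_sd = np.sqrt(within_sd**2 + 1.0)  # SD of the equal-weight two-group mixture
err_mle = rmse(mle_theta(x, b), theta)
err_eap = rmse(eap_theta(x, b, 0.0, total_sd), theta)
err_eap_bg = rmse(eap_theta(x, b, group_means[group], within_sd), theta)
```

In this setup the group-specific prior carries real information about each student, so EAP with the background variable should show the smallest RMSE, plain EAP the next, and MLE the largest.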

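The results show that, under each of the vertical equating designs, the estimation methods that incorporate background variables performed better, both in estimating individual ability parameters and in recovering the mean and standard deviation of population ability. In recovering the population standard deviation in particular, the plausible values method performed far better than the other estimation methods.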
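A standard argument, not specific to this thesis, helps explain why point estimates distort the population standard deviation while plausible values do not. Write $x$ for a student's responses, assume the prior used to form posteriors matches the true population distribution, and, for MLE, use the usual large-sample approximation $\hat\theta_{\mathrm{MLE}} \approx \theta + e$ with error variance $I(\theta)^{-1}$ uncorrelated with $\theta$:

```latex
\begin{aligned}
\operatorname{Var}(\theta) &= \operatorname{Var}\big(\mathbb{E}[\theta \mid x]\big) + \mathbb{E}\big[\operatorname{Var}(\theta \mid x)\big] \\
\Rightarrow\ \operatorname{Var}(\hat\theta_{\mathrm{EAP}}) &= \operatorname{Var}(\theta) - \mathbb{E}\big[\operatorname{Var}(\theta \mid x)\big] \le \operatorname{Var}(\theta), \\
\operatorname{Var}(\hat\theta_{\mathrm{MLE}}) &\approx \operatorname{Var}(\theta) + \mathbb{E}\big[I(\theta)^{-1}\big] \ge \operatorname{Var}(\theta), \\
\tilde\theta \sim p(\theta \mid x) \;&\Rightarrow\; \tilde\theta \overset{d}{=} \theta \;\Rightarrow\; \operatorname{Var}(\tilde\theta) = \operatorname{Var}(\theta).
\end{aligned}
```

EAP shrinks the spread, MLE inflates it, and a plausible value $\tilde\theta$ drawn from the posterior reproduces the population distribution on average, which is consistent with the standard deviation recovery pattern reported above.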

The purpose of this thesis is to explore, using simulated data, the performance of the plausible values method under BIB and NEAT designs for vertical equating. Large-scale assessments focus mainly on population statistics such as means and standard deviations, and the plausible values method is commonly used to estimate these population parameters. Because a large-scale assessment usually covers a wide range of subject matter within a short testing time, multiple booklets are used to cover the proficiency domain adequately. The balanced incomplete block (BIB) design and the non-equivalent groups with anchor test (NEAT) design are two popular linking designs for this situation. The experimental results show that the estimation method based on plausible values performs better than the other methods under the vertical equating designs, and that population parameters (mean and standard deviation) are well estimated as test length increases. Under the simulated conditions, the estimates of population parameters were not affected by sample size (16,128 vs. 10,920). With either linking design, BIB or NEAT, the plausible values method yields more precise estimates.
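As a concrete example of a BIB booklet design, the sketch below builds the textbook cyclic design with 7 item blocks arranged into 7 booklets of 3 blocks each, developed from the difference set {0, 1, 3} mod 7, and checks the balance property: every block appears in the same number of booklets, and every pair of blocks appears together in exactly one booklet. The thesis's actual block and booklet counts are not given in this record, so the parameters here are purely illustrative, and the function names are hypothetical. A NEAT design, by contrast, links forms only through a shared anchor set.

```python
from collections import Counter
from itertools import combinations

def cyclic_bib(n_blocks, base):
    """Develop a base set of block indices cyclically mod n_blocks into booklets."""
    return [sorted((blk + shift) % n_blocks for blk in base) for shift in range(n_blocks)]

def check_balance(booklets):
    """Count appearances per block and co-occurrences per unordered pair of blocks."""
    per_block = Counter(blk for bk in booklets for blk in bk)
    per_pair = Counter(pair for bk in booklets for pair in combinations(sorted(bk), 2))
    return per_block, per_pair

booklets = cyclic_bib(7, [0, 1, 3])
per_block, per_pair = check_balance(booklets)
for i, bk in enumerate(booklets, start=1):
    # Print 1-based block labels for readability.
    print(f"Booklet {i}: blocks {[blk + 1 for blk in bk]}")
```

Swapping in a base set that is not a perfect difference set would generally show unequal counts in `per_pair`, which is a quick way to see that the design is no longer balanced.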
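Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Purposes and Research Questions
Section 3 Definitions of Terms
Chapter 2 Literature Review
Section 1 Unidimensional Item Response Theory
Section 2 Parameter Estimation Methods
Section 3 The Plausible Values Method
Section 4 Test Equating Designs
Chapter 3 Research Methods
Section 1 Research Procedure
Section 2 Test Equating Designs
Section 3 Simulation Conditions and Estimation Method Settings
Section 4 Research Tools
Section 5 Evaluation Criteria
Chapter 4 Results and Discussion
Section 1 Estimation Results under the NEAT Equating Design
Section 2 Estimation Results under the BIB Equating Design
Section 3 Comparison of the Two Equating Designs
Chapter 5 Conclusions and Recommendations
Section 1 Conclusions
Section 2 Recommendations
References
Chinese-Language References
English-Language References
Appendix 1 RMSE of Individual Ability Estimates by Estimation Method under the NEAT Design
Appendix 2 RMSE of Individual Ability Estimates by Estimation Method under the BIB Design
Appendix 3 RMSE of Population Ability Mean Estimates by Estimation Method under the NEAT Design
Appendix 4 RMSE of Population Ability Mean Estimates by Estimation Method under the BIB Design
Appendix 5 RMSE of Population Ability Standard Deviation Estimates by Estimation Method under the NEAT Design
Appendix 6 RMSE of Population Ability Standard Deviation Estimates by Estimation Method under the BIB Design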

