National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

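Graduate student: 葉昶成 (Jeffrey Yeh)
Thesis title: 不同垂直等化設計下可能值方法估計效果之探討 (An Investigation of the Estimation Performance of the Plausible Values Method under Different Vertical Equating Designs)
Advisors: 郭伯臣 (Bor-Chen Kuo), 吳慧珉 (Huey-Min Wu)
Oral defense committee: 柯華葳 (Hwawei Ko), 廖晨惠 (Chen Huei Liau), 楊裕貿 (Yu Mao Yang)
Date of oral defense: 2012-06-20
Degree: Master's
Institution: 國立臺中教育大學 (National Taichung University of Education)
Department: 教育測驗統計研究所 (Graduate Institute of Educational Measurement and Statistics)
Discipline: Education
Field: Educational measurement and evaluation
Type of thesis: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Keywords (Chinese): 大型測驗、臺灣學生學習成就評量資料庫、可能值、等化設計、垂直等化
Keywords (English): large-scale assessment, Taiwan Assessment of Student Achievement, plausible values, equating design, vertical equating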


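Many international large-scale assessments use the plausible values method to estimate population ability parameters. Because plausible values take the form of data, analysts can also use them to describe statistical properties. In addition, large-scale assessments generally cover a range of cognitive dimensions and difficulty levels that no single examinee could complete in a short period, so the test items are arranged under various equating designs to reduce examinee burden while still meeting the purpose of the assessment.

This study applies two vertical equating designs, the non-equivalent groups with anchor test (NEAT) design and the balanced incomplete block (BIB) design. It estimates individual ability, as well as the mean and standard deviation of population ability, using four methods: plausible values, expected a posteriori (EAP) estimation with background variables, EAP estimation, and maximum likelihood estimation. The main purpose is to examine how well the plausible values method and the other estimation methods recover population parameters.

The results show that, across the vertical equating designs, the estimation methods that incorporate background variables perform better, both in estimating individual ability parameters and in recovering the mean and standard deviation of population ability. In recovering the population standard deviation in particular, the plausible values method far outperforms the other estimation methods.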

This study uses simulated data to examine how the plausible values method performs under BIB and NEAT designs for vertical equating. Large-scale assessments focus primarily on population statistics such as means and standard deviations, and the plausible values method is commonly used to estimate these population parameters. Such assessments usually cover a wide range of subject matter within a short testing time, so multiple booklets are used to cover the proficiency domain adequately. The balanced incomplete block (BIB) design and the non-equivalent groups with anchor test (NEAT) design are two popular equating designs for this situation. The simulation results show that, under vertical equating designs, estimation based on plausible values performs better than the other methods, and that the population parameters (mean and standard deviation) are well estimated as test length increases. Under the simulated conditions, the estimates of the population parameters are not affected by sample size (16,128 and 10,920). With both linking designs, BIB and NEAT, the plausible values method yields more precise estimates.
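The contrast between plausible values and point estimates in recovering a population standard deviation can be illustrated with a small simulation. The sketch below is not the thesis's procedure: it assumes a Rasch model with known item difficulties, a standard normal prior, a single complete test form (no NEAT or BIB booklets), and no background variables. The function name `simulate_sd_recovery` and all of its parameters are hypothetical. It computes each examinee's posterior on a grid, takes the posterior mean as the EAP estimate, and draws plausible values from the same posterior.

```python
import numpy as np


def simulate_sd_recovery(n_persons=2000, n_items=20, n_pv=5, seed=0):
    """Compare the SD of EAP estimates with the SD of plausible values.

    Illustrative assumptions: Rasch model, known item difficulties,
    N(0, 1) prior, one complete test form, no background variables.
    """
    rng = np.random.default_rng(seed)

    # True abilities and item difficulties (difficulties treated as known).
    theta = rng.normal(0.0, 1.0, n_persons)
    b = np.linspace(-2.0, 2.0, n_items)

    # Simulate dichotomous responses under the Rasch model.
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    x = (rng.random((n_persons, n_items)) < p).astype(float)

    # Posterior of ability for each person on a fixed grid.
    grid = np.linspace(-5.0, 5.0, 201)
    p_grid = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))
    log_lik = x @ np.log(p_grid).T + (1.0 - x) @ np.log(1.0 - p_grid).T
    log_post = log_lik - 0.5 * grid**2            # add log N(0, 1) prior
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # EAP: posterior mean, one point estimate per person.
    eap = post @ grid

    # Plausible values: random draws from each person's posterior
    # (inverse-CDF sampling on the grid).
    cdf = np.cumsum(post, axis=1)
    u = rng.random((n_persons, n_pv))
    idx = (u[:, :, None] > cdf[:, None, :]).sum(axis=2)
    idx = np.minimum(idx, grid.size - 1)          # guard against rounding
    pv = grid[idx]

    sd_true = theta.std(ddof=1)
    sd_eap = eap.std(ddof=1)
    # Compute the SD within each set of plausible values, then average.
    sd_pv = pv.std(axis=0, ddof=1).mean()
    return sd_true, sd_eap, sd_pv


if __name__ == "__main__":
    sd_true, sd_eap, sd_pv = simulate_sd_recovery()
    print(f"true SD: {sd_true:.3f}")
    print(f"SD of EAP estimates: {sd_eap:.3f}")
    print(f"SD of plausible values: {sd_pv:.3f}")
```

Under these assumptions the EAP estimates are shrunk toward the prior mean, so their standard deviation understates the population value, whereas the plausible values retain the posterior spread and track the population standard deviation more closely.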

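Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Section 1: Research Motivation
Section 2: Research Purposes and Research Questions
Section 3: Definition of Terms
Chapter 2: Literature Review
Section 1: Unidimensional Item Response Theory
Section 2: Parameter Estimation Methods
Section 3: The Plausible Values Method
Section 4: Test Equating Designs
Chapter 3: Research Method
Section 1: Research Procedure
Section 2: Test Equating Designs
Section 3: Simulation Conditions and Estimation Method Settings
Section 4: Research Tools
Section 5: Evaluation Criteria
Chapter 4: Results and Discussion
Section 1: Estimation Results for the NEAT Equating Design
Section 2: Estimation Results for the BIB Equating Design
Section 3: Comparison of the Two Equating Designs
Chapter 5: Conclusions and Recommendations
Section 1: Conclusions
Section 2: Recommendations
References
Chinese-language references
English-language references
Appendix 1: RMSE of individual ability estimates by estimation method, NEAT design
Appendix 2: RMSE of individual ability estimates by estimation method, BIB design
Appendix 3: RMSE of population ability mean by estimation method, NEAT design
Appendix 4: RMSE of population ability mean by estimation method, BIB design
Appendix 5: RMSE of population ability standard deviation by estimation method, NEAT design
Appendix 6: RMSE of population ability standard deviation by estimation method, BIB design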
