National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

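Graduate student: 林冠忻
Graduate student (English name): Lin, Kuan-Hsin
Thesis title: 評分者在不同群體受試者下極端反應風格試題反應理論模式的發展與應用
Thesis title (English): Development and Application of Item Response Theory Models for Measuring Raters’ Extreme Response Style for Different Groups
Advisor: 黃宏宇
Advisor (English name): Huang, Hung-Yu
Oral defense date: 2022-06-29
Degree: Master's
Institution: University of Taipei (臺北市立大學)
Department: Department of Psychology and Counseling (心理與諮商學系)
Discipline: Social and Behavioral Sciences
Field: Psychology
Thesis type: Academic thesis
Year of publication: 2022
Graduation academic year: 110
Language: Chinese
Pages: 87
Keywords (Chinese): 評分者; 極端反應風格; 試題反應理論
Keywords (English): rater; extreme response style; item response theory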
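To understand whether, in rater-mediated assessments, raters show extreme responses during rating because of ratees' background variables or group membership, this study develops an item response theory model for raters' extreme response style across different ratee groups and examines its validity. The newly developed model, the rater extreme response style generalized partial credit model (Rater-ERS-GPCM), differs from earlier models in that it adds ratee group characteristics to the rater component and examines their interaction. The thesis has two parts: a simulation study and an empirical study. The simulation study manipulated test length, number of rating categories, sample size, and the presence of missing data, and assessed how well item parameters, rater parameters, and ratee ability parameters were estimated under each condition. The empirical study used data from the National Academy for Educational Research's pre-service training program for primary and junior high school directors (國民中小學主任儲訓計畫). Ratees were divided into two groups by gender, and the empirical data were used to examine the model's fit in practice.
The simulation results showed that the model gave good estimates of (1) item parameters, (2) rater parameters, and (3) ratee ability parameters. The empirical results showed that the Rater-ERS-GPCM had better model fit. When group differences in extreme response are ignored and the data are analyzed with the many-facet generalized partial credit model, the rank ordering of ratee ability changes. In addition, raters differed little when rating different groups, whereas ratee ability parameters differed considerably. Finally, suggestions based on the results are offered for future researchers.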
To investigate whether raters show systematic differences in extreme response style (ERS) across ratee groups in rater-mediated assessments, this study develops item response theory models for measuring raters’ ERS for different ratee groups and evaluates the estimation quality of the proposed models. By incorporating the effect of ratee group on the rater weight parameters, the study developed a new class of Rater-ERS-GPCM models that capture variation in raters’ ERS magnitude across ratee groups.
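This record does not give the model equation, so the following is only a minimal sketch of how an ERS weight can enter a generalized partial credit model. It assumes the adjacent-category form log(P_j / P_{j-1}) = alpha * (theta - (delta + omega * tau_j)), where omega is a positive weight that scales the step thresholds. The function name, the `(rater, group)` lookup table, and all numeric values are hypothetical, and rater severity is omitted for brevity.

```python
import math

def gpcm_ers_probs(theta, alpha, delta, taus, omega):
    """Category probabilities for one rating (assumed ERS-weighted GPCM form).

    theta: ratee ability; alpha: item slope; delta: item location;
    taus: step thresholds (one fewer than the number of categories);
    omega: positive ERS weight that stretches or compresses the thresholds.
    """
    # Category 0 has logit 0; each later category adds one step logit.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + alpha * (theta - (delta + omega * tau)))
    # Softmax with the maximum subtracted for numerical stability.
    top = max(logits)
    expd = [math.exp(z - top) for z in logits]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical weights: one per (rater, ratee group) pair.
omega_by_rater_group = {("R1", "female"): 0.5, ("R1", "male"): 2.0}

taus = [-1.0, 0.0, 1.0]  # four rating categories
for key, omega in omega_by_rater_group.items():
    probs = gpcm_ers_probs(theta=0.0, alpha=1.0, delta=0.0, taus=taus, omega=omega)
    extreme_mass = probs[0] + probs[-1]
    print(key, [round(p, 3) for p in probs], round(extreme_mass, 3))
```

In this sketch a smaller omega pulls the thresholds together, which moves probability toward the two end categories. Letting omega vary by rater and ratee group is one way to represent group-specific extreme responding.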
In the simulation study, four factors were manipulated: test length, number of rating categories, sample size, and complete versus incomplete rating design. For the empirical study, the author used rating data collected by the National Academy for Educational Research for the purpose of school director selection to demonstrate the application of the new models to real data. Ratees’ gender was used to investigate differences in raters’ ERS tendency across ratee groups.
The results are summarized as follows: (1) the model provided good item parameter estimation; (2) the model provided good rater parameter estimation; and (3) the model provided good ratee ability parameter estimation. In the empirical study, ratees’ gender had little impact on raters’ ERS tendency. Finally, the author provides some suggestions based on the results.
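The abstract reports good parameter estimation but does not say which criteria were used. Bias and root mean square error (RMSE) across replications are common choices in simulation studies, so the sketch below assumes them. The function names and the replicate estimates are hypothetical and are not results from the thesis.

```python
import math

def bias(estimates, true_value):
    """Mean signed difference between estimates and the generating value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    """Root mean square error of estimates around the generating value."""
    squared = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical estimates of one item slope over four replications.
replicate_estimates = [0.9, 1.1, 1.0, 1.2]
print(round(bias(replicate_estimates, 1.0), 4))
print(round(rmse(replicate_estimates, 1.0), 4))
```

Values near zero on both measures would indicate that a parameter is recovered well under a given simulation condition.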
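Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Purpose
Section 3 Research Questions
Section 4 Definition of Terms
Chapter 2 Literature Review
Section 1 Rater-Mediated Testing Contexts
Section 2 Rater Effects
Section 3 Item Response Theory
Section 4 Extreme Response Style Models
Section 5 New Item Response Theory Models for Extreme Response Style
Chapter 3 Method
Section 1 Research Framework
Section 2 Simulated Data
Section 3 Empirical Data
Section 4 Data Analysis
Section 5 Evaluation Criteria
Chapter 4 Results
Section 1 Simulation Study
Section 2 Empirical Study
Chapter 5 Conclusions and Suggestions
Section 1 Conclusions
Section 2 Limitations and Suggestions
References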
Electronic full text (public release date: 2027-12-31)