Original article | International Journal of Progressive Education 2017, Vol. 13(1) 136-152
Turgay Han
pp. 136-152 | Manuscript Number: ijpe.2017.008
Published online: February 01, 2017
Abstract
The aim of this study is to examine the variability in and reliability of scores assigned by EFL instructors to EFL compositions of different quality, as well as the instructors' rating behaviors. Using a mixed research design, quantitative data were collected from EFL instructors' ratings of 30 compositions of three different quality levels using a holistic scoring rubric. Qualitatively, think-aloud protocol data were collected concurrently from a sub-sample of raters. The generalizability theory (G-theory) approach was used to analyze the quantitative data. The results showed that the raters' scores diverged most when rating very low-level and mid-range compositions, whereas they were more consistent when rating very high-level compositions. The reliability of the ratings of high-quality papers (G = .87 and phi = .79, respectively) was higher than the coefficients obtained for mid-range and low-quality compositions. This result indicated that more reliable ratings could be obtained when rating high-quality papers. The think-aloud protocol analysis indicated that the raters attended to different aspects of the compositions at the three quality levels. Implications are discussed from the perspective of performance assessment practice.
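For readers unfamiliar with the two coefficients reported above, a standard G-theory formulation for a compositions-by-raters design (a general sketch, not taken from the article's own analysis) expresses the relative generalizability coefficient and the absolute dependability (phi) coefficient in terms of estimated variance components:

E\rho^2 = \frac{\sigma^2_{c}}{\sigma^2_{c} + \dfrac{\sigma^2_{cr,e}}{n_r}}, \qquad
\Phi = \frac{\sigma^2_{c}}{\sigma^2_{c} + \dfrac{\sigma^2_{r} + \sigma^2_{cr,e}}{n_r}}

Here \sigma^2_{c} is the variance attributable to compositions (the object of measurement), \sigma^2_{r} the variance attributable to raters, \sigma^2_{cr,e} the composition-by-rater interaction confounded with residual error, and n_r the number of raters. The phi coefficient is the smaller of the two because rater severity (\sigma^2_{r}) counts as error when scores are interpreted in absolute terms.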
Keywords: Inexpert raters, generalizability theory, variability of ratings, writing assessment.