Reliability of Essay Ratings: A Study on Generalizability Theory

dc.contributor.author: Atilgan, Hakan
dc.date.accessioned: 2019-10-27T09:48:31Z
dc.date.available: 2019-10-27T09:48:31Z
dc.date.issued: 2019
dc.department: Ege Üniversitesi
dc.description.abstract: Purpose: This study examined the generalizability and reliability of essay ratings within the scope of generalizability (G) theory. Specifically, it examined the effect of raters on the generalizability and reliability of students' essay ratings. In addition, the study determined how the generalizability and reliability coefficients varied with the number of raters, and identified the optimal number of raters for reliably rating students' writing ability, treated as an implicit trait, both as a whole and in its sub-dimensions of wording/writing, paragraph construction, and title selection. Research Methods: The student sample comprised 443 students selected via random cluster sampling, and the rater sample comprised four Turkish teachers. All essays written by the sampled students were independently rated by the four trained teachers on a writing skill scale (WSS), an ordinal scale comprising 20 items. Data were analyzed using the multivariate p° x i° x r° design of G theory. Findings: In the G studies, the variance components for the rater (r) and the item-by-rater interaction (i x r) were low in all sub-dimensions, whereas the variance component for the object-of-measurement-by-rater interaction (p x r) was relatively high. Using trained raters increased the reliability of the ratings. Implications for Research and Practice: In the decision (D) study analyses of the original design with four raters, the G and Phi coefficients for the combined measurement were .95 and .94, respectively. In the alternative D studies with two trained raters, the G and Phi coefficients were .91 and .90, respectively. Thus, having essays rated by two trained raters may be considered satisfactory. (C) 2019 Ani Publishing Ltd. All rights reserved.
dc.identifier.doi: 10.14689/ejer.2019.80.7
dc.identifier.endpage: 150
dc.identifier.issn: 1302-597X
dc.identifier.issn: 2528-8911
dc.identifier.issue: 80
dc.identifier.startpage: 133
dc.identifier.uri: https://doi.org/10.14689/ejer.2019.80.7
dc.identifier.uri: https://hdl.handle.net/11454/29536
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.publisher: Ani Yayincilik
dc.relation.ispartof: Eurasian Journal of Educational Research
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Generalizability Theory
dc.subject: generalizability
dc.subject: reliability
dc.subject: essay rating
dc.subject: essay rater reliability
dc.subject: writing ratings
dc.title: Reliability of Essay Ratings: A Study on Generalizability Theory
dc.type: Article
