Reliability of Essay Ratings: A Study on Generalizability Theory

dc.contributor.author: Atilgan, Hakan
dc.date.accessioned: 2019-10-27T09:48:31Z
dc.date.available: 2019-10-27T09:48:31Z
dc.date.issued: 2019
dc.department: Ege Üniversitesi
dc.description.abstract: Purpose: This study examined the generalizability and reliability of essay ratings within the scope of generalizability (G) theory. Specifically, it examined the effect of raters on the generalizability and reliability of students' essay ratings. In addition, the study determined how the generalizability and reliability coefficients varied with the number of raters, and identified the optimal number of raters for reliably rating students' writing ability, treated as an implicit trait, both as a whole and in its sub-dimensions of wording/writing, paragraph construction, and title selection. Research Methods: The student sample comprised 443 students selected via random cluster sampling, and the rater sample comprised four Turkish teachers. All essays written by the sampled students were independently rated by the four trained teachers on a writing skill scale (WSS), an ordinal scale comprising 20 items. Data were analyzed using the multivariate p° x i° x r° design of G theory. Findings: In the G studies, the variance components for the rater (r) and the item-by-rater interaction (i x r) were low in all sub-dimensions, whereas the variance component for the object-of-measurement-by-rater interaction (p x r) was relatively high. Using trained raters increased the reliability of the ratings. Implications for Research and Practice: In the decision (D) study analyses of the original design with four raters, the G and Phi coefficients for the combined measurement were .95 and .94, respectively. In the alternative D studies with two trained raters, the G and Phi coefficients were .91 and .90, respectively. Thus, having essays rated by two trained raters may be considered satisfactory. (C) 2019 Ani Publishing Ltd. All rights reserved.
dc.identifier.doi: 10.14689/ejer.2019.80.7
dc.identifier.endpage: 150
dc.identifier.issn: 1302-597X
dc.identifier.issn: 2528-8911
dc.identifier.issue: 80
dc.identifier.startpage: 133
dc.identifier.uri: https://doi.org/10.14689/ejer.2019.80.7
dc.identifier.uri: https://hdl.handle.net/11454/29536
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.publisher: Ani Yayincilik
dc.relation.ispartof: Eurasian Journal of Educational Research
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Generalizability Theory
dc.subject: generalizability
dc.subject: reliability
dc.subject: essay rating
dc.subject: essay rater reliability
dc.subject: writing ratings
dc.title: Reliability of Essay Ratings: A Study on Generalizability Theory
dc.type: Article
