An application of the Rasch measurement model to the analysis of repeated measurements: Rack and Stack analysis

Date

2021

Publisher

Ege Üniversitesi, Eğitim Bilimleri Enstitüsü

Access Rights

info:eu-repo/semantics/openAccess

Abstract

The aim of this study is to analyze repeated measurements with the Rasch model and to examine the changes that occur in persons and in items. For this purpose, Rack and Stack analyses were chosen from among the Rasch approaches for examining the results of repeated measurements. Stack analysis was used to reveal change in persons, and Rack analysis was used to reveal change in items. The data used in this study were taken from project number 115K531, “A Recommended Model to Increase Success Level of Turkey in Mathematics in International Wide Scale Exams: Effectiveness of the Cognitive Diagnosis Based Tracking Model”, which started on 15/11/2015 and was supported by the TÜBİTAK SOBAG 3501 program. The change of the 1303 people in the Experiment-1 group of that project, who took the pre-test and the post-test and received feedback as the experimental manipulation, was examined. The data were analyzed with the Winsteps program. The analyses showed that the test instrument fit the Rasch model well and that it was reliable and valid. The instrument was found to meet the unidimensionality and local independence assumptions of the Rasch model. The changes were examined separately as negative and positive changes, and their significance was decided with the t statistic. Accordingly, in the sample of 1303 people, the change was statistically significant for 723 (84.86%) of the 852 people who showed a positive change and was not significant for the remaining 15.14%. Of the 341 people who showed a negative change, the change was not significant for 102, so the difference in ability level was not statistically significant for 29.91% of those who showed a negative change, and it was significant for 239 people, that is, 70.09% of them.
The change of 231 people was not statistically significant. Of these 231 people, 129 showed a positive change and 102 showed a negative change. The change in 17.73% of the respondents was therefore not statistically significant. The 723 people whose positive change was statistically significant made up 55.49% of the respondents. For all 30 items, the item difficulty estimated from the pre-test was higher than the item difficulty estimated from the post-test. The change in the function of 16 items was not statistically significant. The change in the remaining 14 items was statistically significant, so the function of these items changed. In other words, 46.7% of the item changes were statistically significant and 53.3% were not.

INTRODUCTION
Change is inevitable for everything that exists. Over time, subjects move from their current state to another one. The change can be positive or negative, an increase or a decrease. The study of change poses significant challenges for a field dedicated to understanding development, such as psychology. The psychological structures of interest, called attributes, are difficult to observe directly and are considered latent (Cohen & Swerdlik, 2013). Measurement instruments with proven validity and reliability are widely used to reveal such structures. Each attribute is unique to the individual, and these attributes change over time. Repeated measurements are often used to measure change. The term "repeated measurements" generally refers to data in which the response of each experimental unit or subject is observed on multiple occasions or under multiple conditions (Davis, 2002). In repeated measurements the same instrument is used, and the same individuals are expected to take the test. Measuring change always involves some problems. There are many statistical methods for revealing change with repeated measurements. These powerful analyses indicate whether there has been a change and whether the intervention has been effective. Unfortunately, they give the researcher only rough information about the change, and the traditional approach to the measurement of change has important drawbacks: these statistics do not provide information at the level of individuals, they treat ordinal scores as measures, and they do not address the invariance of the measurement instrument across time points. The aim of this study is to show the usefulness of the Rasch model for describing change over time. Rasch models overcome these drawbacks in the measurement of change. Provided that the data fit the model, this study shows these effects using Rack and Stack analyses.
The main purpose of this study is to reveal the changes in the abilities of individuals and in the function of test items. For this purpose, Rack analysis and Stack analysis are used. The repeated measurements are analyzed with the Rasch model to determine whether the function of the test items and the abilities of the individuals have changed. The research problem is as follows: when the assumptions of the Rasch model are met, do the repeated measurements show any change in the abilities of the individuals and in the difficulty and function of the items? The sub-problems examined within this framework are as follows:
1. Does the instrument meet the assumptions of the Rasch model? Do the data fit the model?
2. Is there a statistically significant difference between boys and girls on the items of the measurement instrument?
3. Has there been a change in the function of the items in the instrument? Has there been a change in item difficulties?
4. Has there been a change in the ability levels of the examinees?
METHODOLOGY
This study uses data from project number 115K531, “A Recommended Model to Increase Success Level of Turkey in Mathematics in International Wide Scale Exams: Effectiveness of the Cognitive Diagnosis Based Tracking Model”, which started on 15/11/2015 and was supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK). The project was carried out by the Ege University Education Faculty Educational Sciences Department using an experimental design. Secondary schools in İzmir were selected by stratified sampling, and the study was conducted with 6th grade students. Of the different groups in the TÜBİTAK project, this study uses the data from the individuals in the Experiment-1 group, whose treatment consisted of receiving feedback.
In this study, the change of the 1303 people in the Experiment-1 group of the TÜBİTAK project, who took the pre-test and post-test and received feedback, was examined. The Experiment-1 group was given a detailed feedback form. This form contains detailed feedback on learning outcomes, cognitive processes and errors, based on a Cognitive Diagnostic Models analysis. It shows which characteristics the student has and with what probability, which high-level skills the student can demonstrate and, most importantly, what kinds of mistakes the student repeats (Başokçu, 2015). The data were analyzed with the Winsteps program according to the Rasch model, within the framework of item response theory (IRT). Whether the data met the assumptions of the Rasch model was tested. Stack analysis was used to examine changes in the abilities of individuals, and Rack analysis was used to examine changes in the function of items. The instrument consisted of 30 questions, and the items were scored dichotomously. According to the project report, four competency areas were specified in the scale used in this research. PISA standards were used to determine the mathematical competencies in the project. In addition, these competencies were rearranged for the 6th grade level together with the researchers, consultants and subject-matter expert teachers. According to the project report, the seven mathematical capabilities included in PISA were reclassified into four: Communication and Attribution; Mathematization; Reasoning and Strategy Development; and Use of Symbolic and Technical Language. For the first sub-problem, the unidimensionality and local independence assumptions were examined. To check the unidimensionality assumption, the fit statistics were examined and a principal component analysis of the residuals was carried out.
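The difference between the Stack and Rack data layouts mentioned above can be illustrated with a short sketch. This is a minimal Python illustration, assuming hypothetical toy responses (not the study's data) and hypothetical helper names `stack_layout` and `rack_layout`; the study itself ran the analyses in Winsteps.

```python
def stack_layout(pre, post):
    """Stack: put the post-test rows under the pre-test rows.

    Every person appears twice (once per time point) against one shared
    set of items, so each person gets two comparable ability estimates.
    """
    return [row[:] for row in pre] + [row[:] for row in post]


def rack_layout(pre, post):
    """Rack: put the post-test columns next to the pre-test columns.

    Every item appears twice (once per time point) against one shared
    set of persons, so each item gets two comparable difficulty estimates.
    """
    return [a + b for a, b in zip(pre, post)]


# Hypothetical toy data: 4 persons x 3 dichotomous items (1 = correct).
pre = [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 1, 0]]
post = [[1, 0, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0]]

stacked = stack_layout(pre, post)   # 8 rows (person-occasions) x 3 items
racked = rack_layout(pre, post)     # 4 persons x 6 columns (item-occasions)

print(len(stacked), len(stacked[0]))
print(len(racked), len(racked[0]))
```

In the stacked matrix the unit of comparison is the person at two time points; in the racked matrix it is the item at two time points.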
The highest standardized residual correlations between items were examined to determine whether the local independence assumption was met. These assumptions were checked for both the pre-test and the post-test. To answer the third sub-problem, the variation in item difficulties was examined, and Rack analysis was performed to determine whether the function of the items had changed. To answer the fourth sub-problem, the ability levels of the respondents in the pre-test and post-test were examined, and Stack analysis was performed to determine whether these ability levels had changed.
FINDINGS
The psychometric properties of the pre-test and post-test were examined. The item reliability of the pre-test was 0.99 and the person reliability of the pre-test was 0.60. Item reliability is high, but person reliability is insufficient because it is lower than 0.70. According to these reliability results, it can be concluded that the test takers were not well matched to the test. The person separation index was 1.22, which is low, and the item separation index was 12.38, which is high. It was concluded that the individuals were not separated from each other well enough in terms of their abilities. The INFIT values of the items range from 0.65 to 1.29. For productive measurement, the INFIT and OUTFIT values are expected to lie between 0.5 and 1.5, and the INFIT values of the items fall within this range. The OUTFIT values of the items range from 0.62 to 1.72. With an OUTFIT value of 1.72, item 12 lies outside the productive range for measurement; this misfit was disregarded.
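The INFIT and OUTFIT criteria above can be made concrete with a small sketch. It uses the standard definitions for dichotomous Rasch data (OUTFIT as the unweighted mean of squared standardized residuals, INFIT as the information-weighted mean square) and assumes that person abilities and the item difficulty are already estimated; the function names and the toy values are hypothetical and are not taken from the study or from Winsteps.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        variances.append(p * (1.0 - p))        # model variance of the response
        squared_residuals.append((x - p) ** 2)
    # OUTFIT: plain average of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # INFIT: squared residuals weighted by the model variance (information).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def in_productive_range(mean_square, low=0.5, high=1.5):
    """Check a mean-square value against the 0.5 to 1.5 range used in the text."""
    return low <= mean_square <= high


# Hypothetical toy case: four persons whose ability equals the item difficulty.
infit, outfit = item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0)
print(infit, outfit)

# Values reported in the text: largest INFIT 1.29, OUTFIT of item 12 is 1.72.
print(in_productive_range(1.29), in_productive_range(1.72))
```

OUTFIT is more sensitive to unexpected responses from persons far from the item's difficulty, which is one reason an item can pass the INFIT criterion and still exceed 1.5 on OUTFIT.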
It was concluded that internal validity was achieved because the INFIT and OUTFIT statistics of the items were within the required range. The pre-test met the unidimensionality and local independence assumptions of the Rasch model. The INFIT and OUTFIT values of the items are approximately 1, which is further evidence that the unidimensionality assumption is met. That the INFIT values of the items are within the required range is evidence of internal validity. As a result, data-model fit has been achieved, and it was concluded that the pre-test is a scale to which the Rasch model can be applied. The item reliability of the post-test was 0.99 and the person reliability of the post-test was 0.84. Since the item reliability is higher than 0.70, it is concluded that the items measure the construct the test was designed to measure and that there are enough items to cover each ability range along the scale. Since the person reliability is higher than 0.70, the individuals are ordered well by ability along the scale, and the scores obtained from the scale are interpreted as reliable. The person separation index was 2.29, higher than in the pre-test, and the item separation index was 12.77, which is high. It was concluded that the sample was sufficient to confirm the item difficulty hierarchy of the instrument. The post-test met the unidimensionality and local independence assumptions of the Rasch model. The INFIT and OUTFIT values of the items are approximately 1, which is further evidence that the unidimensionality assumption is met. The acceptable INFIT values of the items are evidence of internal validity. As a result, data-model fit has been shown.
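The person and item indices reported above (1.22 and 12.38 for the pre-test, 2.29 and 12.77 for the post-test) are consistent with the standard relationship between a Rasch separation index G and the corresponding reliability R, namely R = G^2 / (1 + G^2). The sketch below applies that relationship to the reported values; the function name is hypothetical.

```python
def reliability_from_separation(separation):
    """Convert a Rasch separation index G to reliability R = G^2 / (1 + G^2)."""
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)


# Index values as reported in the text.
reported = {
    "pre-test person": 1.22,
    "pre-test item": 12.38,
    "post-test person": 2.29,
    "post-test item": 12.77,
}

for label, g in reported.items():
    print(label, round(reliability_from_separation(g), 2))
```

The results reproduce the reliabilities given in the text (0.60 and 0.99 for the pre-test, 0.84 and 0.99 for the post-test).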
Therefore, it was concluded that the post-test is a scale to which the Rasch model can be applied, since it satisfies the assumptions of the model. The 30 items in the test instrument were examined for statistical bias between gender groups. Only item 10 had a DIF contrast greater than 0.5 logits; since p ≤ 0.05 for this item as well, it was concluded that it shows DIF. Items 26 and 27 also show DIF because their probability values are less than 0.05. For these items, the null hypothesis (H0) that “this item has the same item difficulty for the two groups” is rejected, so their item difficulties are not the same in the two groups. It was concluded that items 26 and 27 were biased in favor of female students, while item 10 was biased in favor of male students. For all 30 items, the item difficulty obtained from the pre-test was higher than the item difficulty obtained from the post-test. Whether these changes were significant was assessed with the plot in the Winsteps output. The items that fall within the 95% confidence band are Items 5, 7, 8, 9, 11, 12, 14, 17, 18, 19, 21, 22, 23, 26, 27 and 28; their t statistics lie within the range of -1.96 to +1.96. The change in the difficulty, and therefore in the function, of these 16 items is not statistically significant. The change in the remaining 14 items is statistically significant, so the function of these items has changed. It was concluded that 46.7% of the item changes were statistically significant and 53.3% were not.
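The decision rule described above, comparing a t statistic with ±1.96, can be sketched as follows. The text does not give the formula that was used, so this sketch assumes a common form for comparing two estimates that share a frame of reference: the difference between the two measures divided by their joint standard error. The function names and the difficulty and standard error values are hypothetical.

```python
import math


def change_t(measure_pre, se_pre, measure_post, se_post):
    """t statistic for the change between two Rasch measures (assumed form)."""
    joint_se = math.sqrt(se_pre ** 2 + se_post ** 2)
    return (measure_post - measure_pre) / joint_se


def is_significant(t, critical=1.96):
    """Two-sided decision at the 95% level: significant if |t| exceeds 1.96."""
    return abs(t) > critical


# Hypothetical item A: difficulty drops from 0.50 to 0.20 logits (SE 0.07 each).
t_a = change_t(0.50, 0.07, 0.20, 0.07)
# Hypothetical item B: difficulty drops from 0.10 to 0.00 logits (SE 0.07 each).
t_b = change_t(0.10, 0.07, 0.00, 0.07)

print(round(t_a, 2), is_significant(t_a))
print(round(t_b, 2), is_significant(t_b))
```

The same rule applies to person measures in the Stack analysis: a person's change counts as significant only when the t value falls outside the interval from -1.96 to 1.96.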
The changes in the ability levels of the respondents were as follows: the ability level of 852 people increased, the ability level of 341 people decreased, and the ability level of 110 people did not change. The largest increase was 5.22, for person 1228, and the largest decrease was -3.13, for person 1253. The ability level increased for 65.38% of the respondents in the sample, decreased for 26.17% and did not change for 8.44%. Whether a change was significant at the 95% confidence level was determined with the t statistic: a t value between -1.96 and 1.96 means that the person's change is not statistically significant. In the sample of 1303 people, the change of 231 people is not statistically significant. Of these 231 people, 129 showed a positive change and 102 showed a negative change. The change in 17.73% of the respondents was thus not statistically significant. Of the 852 people who showed a positive change, 129 (15.14% of that group) did not show a statistically significant change, and of the 341 people who showed a negative change, 102 did not show a statistically significant change. Therefore, the difference in ability level was not statistically significant for 29.91% of those who showed a negative change. The change of 723 of the people who showed a positive change is statistically significant; they constitute 55.49% of the sample.
DISCUSSION AND CONCLUSION
This study is the first study in Turkey on repeated measurements with the Rasch model. The literature review showed that there are few studies on Rack and Stack analysis.
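The counts and percentages reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the helper name `pct` is hypothetical.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


total = 1303
increased, decreased, unchanged = 852, 341, 110
nonsig_positive, nonsig_negative = 129, 102
sig_positive = increased - nonsig_positive      # 723
sig_negative = decreased - nonsig_negative      # 239
nonsig_total = nonsig_positive + nonsig_negative  # 231

# The three groups account for the whole sample.
print(increased + decreased + unchanged == total)

# Shares of the whole sample.
print(pct(increased, total))   # rounds to 65.39; the text reports 65.38
print(pct(decreased, total), pct(unchanged, total))
print(pct(nonsig_total, total), pct(sig_positive, total))

# Shares within the positive-change and negative-change groups.
print(pct(sig_positive, increased), pct(nonsig_positive, increased))
print(pct(sig_negative, decreased), pct(nonsig_negative, decreased))
```

Apart from the last digit of the first percentage, which suggests truncation rather than rounding, the recomputed values agree with the figures in the text.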
It is thought that this study will help in evaluating the situation of individuals in the fields of education and health. There is no study comparing classical statistical methods and Rasch models in repeated measurements, and it is recommended that future researchers compare these two approaches and publish their results. This study can serve as a guide for experimental research and can give researchers an idea of the effectiveness of their interventions, because it clearly reveals the effect of the method studied and its differences at the individual level. Within the scope of this study, only Rack and Stack analyses were performed to obtain information about the ability levels of individuals and the changes in the function of items. Whether the function of the test as a whole changed was not measured. It is suggested that future researchers use Differential Test Functioning (DTF) to determine whether the function of the scale used in their studies has changed. Rack and Stack analyses within the Rasch model are recommended for obtaining more detailed information from repeated measurements. Studies conducted abroad show that these types of analysis are used in the field of health and yield productive results. Stack analysis could likewise be used in Turkey as an effective way to examine patients' progress under different treatment methods. Rasch models provide a valid framework for measuring change and are a useful complement to traditional approaches. There are other Rasch models for the measurement of change, and future studies should include and compare different Rasch models for repeated measurements.

Keywords

Rasch Model, Rack and Stack Analysis, Repeated Measurements, Repeated Data, Measurement of Change, Differential Item Functioning (DIF)
