Examining differentiation in numerical ability by gender: A differential item functioning (DIF) study
Abstract
In this thesis, differentiation in numerical ability was examined by gender. For this purpose, a 40-item test was first constructed, with four subtests of 10 items each. The test was administered to a sample of approximately 400 students attending the 11th grade in the 2017-2018 academic year. After this administration, item difficulties and discriminations were calculated and items were selected. Reliability and validity studies were carried out, the 5 items that best measured each of the target constructs (Arithmetic Reasoning, Forming Equations, Geometry, and Visual-Spatial Ability) were selected, and the test was finalized. The resulting 20-item numerical ability test was then administered in the 2018-2019 academic year to 1382 11th-grade students at Bornova Anadolu Lisesi, Cihat Kora Anadolu Lisesi, 15 Temmuz Şehitler Anadolu Lisesi, Atatürk Lisesi, Karşıyaka Lisesi, and Atakent Anadolu Lisesi. After the necessary eliminations, the analyses were carried out on a sample of 1036 students. Items showing gender-based differences in numerical ability were identified. According to the results, two items in the forming equations-numerical reasoning area showed differential item functioning (DIF) in favor of male students, and three items in the geometry and visual-spatial ability area showed DIF in favor of female students. In addition, the fit of the developed numerical ability test (SYT) to the Rasch model was tested and reported. All SYT items except one fit the model well; data-model, person-model, and test-model fit were good; and the assumptions of local independence and unidimensionality were met.
It is common for women to have stronger verbal skills and better memory for events, words, objects, faces, and activities. Men are also often thought to be successful at mentally manipulating objects and at numerical tasks based on visual presentations. Because grades and general test scores depend on many factors, psychologists tend to assess more narrowly defined cognitive skills to understand these gender differences. Preschool children are born with neither more nor less numerical ability: girls and boys perform equally well in early cognitive skills related to numerical thinking and knowledge of surrounding objects. When school starts, however, these skills begin to differentiate between genders (Perry, 2015). Numerical reasoning is a complex construct with many names and definitions, among them arithmetic, number sense, deductive reasoning, mathematical literacy, numerical literacy, problem solving, contextualized mathematics, mathematical modeling, and quantitative reasoning (Mayes, Peterson, & Bonilla, 2013, p. 2). Reasoning skills and content form the basis for defining the structure of numerical reasoning for assessment purposes in mathematics. In measurement theory and practice, a conceptually parallel line of work is associated with Messick and colleagues (Dwyer et al., 2003). The ability to reason arithmetically is crucial to democratic citizenship because it enables citizens to make informed decisions on complex national and international issues that affect their homes, workplaces, and local communities. At this point it is important to investigate whether numerical ability tests differentiate by gender. In the process of choosing a profession and being placed in institutions, examination tests should not work in favor of or against either girls or boys. Item bias occurs when different groups respond to the same test item differently.
These differences call for investigation, as they can shed light on both the test item and the experiences and backgrounds of the different groups taking the exam (Holland & Thayer, 1986a). Whether or not a test treats a student's cognitive ability as a biased criterion, the test may still make biased predictions. For example, an academic ability test may routinely overestimate men's future academic performance and underestimate women's; if such a test is used to make selection or placement decisions, women will be disadvantaged. Differences in tests are examined at both the item and the test level, mostly through Differential Item Functioning (DIF) analysis, and it is common to conduct DIF analyses on the basis of Item Response Theory (IRT) (Murphy & Davidshofer, 2005). The aim of this study is to investigate whether the items in a numerical ability test for 11th-grade high school students contain DIF, using the Rasch model within the framework of item response theory (IRT). In this thesis study, differentiation in quantitative ability between the sexes was examined. First, a 40-item test was formed, consisting of four subtests of 10 items each. The test was administered to a sample of about 400 11th-grade students in the 2017-2018 academic year. After this administration, item selection was carried out by computing item difficulty and item discrimination parameters. Reliability and validity studies were conducted, the 5 items that best measured each target construct (arithmetic reasoning, forming equations, geometry, visual-spatial ability) were chosen, and the final form of the test was decided. The final test was then administered to 1382 11th-grade students in the 2018-2019 academic year at Bornova Anadolu, Cihat Kora Anadolu, 15 Temmuz Şehitler Anadolu, Atatürk, Karşıyaka, and Atakent high schools. After the necessary eliminations, a series of analyses was performed on a sample of 1036 students.
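Under the dichotomous Rasch model used in these analyses, the probability of a correct response depends only on the difference between person ability θ and item difficulty b, both expressed in logits. A minimal sketch in Python (the function name is illustrative):

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct answer under the dichotomous Rasch model,
    for a person of ability `theta` on an item of difficulty `b` (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty the success probability is exactly 0.5;
# the more ability exceeds difficulty, the closer it approaches 1.
p_equal = rasch_prob(0.0, 0.0)    # 0.5
p_easier = rasch_prob(1.0, -1.0)  # well above 0.5
```

DIF analysis asks, in effect, whether this single difficulty b suffices for both gender groups or whether the item behaves as if it had different difficulties for girls and boys.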
When previous research on this subject is examined, the general conclusion is that boys are more successful than girls on numerical ability tests. In this study, unlike many previous studies that used ready-made data such as PISA, TIMSS, OKS, or SBS data, a quantitative ability test was developed. After validity and reliability studies were carried out and item parameters were calculated, the test was administered to 11th-grade high school students. The problem statement of this study is as follows: Is there any differentiation in quantitative ability across gender groups? The sub-problems examined within the framework of the problem statement are: 1. What are the psychometric properties of the Quantitative Ability Test (QAT) developed within the scope of this research? 2. Does the QAT meet the assumptions of the Rasch model? 3. Do QAT items contain differential item functioning (DIF) across gender groups?

2. Method

The population of the study consists of 11th-grade students studying in İzmir as of the 2018-2019 academic year. The sample consisted of 1036 11th-grade students attending state Anatolian high schools. When selecting the sample, Anatolian high schools with a minimum placement percentage of 8% were preferred. The participating students answered the 20 items of the quantitative ability test, which comprises the Arithmetic Reasoning, Equation, Geometry, and Visual-Spatial Ability subtests, and the data obtained from their responses were evaluated. The Quantitative Ability Test consists of four subtests of 5 items each, 20 items in total, and students were given 40 minutes to respond. Answers were scored dichotomously as correct-incorrect (1-0).
The data were obtained from the answers given to the Quantitative Ability Test by 11th-grade students of some state Anatolian high schools in İzmir. Before the actual administration, a total of 40 items was created, 10 for each subtest. After this pilot administration, item difficulties and discriminations were calculated and items were selected. Reliability and validity studies were conducted, the five items that best measured each of the target constructs (Arithmetic Reasoning, Equation, Geometry, and Visual-Spatial Ability) were selected, and the test was finalized. For the first sub-problem of the study, the psychometric properties of the Quantitative Ability Test were examined on the pilot sample of 400 11th-grade students; item difficulties and discriminations were calculated from the resulting data and items were selected to finalize the test. The final form of the test was administered to 1382 11th-grade students. The resulting data were analyzed with the WINSTEPS program according to the Rasch model within the framework of item response theory. To examine the second sub-problem, it was investigated whether the data set meets the assumptions of the Rasch model. For the third sub-problem, items containing DIF were identified.

3. Results

The Test 1 administration included the Equality (first 10 questions) and Numerical Reasoning (questions 11-20) subtests. The Test 2 administration included the Visual-Spatial Ability (first 10 questions) and Geometry (questions 11-20) subtests.
Item analyses of Test 1 and Test 2 were performed in Excel to determine which items could go directly into the final test and to eliminate unsuitable items. Accordingly, of the 20 items in Test 1, 10 items were selected (items 2, 3, 4, 6, 7, 11, 17, 18, 19, and 20) with item difficulties from 0.35 to 0.77, item discriminations between 0.36 and 0.65, and item standard deviations between 0.42 and 0.50. Of the 20 items in Test 2, which consisted of geometry and visual-spatial ability questions, 10 items were selected (items 1, 2, 5, 9, 10, 11, 13, 16, 18, and 20) with item difficulties ranging from 0.27 to 0.84, discriminations between 0.31 and 0.51, and standard deviations between 0.36 and 0.50. In the reliability and validity studies, the KR-20 reliability coefficient was 0.653 for Test 1 and 0.597 for Test 2. As a result, the 5 items in each subtest that best measured the target constructs (Arithmetic Reasoning, Equation, Geometry, and Visual-Spatial Ability) were selected and the test was finalized. The reliability coefficient of the final test was 0.61, the mean item difficulty was 0.411, and the mean item discrimination was 0.286. An item whose observed correlation is greater than its expected correlation discriminates well between high- and low-ability students; according to the analysis, items 15, 5, 16, 6, 7, 20, 8, 19, 9, and 10 have high discrimination power. An item whose observed correlation is less than its expected correlation has low discrimination power; at the end of the analyses, items 17, 4, 14, 12, 13, 11, 1, and 2 were found to have low discrimination power. Looking at the INFIT and OUTFIT mean-square statistics, the least fitting item is item 17 (infit mnsq = 1.37 and outfit mnsq = 1.65). When the infit and outfit mnsq values of all other items in the scale were examined, all were close to 1.
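The classical item statistics reported above (proportion-correct difficulty, upper-lower discrimination, KR-20 reliability) can all be computed from a 0/1 response matrix. A minimal sketch, assuming rows are students and columns are items (function names are illustrative, not the Excel procedure actually used in the study):

```python
def item_difficulty(scores):
    """Proportion-correct difficulty (p) for each item; `scores` is a
    list of per-student lists of 0/1 responses."""
    n, k = len(scores), len(scores[0])
    return [sum(row[j] for row in scores) / n for j in range(k)]

def discrimination_index(scores, j):
    """Upper-lower discrimination index for item j: proportion correct in
    the top half minus the bottom half, with students ranked by total score."""
    ranked = sorted(scores, key=sum, reverse=True)
    half = len(ranked) // 2
    upper = sum(row[j] for row in ranked[:half]) / half
    lower = sum(row[j] for row in ranked[-half:]) / half
    return upper - lower

def kr20(scores):
    """Kuder-Richardson 20 reliability coefficient for dichotomous items."""
    n, k = len(scores), len(scores[0])
    p = item_difficulty(scores)
    pq = sum(pi * (1.0 - pi) for pi in p)          # sum of item variances
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # total-score variance
    return (k / (k - 1)) * (1.0 - pq / var)
```

The item standard deviations quoted above are simply the square roots of the pq terms inside `kr20`.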
In other words, the items fit the model well. Looking at the expected item characteristic curves of items 9 and 10, although their fit statistics (outfit mnsq values) are close to 1, these items are easier than expected for low-ability students and harder than expected for high-ability students. The INFIT and OUTFIT fit statistics for persons are 1.00 and 0.99, respectively; values this close to 1 indicate excellent fit. The standard deviation expected by the model is reported as 0.95, and the standard deviations of the infit mnsq and outfit mnsq values are 0.17 and 0.50, respectively. For the test as a whole, the INFIT and OUTFIT statistics are 1.00 and 0.99, respectively; the standard deviation expected by the model is reported as 0.86 (close to 1), and the standard deviations of the infit mnsq and outfit mnsq values are 0.11 and 0.20, respectively. To find out whether an item is biased in favor of or against a group, the DIF score value is examined: if the DIF score is positive, the item is biased in favor of that group, and if it is negative, the item is biased against that group (p < .05). Looking at the DIF score values, the items with positive values for males are items 1, 2, 7, 9, 10, 12, and 16; these items appear to favor males. The items with negative DIF scores for males are items 3, 4, 5, 6, 11, 15, 17, 18, 19, and 20; these appear to work against males. For more precise information, the probability (p) values are checked. These values test the null hypothesis H0: "this item does not contain DIF." H0 was rejected for items 7, 9, 15, 17, and 20 (p < .05): items 7 and 9 show DIF in favor of boys, while items 15, 17, and 20 show DIF in favor of girls. For the other items, H0 was retained (p > .05).
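The DIF contrast reported by Winsteps is the difference between an item's difficulty estimated separately in each gender group. A crude illustration of that idea, using raw proportion-correct logits rather than full Rasch estimation (the names and the simplified method are assumptions for illustration, not the Winsteps algorithm):

```python
import math

def logit_difficulty(scores, j):
    """Crude item difficulty in logits from the proportion correct in one
    group (higher value = harder for that group)."""
    p = sum(row[j] for row in scores) / len(scores)
    return math.log((1.0 - p) / p)

def dif_contrast(scores_focal, scores_reference, j):
    """Difference between item j's difficulty in the focal and reference
    groups. A contrast near zero suggests no DIF; a large absolute value
    suggests DIF, which Winsteps additionally tests for significance
    (p < .05) before flagging the item."""
    return logit_difficulty(scores_focal, j) - logit_difficulty(scores_reference, j)
```

A positive contrast means the item is harder for the focal group, i.e. it functions in favor of the reference group.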
When the unidimensionality of the quantitative ability test is examined, the ratio of the raw variance eigenvalue explained by the model to the unexplained raw variance eigenvalue in the first contrast is 7.0636 / 2.0705 = 3.4115. Since this ratio is greater than 3, the test measures a single latent trait, namely quantitative ability; it was therefore concluded that the Quantitative Ability Test is unidimensional. To evaluate the local independence assumption, the largest inter-item standardized residual correlations were examined. If the residual correlation between a pair of items is less than |.30|, the assumption of local independence is considered satisfied (Bruin, 2012). Only the residual correlation between items 9 and 10, at 0.55, exceeds |.30|; this pair therefore violates the local independence assumption. Since the residual correlations between all other items are lower than |.30|, local independence holds for those items.

4. Discussion and Conclusion

Accordingly, of the 20 items in Test 1, consisting of the Arithmetic Reasoning and Equality subtests, 10 items were selected with item difficulties from 0.35 to 0.77, item discriminations between 0.36 and 0.65, and item standard deviations between 0.42 and 0.50. Of the 20 items in Test 2, which consisted of geometry and visual-spatial ability questions, 10 items were selected with item difficulties ranging from 0.27 to 0.84, discriminations between 0.31 and 0.51, and item standard deviations between 0.36 and 0.50. In the reliability and validity studies, the KR-20 reliability coefficient was 0.653 for Test 1 and 0.597 for Test 2.
As a result, the 5 items in each subtest that best measured the target constructs (Arithmetic Reasoning, Equation, Geometry, and Visual-Spatial Ability) were selected and the test was finalized. The final 20-item quantitative ability test was then administered to 1382 11th-grade students at Bornova Anatolian High School, Cihat Kora Anatolian High School, 15 Temmuz Şehitler Anatolian High School, Atatürk High School, Karşıyaka High School, and Atakent Anatolian High School. After screening, the data of 1036 students were analyzed and reported. In the Rasch analysis, the person outfit mnsq value was 0.99 and the infit mnsq value was 1.00, indicating excellent person-model fit. The reliability coefficient of the final test was 0.61, the mean item difficulty 0.411, and the mean item discrimination 0.286. Although the classical reliability coefficient seems low, the Rasch analysis on 1036 students yielded a test reliability of 0.99; as the sample size increased, the overall reliability of the test increased. The Rasch analysis covered data-model fit, person-model fit, test-model fit, the DIF investigation, and the local independence and unidimensionality assumptions. In her study of the construct validity of scales, Özalp Ateş (2015) suggested that, compared with factor-analytic methods that examine only the dimensional structure of a scale, Rasch analysis evaluates internal construct validity in terms of additional characteristics such as item fit, person fit, and item bias. The findings of the present study were based solely on Rasch analysis; other methods such as factor analysis were not used during test development. According to the findings, items 15, 5, 16, 6, 7, 20, 8, 19, 9, and 10 have high discrimination power, while items 17, 4, 14, 12, 13, 11, 1, and 2 have low discrimination power.
When the INFIT and OUTFIT mean-square statistics are examined, the least fitting item is item 17 (infit mnsq = 1.37 and outfit mnsq = 1.65); it is recommended that this item be removed from the scale. The infit and outfit mnsq values of all other items in the scale are close to 1; in other words, those items fit the model well. In addition, the expected and observed matching percentages were examined to assess data-model fit; when these percentages are equal, the data are said to fit the model. Except for item 17, the expected and observed matching percentages of the items are very close to each other or equal. Looking at the expected item characteristic curves of items 9 and 10, although their fit statistics (outfit mnsq values) are close to 1, these items are easier than expected for low-ability students and harder than expected for high-ability students. In other words, the expected and observed matching percentages alone are insufficient for judging data-model fit; for sounder decisions, item characteristic curves should also be examined. For the test as a whole, the INFIT and OUTFIT statistics were 1.00 and 0.99, respectively, and the standard deviation expected by the model was reported as 0.86 (close to 1). The standard deviations of the infit mnsq and outfit mnsq values were 0.11 and 0.20, respectively, suggesting good test-data fit. When the unidimensionality of the quantitative ability test was examined, the ratio of the raw variance eigenvalue explained by the model to the unexplained raw variance eigenvalue in the first contrast was 7.0636 / 2.0705 = 3.4115. Since this ratio is greater than 3, the test measures a single latent trait, namely quantitative ability, and it was concluded that the Quantitative Ability Test is unidimensional.
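The infit and outfit mean squares used throughout these analyses have a compact definition: outfit is the unweighted mean of squared standardized residuals, and infit is their information-weighted mean, which down-weights responses far from a person's ability level. A minimal sketch for a single item or person (the function name is illustrative):

```python
def fit_statistics(observed, expected):
    """Infit and outfit mean-square statistics. `observed` are 0/1
    responses; `expected` are the Rasch model probabilities for the same
    person-item encounters. Values near 1 indicate good fit; values well
    above 1 (e.g. item 17's outfit of 1.65) indicate misfit."""
    z2, w = [], []
    for x, p in zip(observed, expected):
        var = p * (1.0 - p)             # model variance of a 0/1 response
        z2.append((x - p) ** 2 / var)   # squared standardized residual
        w.append(var)
    outfit = sum(z2) / len(z2)                          # unweighted mean
    infit = sum(z * v for z, v in zip(z2, w)) / sum(w)  # information-weighted
    return infit, outfit
```

Because outfit is unweighted, it is more sensitive to lucky guesses and careless errors by persons far from the item's difficulty, which is why both statistics are inspected together.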
To evaluate the local independence assumption, the correlations of residuals between items were examined, and only the residual correlation between items 9 and 10, at 0.55, was problematic. This violation can be addressed by removing one item of the offending pair (9-10) or by combining the two items and treating them as a single item. Finally, DIF studies should be routinely performed for all national examinations as part of the validity studies of those tests.
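The two Rasch assumption checks applied in this study reduce to simple threshold rules: unidimensionality is accepted when the eigenvalue of the variance explained by the model is more than three times the first-contrast eigenvalue of the residuals, and local independence is questioned for any item pair whose standardized-residual correlation exceeds |.30| (Bruin, 2012). A sketch with illustrative function names:

```python
def is_unidimensional(explained_eigenvalue, first_contrast_eigenvalue,
                      threshold=3.0):
    """Rule of thumb used in the study: unidimensional if the explained
    raw-variance eigenvalue exceeds `threshold` times the first-contrast
    eigenvalue of the residuals."""
    return explained_eigenvalue / first_contrast_eigenvalue > threshold

def violates_local_independence(residual_corr, cutoff=0.30):
    """Flag an item pair whose standardized-residual correlation
    exceeds |.30| (Bruin, 2012)."""
    return abs(residual_corr) > cutoff

# Values reported in this study:
# 7.0636 / 2.0705 = 3.4115 > 3, so the test is treated as unidimensional;
# the items 9-10 residual correlation of 0.55 exceeds |.30|, so that pair
# violates local independence.
```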