Comparing Performance of Different Equating Methods in Presence and Absence of DIF Items in Anchor Test

Neşe Gübeş; Şeyma Uyar

doi:10.29329/ijpe.2020.248.8

Research article | Open Access
International Journal of Progressive Education 2020, Vol. 16(3) 111-122


Publish Date: June 05, 2020


Abstract

This study compares the performance of different small-sample equating methods in the presence and absence of differential item functioning (DIF) in common items. The methods considered were Tucker linear equating, Levine linear equating, unsmoothed and presmoothed (C=4) chained equipercentile equating, and simplified circle-arc equating. The data are 8th-grade mathematics test item responses obtained from the Trends in International Mathematics and Science Study (TIMSS) 2015 Turkey sample. Item responses from Booklet 1 (N=199) and Booklet 14 (N=224) were chosen for this study. Data analyses were completed in four steps. In the first step, assumptions for the DIF detection and test equating methods were checked. In the second step, DIF analyses were conducted with the Mantel-Haenszel and logistic regression methods. In the third step, Booklet 1 was chosen as the base form and Booklet 14 as the new form, and test equating was then conducted under the common-item nonequivalent groups design. Test equating was done in two phases: with the DIF items present in the common items and with them absent. Equating results were evaluated using the standard error of equating (se), bias, and root mean squared error (RMSE). DIF analyses showed that there were two sizeable DIF items in the anchor test. Equating results showed that the methods performed similarly whether the DIF items were present in or absent from the anchor test, with no notable change in se, bias, or RMSE values. The circle-arc equating method outperformed the other methods on se, whereas the 4-moment presmoothed chained equipercentile equating method outperformed the other methods on bias and RMSE.

Keywords: Test Equating, Small Samples, Differential Item Functioning
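
The three evaluation criteria named in the abstract (se, bias, and RMSE) can be illustrated with a minimal sketch. It assumes the commonly used definitions at a single score point: bias is the mean of the replicated equated scores minus a criterion value, se is the standard deviation of the replications around their own mean, and RMSE is their root mean squared distance from the criterion. The abstract does not state the exact formulas or the resampling scheme the authors used, so the function name, the divisor (number of replications), and the sample numbers below are all hypothetical.

```python
import math
from statistics import mean


def equating_error_indices(replications, criterion):
    """Return (bias, se, rmse) for one raw-score point.

    replications: equated scores at this score point from repeated
                  (e.g. bootstrap) samples -- hypothetical input
    criterion:    the criterion equated score at the same score point
    """
    m = mean(replications)
    # Bias: systematic deviation of the average equated score from the criterion
    bias = m - criterion
    # Standard error: spread of the replications around their own mean
    # (assumption: divide by the number of replications, not R - 1)
    se = math.sqrt(mean((r - m) ** 2 for r in replications))
    # RMSE: spread of the replications around the criterion
    rmse = math.sqrt(mean((r - criterion) ** 2 for r in replications))
    return bias, se, rmse


if __name__ == "__main__":
    # Made-up equated scores, for illustration only
    bias, se, rmse = equating_error_indices([10.0, 12.0, 11.0, 13.0], 11.0)
    print(f"bias={bias:.3f} se={se:.3f} rmse={rmse:.3f}")
```

Under these definitions the indices satisfy RMSE² = bias² + se², which is why a method can rank best on se while another ranks best on bias and RMSE.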

How to Cite this Article?

APA 7th edition
Gubes, N., & Uyar, S. (2020). Comparing Performance of Different Equating Methods in Presence and Absence of DIF Items in Anchor Test. International Journal of Progressive Education, 16(3), 111-122. https://doi.org/10.29329/ijpe.2020.248.8

Harvard
Gubes, N. and Uyar, S. (2020). Comparing Performance of Different Equating Methods in Presence and Absence of DIF Items in Anchor Test. International Journal of Progressive Education, 16(3), pp. 111-122.

Chicago 16th edition
Gubes, Nese and Seyma Uyar (2020). "Comparing Performance of Different Equating Methods in Presence and Absence of DIF Items in Anchor Test". International Journal of Progressive Education 16 (3):111-122. https://doi.org/10.29329/ijpe.2020.248.8

