
Research agenda: Priorities for future research in second language assessment

Published online by Cambridge University Press: 24 February 2012

Stephen Stoynoff
Affiliation: Minnesota State University, Mankato (stephen.stoynoff@mnsu.edu)

Extract

In a recent state-of-the-art (SoA) article (Stoynoff 2009), I reviewed some of the trends in language assessment research and considered them in light of validation activities associated with four widely used international measures of L2 English ability. This Thinking Allowed article presents an opportunity to revisit the four broad areas of L2 assessment research (conceptualizations of the L2 construct, validation theory and practice, the application of technology to language assessment, and the consequences of assessment) discussed in the previous SoA and to propose tasks I believe will promote further advances in L2 assessment. Of course, the research tasks I suggest represent a personal stance and readers are encouraged to consider additional perspectives, including those expressed by Bachman (2000), Chalhoub-Deville & Deville (2005), McNamara & Roever (2006), Shaw & Weir (2007), and Stansfield (2008). Moreover, readers will find useful descriptions of current research approaches to investigating L2 assessments in Lumley & Brown (2005), Weir (2005a), Chapelle, Enright & Jamieson (2008), Lazaraton (2008), and Xi (2008).

Type
Thinking Allowed
Copyright
Copyright © Cambridge University Press 2012

References

Alderson, J. C. (2006). Bridging the gap between theory and practice? Paper presented at the meeting of the European Association for Language Testing and Assessment, Krakow, Poland.
Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing 17.1, 1–42.
Bachman, L. F. (2002). Some reflections on task-based language performance assessments. Language Testing 19.4, 453–476.
Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly 2.1, 1–34.
Bachman, L. F. (2007). What is the construct? In Fox, J., Wesche, M., Bayliss, D., Cheng, L., Turner, C. & Doe, C. (eds.), Language testing reconsidered. Ottawa: University of Ottawa Press, 41–71.
Bachman, L. & Palmer, A. (2010). Language assessment in practice. New York: Oxford University Press.
Brown, A., Iwashita, N., McNamara, T. & O'Hagan, S. (2005). An examination of rater orientations and test taker performance on English-for-academic-purposes speaking tasks (TOEFL Monograph 29). Princeton, NJ: Educational Testing Service.
Brown, J. D. (2008). Testing-context analysis: Assessment is just another part of language curriculum development. Language Assessment Quarterly 5.4, 275–312.
Chalhoub-Deville, M. (2009). The intersection of test impact, validation, and educational reform policy. Annual Review of Applied Linguistics 29, 118–131.
Chalhoub-Deville, M. & Deville, C. (2005). A look back at and forward to what language testers measure. In Hinkel, E. (ed.), Handbook of research in second language teaching and learning. Mahwah, NJ: Erlbaum, 815–831.
Chalhoub-Deville, M. & Wigglesworth, G. (2005). Rater judgment and English language speaking proficiency. World Englishes 24.3, 383–391.
Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics 19, 254–272.
Chapelle, C. A. & Douglas, D. (2006). Assessing language through computer technology. Cambridge: Cambridge University Press.
Chapelle, C. A., Enright, M. K. & Jamieson, J. M. (eds.) (2008). Building a validity argument for the Test of English as a Foreign Language. New York: Routledge.
Cheng, L. & Curtis, A. (2012). Test impact and washback: Implications for teaching and learning. In Coombe, C., Davidson, P., O'Sullivan, B. & Stoynoff, S. (eds.), The Cambridge guide to second language assessment. Cambridge: Cambridge University Press, 89–95.
Cheng, L. & Qi, L. (2006). Description and examination of the National Matriculation English Test. Language Assessment Quarterly 3.1, 53–70.
Choi, I., Kim, K. & Boo, J. (2003). Comparability of a paper-based language test and a computer-based language test. Language Testing 20.3, 295–320.
Cizek, G., Rosenberg, S. & Koons, H. (2008). Sources of validity evidence for educational and psychological tests. Educational and Psychological Measurement 68.3, 397–412.
Cohen, A. & Upton, T. (2006). Strategies in responding to the new TOEFL reading tasks (TOEFL Monograph 33). Princeton, NJ: Educational Testing Service.
Council of Europe (2003). Relating language examinations to the CEFR. Manual preliminary pilot version. Strasbourg: Council of Europe.
Cumming, A., Kantor, R., Baba, K., Erdosy, U. & James, M. (2006). Analysis of discourse features and verification of scoring levels for independent and integrated prototype written tasks for the next generation TOEFL (TOEFL Monograph 30). Princeton, NJ: Educational Testing Service.
Davidson, F. (2006). World Englishes and test construction. In Kachru, B., Kachru, Y. & Nelson, C. (eds.), The handbook of world Englishes. Oxford: Blackwell, 709–730.
Davison, C. & Leung, C. (2009). Current issues in English language teacher-based assessment. TESOL Quarterly 43.3, 393–415.
Douglas, D. & Hegelheimer, V. (2007). Assessing language using computer technology. Annual Review of Applied Linguistics 27, 115–132.
Fulcher, G. (2004). Deluded by artifices? The Common European Framework and harmonization. Language Assessment Quarterly 1.4, 253–266.
Galaczi, E. (2008). Peer-peer interaction in a speaking test: The case of the First Certificate in English examination. Language Assessment Quarterly 5.2, 89–119.
Gan, Z. (2010). Interaction in group oral assessment: A case study of higher- and lower-scoring students. Language Testing 27.4, 585–602.
Goodman, D. & Hambleton, R. (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education 17.2, 145–220.
Hamp-Lyons, L. (1997). Washback, impact, and validity: Ethical concerns. Language Testing 14.3, 295–303.
Hamp-Lyons, L. & Davies, A. (2008). The English of English tests: Bias revisited. World Englishes 27.1, 26–39.
Hawkey, R. (2006). Impact theory and practice. Cambridge: UCLES/Cambridge University Press.
Hulstijn, J. (2011). Language proficiency in native and non-native speakers: An agenda for research and suggestions for second-language assessment. Language Assessment Quarterly 8.3, 229–249.
Kane, M. (2006). Validation. In Brennan, R. (ed.), Educational measurement (4th edn). Westport, CT: Praeger, 16–64.
Kane, M. (2010). Validity and fairness. Language Testing 27.2, 177–182.
Kunnan, A. (2004). Test fairness. In Milanovic, M. & Weir, C. (eds.), European language testing in a global context. Cambridge: UCLES/Cambridge University Press, 27–48.
Kunnan, A. (2008). Towards a model of test evaluation: Using the test fairness and test context frameworks. In Taylor, L. & Weir, C. (eds.), Multilingualism and assessment. Cambridge: UCLES/Cambridge University Press, 229–251.
Lazaraton, A. (2002). A qualitative approach to the validation of oral language tests. Cambridge: UCLES/Cambridge University Press.
Lazaraton, A. (2008). Utilizing qualitative methods for assessment. In Shohamy, E. & Hornberger, N. (eds.), Encyclopedia of language and education (2nd edn), Language testing and assessment (volume 7). New York: Springer, 197–209.
Leeson, H. (2006). The mode effect: A literature review of human and technological issues in computerized testing. International Journal of Testing 6.1, 1–24.
Little, D. (2011). The Common European Framework of Reference for Languages: A research agenda. Language Teaching 44.3, 381–393.
Llosa, L. (2011). Standards-based classroom assessment of English proficiency: A review of issues, current developments, and future directions for research. Language Testing 28.3, 367–382.
Lumley, T. & Brown, A. (2005). Research methods in language testing. In Hinkel, E. (ed.), Handbook of research in second language teaching and learning. Mahwah, NJ: Erlbaum, 833–855.
Matsuno, S. (2009). Self-, pair-, and teacher-assessments in Japanese university EFL writing classrooms. Language Testing 26.1, 75–100.
McKay, P. (2005). Research into the assessment of school-age language learners. Annual Review of Applied Linguistics 25, 243–263.
McNamara, T. & Roever, C. (2006). Language testing: The social dimension. Ann Arbor, MI: Blackwell.
Mislevy, R., Steinberg, L. & Almond, R. (2002). Design and analysis in task-based language assessment. Language Testing 19.4, 477–496.
North, B. (2008). CEFR levels and descriptor scales. In Taylor, L. & Weir, C. (eds.), Multilingualism and assessment. Cambridge: UCLES/Cambridge University Press, 21–66.
Orr, M. (2002). The FCE speaking test: Using rater reports to help interpret scores. System 30.2, 143–154.
Pardo-Ballester, C. (2010). The validity argument of a Web-based Spanish listening exam: Test usefulness evaluation. Language Assessment Quarterly 7.2, 137–159.
Purpura, J. (2008). Assessing communicative language ability: Models and their components. In Shohamy, E. & Hornberger, N. (eds.), Encyclopedia of language and education (2nd edn), Language testing and assessment (volume 7). New York: Springer, 53–68.
Purpura, J. (2009). The impact of large-scale and classroom-based language assessments on the individual. In Taylor, L. & Weir, C. (eds.), Language testing matters. Cambridge: UCLES/Cambridge University Press, 301–325.
Rea-Dickens, P. (2008). Classroom-based assessment. In Shohamy, E. & Hornberger, N. (eds.), Encyclopedia of language and education (2nd edn), Language testing and assessment (volume 7). New York: Springer, 257–271.
Sawaki, Y., Stricker, L. & Oranje, A. (2009). Factor structure of the TOEFL Internet-based test. Language Testing 26.1, 5–30.
Shaw, S. & Weir, C. (2007). Examining writing: Research and practice in assessing second language writing. Cambridge: UCLES/Cambridge University Press.
Shih, C. (2008). The General English Proficiency Test. Language Assessment Quarterly 5.1, 63–76.
Spies, R. & Plake, B. (eds.) (2005). The sixteenth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.
Stansfield, C. (2008). Where we have been and where we should go. Language Testing 25.3, 311–326.
Stoynoff, S. (2009). Recent developments in language assessment and the case of four large-scale tests of ESOL ability. Language Teaching 42.1, 1–40.
Stricker, L. & Rock, D. (2008). Factor structure of the TOEFL Internet-based test across subgroups (TOEFL iBT Research Report 07). Princeton, NJ: Educational Testing Service.
Swain, M., Huang, L., Barkaoui, K., Brooks, L. & Lapkin, S. (2009). The speaking section of the TOEFL iBT: Test takers’ reported strategic behaviors (TOEFL iBT Research Report 10). Princeton, NJ: Educational Testing Service.
Taylor, L. (2008). Language varieties and their implications for testing and assessment. In Taylor, L. & Weir, C. (eds.), Multilingualism and assessment. Cambridge: UCLES/Cambridge University Press, 276–295.
Taylor, L. (2009). Setting language standards for teaching and assessment: A matter of principle, politics, or prejudice? In Taylor, L. & Weir, C. (eds.), Language testing matters. Cambridge: UCLES/Cambridge University Press, 139–157.
Taylor, L. & Wigglesworth, G. (2009). Are two heads better than one? Pair work in L2 assessment contexts. Language Testing 26.3, 325–339.
Toulmin, S. E. (2003). The uses of argument (updated edition). Cambridge: Cambridge University Press.
Wagner, E. (2007). Are they watching? An investigation of test taker viewing behavior during an L2 video listening test. Language Learning and Technology. http://llt.msu.edu/vol11num1/pdf/wagner.pdf
Wagner, E. (2010). The effect of the use of video texts on ESL listening test taker performance. Language Testing 27.4, 493–514.
Wall, D. & Horák, T. (2006). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 1, the baseline study (TOEFL Monograph 34). Princeton, NJ: Educational Testing Service.
Wall, D. & Horák, T. (2008). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 2, coping with change (TOEFL iBT Research Report 05). Princeton, NJ: Educational Testing Service.
Weir, C. (2005a). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave.
Weir, C. (2005b). Limitations of the Common European Framework for developing comparable examinations and tests. Language Testing 22.3, 281–300.
Weir, C., O'Sullivan, B., Yan, J. & Bax, S. (2007). Does the computer make a difference? The reaction of candidates to a computer-based versus a traditional handwritten form of the IELTS writing component: Effects and impact (IELTS Research Reports Volume 7, Number 6). Cambridge: Cambridge ESOL.
Wigglesworth, G. & Elder, C. (2010). An investigation of the effectiveness and validity of planning time in speaking test tasks. Language Assessment Quarterly 7.1, 1–24.
Wise, S. & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education 18.2, 163–183.
Wu, J. & Wu, R. (2010). Relating the GEPT reading comprehension tests to the CEFR. In Martyniuk, W. (ed.), Aligning tests with the CEFR. Cambridge: UCLES/Cambridge University Press, 204–224.
Xi, X. (2008). Methods of test validation. In Shohamy, E. & Hornberger, N. (eds.), Encyclopedia of language and education (2nd edn), Language testing and assessment (volume 7). New York: Springer, 177–196.
Xi, X. (2010). How do we go about investigating test fairness? Language Testing 27.2, 147–170.
Yu, G. (2010). Effects of presentation mode and computer familiarity on summarization of extended texts. Language Assessment Quarterly 7.2, 119–136.