
Research agenda: Priorities for future research in second language assessment

Published online by Cambridge University Press: 24 February 2012

Stephen Stoynoff
Affiliation: Minnesota State University, Mankato (stephen.stoynoff@mnsu.edu)

Extract

In a recent state-of-the-art (SoA) article (Stoynoff 2009), I reviewed some of the trends in language assessment research and considered them in light of validation activities associated with four widely used international measures of L2 English ability. This Thinking Allowed article presents an opportunity to revisit the four broad areas of L2 assessment research (conceptualizations of the L2 construct, validation theory and practice, the application of technology to language assessment, and the consequences of assessment) discussed in the previous SoA and to propose tasks I believe will promote further advances in L2 assessment. Of course, the research tasks I suggest represent a personal stance and readers are encouraged to consider additional perspectives, including those expressed by Bachman (2000), Chalhoub-Deville & Deville (2005), McNamara & Roever (2006), Shaw & Weir (2007), and Stansfield (2008). Moreover, readers will find useful descriptions of current research approaches to investigating L2 assessments in Lumley & Brown (2005), Weir (2005a), Chapelle, Enright & Jamieson (2008), Lazaraton (2008), and Xi (2008).

Type
Thinking Allowed
Copyright
Copyright © Cambridge University Press 2012

References

Alderson, J. C. (2006). Bridging the gap between theory and practice? Paper presented at the meeting of the European Association for Language Testing and Assessment, Krakow, Poland.
Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing 17.1, 1–42.
Bachman, L. F. (2002). Some reflections on task-based language performance assessments. Language Testing 19.4, 453–476.
Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly 2.1, 1–34.
Bachman, L. F. (2007). What is the construct? In Fox, J., Wesche, M., Bayliss, D., Cheng, L., Turner, C. & Doe, C. (eds.), Language testing reconsidered. Ottawa: University of Ottawa Press, 41–71.
Bachman, L. & Palmer, A. (2010). Language assessment in practice. New York: Oxford University Press.
Brown, A., Iwashita, N., McNamara, T. & O'Hagan, S. (2005). An examination of rater orientations and test taker performance on English-for-academic-purposes speaking tasks (TOEFL Monograph 29). Princeton, NJ: Educational Testing Service.
Brown, J. D. (2008). Testing-context analysis: Assessment is just another part of language curriculum development. Language Assessment Quarterly 5.4, 275–312.
Chalhoub-Deville, M. (2009). The intersection of test impact, validation, and educational reform policy. Annual Review of Applied Linguistics 29, 118–131.
Chalhoub-Deville, M. & Deville, C. (2005). A look back at and forward to what language testers measure. In Hinkel, E. (ed.), Handbook of research in second language teaching and learning. Mahwah, NJ: Erlbaum, 815–831.
Chalhoub-Deville, M. & Wigglesworth, G. (2005). Rater judgment and English language speaking proficiency. World Englishes 24.3, 383–391.
Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics 19, 254–272.
Chapelle, C. A. & Douglas, D. (2006). Assessing language through computer technology. Cambridge: Cambridge University Press.
Chapelle, C. A., Enright, M. K. & Jamieson, J. M. (eds.) (2008). Building a validity argument for the Test of English as a Foreign Language. New York: Routledge.
Cheng, L. & Curtis, A. (2012). Test impact and washback: Implications for teaching and learning. In Coombe, C., Davidson, P., O'Sullivan, B. & Stoynoff, S. (eds.), The Cambridge guide to second language assessment. Cambridge: Cambridge University Press, 89–95.
Cheng, L. & Qi, L. (2006). Description and examination of the National Matriculation English Test. Language Assessment Quarterly 3.1, 53–70.
Choi, I., Kim, K. & Boo, J. (2003). Comparability of a paper-based language test and a computer-based language test. Language Testing 20.3, 295–320.
Cizek, G., Rosenberg, S. & Koons, H. (2008). Sources of validity evidence for educational and psychological tests. Educational and Psychological Measurement 68.3, 397–412.
Cohen, A. & Upton, T. (2006). Strategies in responding to the new TOEFL reading tasks (TOEFL Monograph 33). Princeton, NJ: Educational Testing Service.
Council of Europe (2003). Relating language examinations to the CEFR. Manual preliminary pilot version. Strasbourg: Council of Europe.
Cumming, A., Kantor, R., Baba, K., Erdosy, U. & James, M. (2006). Analysis of discourse features and verification of scoring levels for independent and integrated prototype written tasks for the next generation TOEFL (TOEFL Monograph 30). Princeton, NJ: Educational Testing Service.
Davidson, F. (2006). World Englishes and test construction. In Kachru, B., Kachru, Y. & Nelson, C. (eds.), The handbook of world Englishes. Oxford: Blackwell, 709–730.
Davison, C. & Leung, C. (2009). Current issues in English language teacher-based assessment. TESOL Quarterly 43.3, 393–415.
Douglas, D. & Hegelheimer, V. (2007). Assessing language using computer technology. Annual Review of Applied Linguistics 27, 115–132.
Fulcher, G. (2004). Deluded by artifices? The Common European Framework and harmonization. Language Assessment Quarterly 1.4, 253–266.
Galaczi, E. (2008). Peer-peer interaction in a speaking test: The case of the First Certificate in English examination. Language Assessment Quarterly 5.2, 89–119.
Gan, Z. (2010). Interaction in group oral assessment: A case study of higher- and lower-scoring students. Language Testing 27.4, 585–602.
Goodman, D. & Hambleton, R. (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education 17.2, 145–220.
Hamp-Lyons, L. (1997). Washback, impact, and validity: Ethical concerns. Language Testing 14.3, 295–303.
Hamp-Lyons, L. & Davies, A. (2008). The English of English tests: Bias revisited. World Englishes 27.1, 26–39.
Hawkey, R. (2006). Impact theory and practice. Cambridge: UCLES/Cambridge University Press.
Hulstijn, J. (2011). Language proficiency in native and non-native speakers: An agenda for research and suggestions for second-language assessment. Language Assessment Quarterly 8.3, 229–249.
Kane, M. (2006). Validation. In Brennan, R. (ed.), Educational measurement (4th edn). Westport, CT: Praeger, 16–64.
Kane, M. (2010). Validity and fairness. Language Testing 27.2, 177–182.
Kunnan, A. (2004). Test fairness. In Milanovic, M. & Weir, C. (eds.), European language testing in a global context. Cambridge: UCLES/Cambridge University Press, 27–48.
Kunnan, A. (2008). Towards a model of test evaluation: Using the test fairness and test context frameworks. In Taylor, L. & Weir, C. (eds.), Multilingualism and assessment. Cambridge: UCLES/Cambridge University Press, 229–251.
Lazaraton, A. (2002). A qualitative approach to the validation of oral language tests. Cambridge: UCLES/Cambridge University Press.
Lazaraton, A. (2008). Utilizing qualitative methods for assessment. In Shohamy, E. & Hornberger, N. (eds.), Encyclopedia of language and education (2nd edn), Language testing and assessment (volume 7). New York: Springer, 197–209.
Leeson, H. (2006). The mode effect: A literature review of human and technological issues in computerized testing. International Journal of Testing 6.1, 1–24.
Little, D. (2011). The Common European Framework of Reference for Languages: A research agenda. Language Teaching 44.3, 381–393.
Llosa, L. (2011). Standards-based classroom assessment of English proficiency: A review of issues, current developments, and future directions for research. Language Testing 28.3, 367–382.
Lumley, T. & Brown, A. (2005). Research methods in language testing. In Hinkel, E. (ed.), Handbook of research in second language teaching and learning. Mahwah, NJ: Erlbaum, 833–855.
Matsuno, S. (2009). Self-, pair-, and teacher-assessments in Japanese university EFL writing classrooms. Language Testing 26.1, 75–100.
McKay, P. (2005). Research into the assessment of school-age language learners. Annual Review of Applied Linguistics 25, 243–263.
McNamara, T. & Roever, C. (2006). Language testing: The social dimension. Ann Arbor, MI: Blackwell.
Mislevy, R., Steinberg, L. & Almond, R. (2002). Design and analysis in task-based language assessment. Language Testing 19.4, 477–496.
North, B. (2008). CEFR levels and descriptor scales. In Taylor, L. & Weir, C. (eds.), Multilingualism and assessment. Cambridge: UCLES/Cambridge University Press, 21–66.
Orr, M. (2002). The FCE speaking test: Using rater reports to help interpret scores. System 30.2, 143–154.
Pardo-Ballester, C. (2010). The validity argument of a Web-based Spanish listening exam: Test usefulness evaluation. Language Assessment Quarterly 7.2, 137–159.
Purpura, J. (2008). Assessing communicative language ability: Models and their components. In Shohamy, E. & Hornberger, N. (eds.), Encyclopedia of language and education (2nd edn), Language testing and assessment (volume 7). New York: Springer, 53–68.
Purpura, J. (2009). The impact of large-scale and classroom-based language assessments on the individual. In Taylor, L. & Weir, C. (eds.), Language testing matters. Cambridge: UCLES/Cambridge University Press, 301–325.
Rea-Dickens, P. (2008). Classroom-based assessment. In Shohamy, E. & Hornberger, N. (eds.), Encyclopedia of language and education (2nd edn), Language testing and assessment (volume 7). New York: Springer, 257–271.
Sawaki, Y., Stricker, L. & Oranje, A. (2009). Factor structure of the TOEFL Internet-based test. Language Testing 26.1, 5–30.
Shaw, S. & Weir, C. (2007). Examining writing: Research and practice in assessing second language writing. Cambridge: UCLES/Cambridge University Press.
Shih, C. (2008). The General English Proficiency Test. Language Assessment Quarterly 5.1, 63–76.
Spies, R. & Plake, B. (eds.) (2005). The sixteenth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.
Stansfield, C. (2008). Where we have been and where we should go. Language Testing 25.3, 311–326.
Stoynoff, S. (2009). Recent developments in language assessment and the case of four large-scale tests of ESOL ability. Language Teaching 42.1, 1–40.
Stricker, L. & Rock, D. (2008). Factor structure of the TOEFL Internet-based test across subgroups (TOEFL iBT Research Report 07). Princeton, NJ: Educational Testing Service.
Swain, M., Huang, L., Barkaoui, K., Brooks, L. & Lapkin, S. (2009). The speaking section of the TOEFL iBT: Test takers’ reported strategic behaviors (TOEFL iBT Research Report 10). Princeton, NJ: Educational Testing Service.
Taylor, L. (2008). Language varieties and their implications for testing and assessment. In Taylor, L. & Weir, C. (eds.), Multilingualism and assessment. Cambridge: UCLES/Cambridge University Press, 276–295.
Taylor, L. (2009). Setting language standards for teaching and assessment: A matter of principle, politics, or prejudice? In Taylor, L. & Weir, C. (eds.), Language testing matters. Cambridge: UCLES/Cambridge University Press, 139–157.
Taylor, L. & Wigglesworth, G. (2009). Are two heads better than one? Pair work in L2 assessment contexts. Language Testing 26.3, 325–339.
Toulmin, S. E. (2003). The uses of argument (updated edition). Cambridge: Cambridge University Press.
Wagner, E. (2007). Are they watching? An investigation of test taker viewing behavior during an L2 video listening test. Language Learning and Technology. http://llt.msu.edu/vol11num1/pdf/wagner.pdf
Wagner, E. (2010). The effect of the use of video texts on ESL listening test taker performance. Language Testing 27.4, 493–514.
Wall, D. & Horák, T. (2006). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 1, the baseline study (TOEFL Monograph 34). Princeton, NJ: Educational Testing Service.
Wall, D. & Horák, T. (2008). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 2, coping with change (TOEFL iBT Research Report 05). Princeton, NJ: Educational Testing Service.
Weir, C. (2005a). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave.
Weir, C. (2005b). Limitations of the Common European Framework for developing comparable examinations and tests. Language Testing 22.3, 281–300.
Weir, C., O'Sullivan, B., Yan, J. & Bax, S. (2007). Does the computer make a difference? The reaction of candidates to a computer-based versus a traditional handwritten form of the IELTS writing component: Effects and impact (IELTS Research Reports Volume 7, Number 6). Cambridge: Cambridge ESOL.
Wigglesworth, G. & Elder, C. (2010). An investigation of the effectiveness and validity of planning time in speaking test tasks. Language Assessment Quarterly 7.1, 1–24.
Wise, S. & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education 18.2, 163–183.
Wu, J. & Wu, R. (2010). Relating the GEPT reading comprehension tests to the CEFR. In Martyniuk, W. (ed.), Aligning tests with the CEFR. Cambridge: UCLES/Cambridge University Press, 204–224.
Xi, X. (2008). Methods of test validation. In Shohamy, E. & Hornberger, N. (eds.), Encyclopedia of language and education (2nd edn), Language testing and assessment (volume 7). New York: Springer, 177–196.
Xi, X. (2010). How do we go about investigating test fairness? Language Testing 27.2, 147–170.
Yu, G. (2010). Effects of presentation mode and computer familiarity on summarization of extended texts. Language Assessment Quarterly 7.2, 119–136.