Models, forests, and trees of York English: Was/were variation as a case study for statistical practice

Published online by Cambridge University Press: 30 July 2012

Sali A. Tagliamonte
Affiliation:
University of Toronto
R. Harald Baayen
Affiliation:
University of Tübingen and University of Alberta

Abstract

What is the explanation for vigorous variation between was and were in plural existential constructions, and what is the optimal tool for analyzing it? Previous studies of this phenomenon have used the variable rule program, a generalized linear model; however, recent developments in statistics have introduced new tools, including mixed-effects models, random forests, and conditional inference trees that may open additional possibilities for data exploration, analysis, and interpretation. In a step-by-step demonstration, we show how this well-known variable benefits from these complementary techniques. Mixed-effects models provide a principled way of assessing the importance of random-effect factors such as the individuals in the sample. Random forests provide information about the importance of predictors, whether factorial or continuous, and do so also for unbalanced designs with high multicollinearity, cases for which the family of linear models is less appropriate. Conditional inference trees straightforwardly visualize how multiple predictors operate in tandem. Taken together, the results confirm that polarity, distance from verb to plural element, and the nature of the DP are significant predictors. Ongoing linguistic change and social reallocation via morphologization are operational. Furthermore, the results make predictions that can be tested in future research. We conclude that variationist research can be substantially enriched by an expanded tool kit.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2012
