Hierarchical statistical inference and lexical diffusion of sound change
Vsevolod Kapatsinski
University of Oregon

Two kinds of change in Usage-based Phonology (Bybee 1976, 2001, 2002; Phillips 1984, 2001)
• Articulatorily-motivated sound change
  – Driven by automatization of production (Browman & Goldstein 1992, Bybee 2001, 2002, Kapatsinski 2010, Mowrey & Pagliuca 1995)
  – Tempered by avoidance of misperception (Lindblom 1990)
  – Starts in high-frequency words
    • E.g., word-final t/d deletion (Bybee 2002), memory/mammary (Hooper 1976)
    • But what about analogy to reduced words?
• Analogical change
  – Driven by pressure to be like similar items (analogy) and by imperfect learning
  – Starts in low-frequency words
    • E.g., the irregular past tenses that survive are concentrated in high-frequency verbs

Implementing the opposing mechanisms
• Reduction in use: every time a word is used, it reduces
  – Words are assumed to be units of articulatory planning and execution (Bybee 2001, 2002, Kapatsinski 2010)
  – Or at least to be used in reduction-favoring contexts (Bybee 2002, Raymond & Brown 2012)
• Learning: words are associated with typical rates of reduction (Bybee 2001, Erker & Guy 2012, Pierrehumbert 2001, 2002)
  – "Word-specific phonetics"

Is sound change in the phoneme or in the word?
• It's in both.
• Phonemes and words can each be associated with rates of reduction (cf. Phillips 2001).
• Apportioning responsibility for reduction between the phoneme and the word is a problem of hierarchical statistical inference.
• We implement this with the lme4 package in R (Baayen et al. 2008).

Predicting reduction
• p(reduction) ~ β0 + β1*word
• β0 = overall probability of reduction (for this gesture/phoneme)
• β1 = adjustment associated with the individual word

A learning problem: Zipf (1949)
• Zipf's law (figure from Erker & Guy 2012)
• In a corpus of any size, a large proportion of word types occur only once (Baayen 2001)
• So for most words, we cannot estimate a word-specific reduction coefficient with any certainty

The standard solution (mixed-effects models): partial pooling, i.e., Word as a random effect (Gelman & Hill 2007: 252-259)
• Assume that the coefficients associated with individual words come from a common distribution
• Let the coefficients constrain each other: coefficients far from the center of the distribution are pulled toward it unless they are supported by a lot of data
• The estimate for word j is a weighted average of the mean of the tokens of the word ($\bar{y}_j$) and the mean of the tokens of the phoneme/gesture ($\bar{y}_{\text{all}}$) (Gelman & Hill 2007: 253):
$$\hat{\alpha}_j \approx \frac{\frac{n_j}{\sigma_y^2}\,\bar{y}_j + \frac{1}{\sigma_\alpha^2}\,\bar{y}_{\text{all}}}{\frac{n_j}{\sigma_y^2} + \frac{1}{\sigma_\alpha^2}}$$
• Note: $n_j$ is the number of tokens of the word, $\sigma_y^2$ the variance of tokens within a word, and $\sigma_\alpha^2$ the variance of the word-level means. Averages for low-frequency words are less reliable and get pulled in toward the mean for the phoneme/gesture.

The child as a mixed-effects model
• Recall the theory:
  – Every time you use a word, its reduction probability is incremented
  – When children acquire language, they acquire word-specific and phoneme/gesture-specific phonetics
• They do not try to recover the coefficient associated with word frequency
• This is equivalent to saying that the child models reduction as an overall rate for the phoneme plus a random, word-specific deviation for each word

Prior evidence: Erker & Guy (2012)
• Subject pronoun use with Spanish verbs
• Grammatical effects are amplified in high-frequency words
• As you would expect if the learner estimates within-word coefficients for grammatical predictors

Partial pooling and word frequency
Prediction: At late stages of an articulatorily-driven change, exceptionally conservative words are likely to have intermediate frequency of use.
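The weighted average from Gelman & Hill (2007: 253) and the prediction it motivates can be illustrated with a small base-R simulation. This is a minimal sketch under loudly stated assumptions, not the model behind the talk's figures: the token counts, the reduction-in-use rule (each use moves a word's reduction level a fraction `delta` of the way toward full reduction), and the fixed values of σ_y and σ_α are hypothetical choices, and the helper names `partial_pool` and `simulate_generations` are made up for illustration. Unlike lme4, the sketch treats the variances as known rather than estimating them. In `partial_pool`, `word_mean` plays the role of ȳ_j, `n_tokens` of n_j, and `phoneme_mean` of ȳ_all.

```r
# Partial-pooling estimate (Gelman & Hill 2007: 253): a precision-weighted
# average of each word's token mean and the phoneme-level token mean.
# Vectorized over words; sigma_y and sigma_alpha are treated as known.
partial_pool <- function(word_mean, n_tokens, phoneme_mean, sigma_y, sigma_alpha) {
  w_word <- n_tokens / sigma_y^2   # precision of the word's own mean
  w_phon <- 1 / sigma_alpha^2      # precision of the phoneme-level mean
  (w_word * word_mean + w_phon * phoneme_mean) / (w_word + w_phon)
}

# Same observed word mean, very different amounts of data:
shrunk <- partial_pool(word_mean = 0.2, n_tokens = c(1, 100), phoneme_mean = 0.8,
                       sigma_y = 1, sigma_alpha = 0.5)
print(shrunk)  # 0.68 for the 1-token word, about 0.223 for the 100-token word

# Toy iterated-learning simulation (all settings hypothetical):
# - a word's reduction level lies in [0, 1]; each generation the word is
#   used freq[j] times (freq must be positive whole numbers);
# - every use moves the level a fraction `delta` of the way toward 1
#   (reduction in use), and each token is produced at the current level;
# - the next generation learns each word's level by partial pooling of its
#   token mean with the token-weighted phoneme mean.
simulate_generations <- function(freq, n_gen, delta = 0.01,
                                 sigma_y = 1, sigma_alpha = 0.1) {
  level <- rep(0, length(freq))  # generation 0: no reduction anywhere
  for (g in seq_len(n_gen)) {
    token_means <- numeric(length(freq))
    for (j in seq_along(freq)) {
      r <- level[j]
      total <- 0
      for (k in seq_len(freq[j])) {
        total <- total + r        # produce one token at the current level
        r <- r + delta * (1 - r)  # the use itself reduces the word further
      }
      token_means[j] <- total / freq[j]
    }
    # token-weighted, so dominated by the high-frequency words
    phoneme_mean <- sum(freq * token_means) / sum(freq)
    level <- partial_pool(token_means, freq, phoneme_mean, sigma_y, sigma_alpha)
  }
  level
}

freq <- c(1, 3, 10, 30, 100, 300, 1000)  # made-up Zipf-like token counts
level <- simulate_generations(freq, n_gen = 10)
print(signif(1 - level, 3))     # remaining distance from full reduction
print(freq[which.min(level)])   # least reduced word: 100 with these settings
```

With these made-up settings, the least reduced word after 10 generations is the mid-frequency one (f = 100). The 1000-token word reduces through use; the rarest words have too few tokens to resist being pulled toward the heavily reduced phoneme-level mean; and the mid-frequency word has enough tokens to resist pooling but too few uses to reduce much on its own.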
When high-frequency words come to be associated with the reduced variant of the phoneme, low-frequency words will be pulled in toward it.

[Figure: The U-shaped frequency effect (panels: Generation 1, Generation 10)]

Possible case: Flapping
• t/d become flaps / V_V[-stress]
• The change affects a particular sublexical unit
• Lexical diffusion
• Far advanced in American English
• Still variable

Experiment
• Reading sentences with
  – Words found mostly in colloquial speech (N=15)
    • I found the bullshitter.
  – Words found mostly in formal speech (N=15)
    • I found the emitter.
  – Non-words (N=15)
    • I found the lenitter.
• Formality estimated using the BNC:
  – informal: conv, drama, interview_oral, speech (scripted and unscripted)
  – formal: parliament, academic, broadcast news, courtroom, public debate
  – Britishisms eliminated from the colloquial set by comparing SUBTLEX-US and SUBTLEX-UK frequencies
• Prediction: non-words should fall in between

Triples
I would prefer a letter. / I would prefer the latter. / I would prefer a gatter.
She is looking for the butter. / She is looking at the jitter. / She is looking at The Witter.
I found the bullshitter. / I found the emitter. / I found the lenitter.
She is going to get even madder. / She is going to find the highest bidder. / She is going to get even gadder!
The presenter is great at shutting up the audience. / The presenter is great at stating the obvious. / The presenter is great at spating the audience.
That girl is so pretty! / The world needs this treaty! / The murl feeds the dretty.
He always tells dirty jokes! / He always puts duty first! / He always bicks puty off.
He bumped into his daddy. / He's read about that study. / He stood by the Gaddy.
How is everybody doing today? / How is the antibody doing that? / How is Don Abrimody doing now?
He found somebody online. / He came to embody this principle. / He came to Plembody today.
He is really into flirting with Jennifer. / He is really into rating the stimuli. / He is really into brating the blick.
She is really into her knitting. / She is really into the setting. / She is really into her mitting.
He is getting fatter. / They just have to scatter. / They are getting snatter.

Lots of fillers
They were asked to withdraw from Crimea. North Korea is changing. Who amongst you have lived in Veneta? He had a lot of raw ability but too little patience. The store is closed today. Your visa is only valid until June. His visa does not allow him to stay. The Space Needle is Seattle's major landmark. David went to the opera in Seattle. They won't change the law. The law is changing fast. Jamaica pays close attention to Cuba's actions. One might observe that Iowa too is next to Illinois. Mike's idea is excellent but we do not have the resources. South Asia has limited space for its population. This government sees Canada as a tolerant multilingual society. Don't touch the chainsaw when it is in operation. China is occupying a strategic location on the east coast of Asia. China faces a huge task. Austin got a diploma in education. Who invented the telephone? Mold is found in the basement. In his will, Bill will leave the wheel to Will. Mood affects sentences we make. Mussels taste well with white wine sauce. Some people find diamonds in the bushes. Researchers at the University of Oregon find that saying sentences is healthy. Ben baked cookies. Each day he wakes up and says a sentence. I have recently read a paper claiming this. James washed his hands. Hugh could choose the hue of the hoop. You need to buy the cat some diamonds. My cat talks funny. Why would one want the weasels to win? Even the biggest elephant is not as big as a whale. Thanksgiving was fun. Blue benches buzz loudly. Mashed potatoes can be eaten with chili sauce. Ken did not find the gold. Cats like shiny things. The big bamboo stick is Mike's. Dogs can catch mice and small marsupials. Can Ken can cantaloupes? The floor can't wash itself!

Participants and measures
• 40 (22 so far) adult native speakers of American English
• Measures:
  – Closure duration (relative to the preceding vowel)
  – Minimum intensity (relative to the maximum in the next vowel)
  – Presence/absence of voicing
  – Presence/absence of burst

Analysis
• Linear mixed-effects model of log closure duration, fit with lme4:

library(lme4)
# keep tokens with closure durations between 10 and 60 and no missing
# measurements; drop the "bullshitter" triple
d <- subset(data, ClosureDur > 10 & ClosureDur < 60 &
              missing == "none" & Triple != "bullshitter")
m <- lmer(log(ClosureDur) ~ spelling + underlyingPhoneme + Condition +
            (1 + Condition | Triple) + (1 + Condition | Subject), data = d)
summary(m)

Results
Predictor          Estimate    Std. Error   t       p
Spelling: single   -0.093692   0.027071     -3.46   .0003
Colloquial words   -0.054557   0.024011     -2.27   .012
Formal words        0.045319   0.024676      1.84   .032
Underlying /t/     -0.001644   0.025567     -0.06   .53
• Non-words fall in between colloquial and formal words

Conclusion
• It's not that words change or that sounds change: it's both.
  – Not just words, because of Zipf's law: most words are too rare to support reliable word-specific estimates
• Lexical diffusion of articulatorily-motivated sound changes is predicted to affect high-frequency words first
• But once the change has progressed far enough,
  – low-frequency words should succumb to it,
  – with some mid-frequency words (e.g., ones used in formal contexts) remaining exceptionally unreduced
• Some evidence for this from flapping

References
Baayen, R. Harald. 2001. Word frequency distributions. Kluwer.
Baayen, R. Harald, Douglas J. Davidson, & Douglas M. Bates. 2008. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390-412.
Browman, Catherine P., & Louis Goldstein. 1992. Articulatory phonology: An overview. Phonetica, 49, 155-180.
Bybee, Joan L. 2001. Phonology and language use. Cambridge University Press.
Bybee, Joan L. 2002. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14, 261-290.
Erker, Daniel, & Gregory Guy. 2012. The role of lexical frequency in syntactic variability: Variable subject personal pronoun use in Spanish. Language, 88, 526-557.
Gelman, Andrew, & Jennifer Hill. 2007. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
Kapatsinski, Vsevolod. 2010. Frequency of use leads to automaticity of production: Evidence from repair in conversation. Language and Speech, 53, 71-105.
Lindblom, Björn. 1990. Explaining phonetic variation: A sketch of the H&H theory. In Speech production and speech modeling, ed. by William J. Hardcastle & A. Marchal, 403-439. Kluwer.
Phillips, Betty S. 1984. Word frequency and the actuation of sound change. Language, 60, 320-342.
Phillips, Betty S. 2001. Lexical diffusion, lexical frequency, and lexical analysis. In Frequency and the emergence of linguistic structure, ed. by Joan Bybee & Paul Hopper, 123-136. John Benjamins.
Zipf, George K. 1949. Human behavior and the principle of least effort: An introduction to human ecology. Addison-Wesley.