Hierarchical statistical inference and
lexical diffusion of sound change
Vsevolod Kapatsinski
University of Oregon
Two kinds of change in Usage-based Phonology
(Bybee 1976, 2001, 2002; Phillips 1984, 2001)
• Articulatorily-motivated sound change
– Driven by automatization of production (Browman & Goldstein 1992,
Bybee 2001, 2002, Kapatsinski 2010, Mowrey & Pagliuca 1995)
– Tempered by avoidance of misperception (Lindblom 1990)
– Starts in high-frequency words
• E.g., word-final t/d deletion (Bybee 2002), memory/mammary (Hooper 1976)
• But what about analogy to reduced words?
• Analogical change
– Driven by pressure to be like similar items (analogy) and
imperfect learning
– Starts with low-frequency words
• e.g., surviving irregular past tenses are concentrated in high-frequency verbs
Implementing the opposing mechanisms
• Reduction in use: Every time a word is used, its probability of reduction is incremented
– Words are assumed to be units of articulatory
planning and execution (Bybee 2001, 2002, Kapatsinski 2010)
– Or at least used in a reduction-favoring context (Bybee
2002, Raymond & Brown 2012)
• Learning: Words are associated with typical rates
of reduction (Bybee 2001, Erker & Guy 2012, Pierrehumbert 2001, 2002)
– “word-specific phonetics”
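To make "reduction in use" concrete, here is a minimal sketch under an assumption made only for illustration (it is not stated in the slides): each use of a word closes a fixed fraction `step` of the remaining gap between its reduction probability and full reduction. The function name and the value of `step` are hypothetical.

```python
def reduction_after_uses(p0: float, uses: int, step: float) -> float:
    """Probability of the reduced variant after `uses` tokens of a word.

    Hypothetical parameterization: each use closes a fraction `step` of the
    gap between the current reduction probability and full reduction (1.0).
    """
    return 1.0 - (1.0 - p0) * (1.0 - step) ** uses


# Same starting point and step size; only frequency of use differs.
for freq in (1, 10, 100, 1000):
    print(freq, round(reduction_after_uses(0.1, freq, 0.0005), 3))
```

Under this assumption, frequency of use alone produces the classic pattern: the more often a word is used, the further it has reduced.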
Is sound change in the phoneme or in
the word?
• It’s in both.
• Phonemes and words can be associated with
rates of reduction (cf. Phillips 2001).
• Ascribing blame for reduction to phoneme vs.
word is a process of hierarchical statistical inference.
• We implement this using lme4 in R (Baayen et al. 2008).
Predicting reduction
• p(reduction) ~ β0 + β1*word
• β0 = overall probability of reduction (for this phoneme)
• β1 = adjustment associated with the individual word
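The slide leaves the link function implicit. A minimal sketch, assuming a logistic link (as a binomial `glmer` model in lme4 would use), of how a phoneme-level intercept and a word-level adjustment combine into one word's probability of reduction; the function name and example values are hypothetical.

```python
import math


def p_reduction(beta0: float, word_adjustment: float) -> float:
    """Probability of reduction for one word, assuming a logit link.

    beta0: phoneme-level intercept (log-odds of reduction for the phoneme)
    word_adjustment: that word's deviation from the phoneme-level intercept
    """
    eta = beta0 + word_adjustment  # linear predictor on the log-odds scale
    return 1.0 / (1.0 + math.exp(-eta))


# A word with no adjustment reduces at the phoneme's baseline rate;
# a positive adjustment makes this particular word reduce more often.
print(p_reduction(-1.0, 0.0), p_reduction(-1.0, 2.0))
```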
A learning problem: Zipf (1949)
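• Zipf’s law: a word’s frequency is roughly inversely proportional to its frequency rank
(Figure from Erker & Guy 2012)
• In a corpus of any size, a large share of word types occur only once (Baayen 2001)
→ For most words, we cannot estimate a word-specific
reduction coefficient with any certainty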
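The scale of the problem can be approximated with an idealized calculation, assuming word probabilities follow Zipf's law exactly (p proportional to 1/rank) and token counts per type are Poisson. The vocabulary and sample sizes are arbitrary illustrative choices, not figures from any corpus. The function returns the expected share of observed word types that occur exactly once.

```python
import math


def expected_hapax_share(vocab_size: int, n_tokens: int, s: float = 1.0) -> float:
    """Expected share of observed word types that occur exactly once.

    Idealized assumptions: p(rank r) is proportional to 1 / r**s, and each
    type's token count in the sample is Poisson-distributed.
    """
    weights = [1.0 / r ** s for r in range(1, vocab_size + 1)]
    total = sum(weights)
    hapaxes = 0.0   # expected number of types seen exactly once
    observed = 0.0  # expected number of types seen at least once
    for w in weights:
        lam = n_tokens * w / total  # expected token count of this type
        hapaxes += lam * math.exp(-lam)
        observed += 1.0 - math.exp(-lam)
    return hapaxes / observed


print(round(expected_hapax_share(100_000, 10_000), 2))
```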
The standard solution (mixed-effects):
Partial pooling / Word as a random effect
(Gelman & Hill 2007: 252-259)
Assume that the coefficients associated with individual words come from a common (normal) distribution
Allow coefficients to constrain each other: Coefficients that are too far off from the
center of the distribution are pulled toward it unless supported by many tokens
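Weighted average of the mean of the tokens of the word (ȳ_j) and the mean of the
tokens of the phoneme/gesture (ȳ_all):
$$\hat{\alpha}_j \approx \frac{\frac{n_j}{\sigma_y^2}\,\bar{y}_j + \frac{1}{\sigma_\alpha^2}\,\bar{y}_{\text{all}}}{\frac{n_j}{\sigma_y^2} + \frac{1}{\sigma_\alpha^2}}$$
(Gelman & Hill 2007: 253)
Note: n_j is the number of tokens of the word, σ_y² is the token-level variance within a word,
and σ_α² is the variance of the word-level means. Low-frequency word averages
are less reliable and get pulled toward the mean for the phoneme/gesture.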
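A direct transcription of the Gelman & Hill (2007: 253) weighted average, as a sketch; the function name and the variance values in the example are hypothetical. With the same raw mean, a word heard once is pulled almost all the way to the phoneme-level mean, while a word heard 1,000 times keeps nearly its own value.

```python
def partial_pooling_estimate(word_mean: float, n_word: int, grand_mean: float,
                             sigma_y2: float, sigma_alpha2: float) -> float:
    """Approximate multilevel estimate for one word (Gelman & Hill 2007: 253).

    word_mean: mean of the word's own tokens (y-bar_j)
    n_word: number of tokens of the word (n_j)
    grand_mean: mean over all tokens of the phoneme/gesture (y-bar_all)
    sigma_y2: token-level variance within a word
    sigma_alpha2: variance of the word-level means
    """
    w_word = n_word / sigma_y2   # precision of the word's own mean
    w_group = 1.0 / sigma_alpha2  # precision of the phoneme-level prior
    return (w_word * word_mean + w_group * grand_mean) / (w_word + w_group)


# Same unreduced raw mean (0.0), phoneme-level mean 0.8 (hypothetical values).
print(partial_pooling_estimate(0.0, 1, 0.8, 0.25, 0.01))
print(partial_pooling_estimate(0.0, 1000, 0.8, 0.25, 0.01))
```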
The child as a mixed effects model
• Recall the theory:
– Every time you use a word, its reduction
probability is incremented
– When children acquire language, they acquire
word-specific and phoneme/gesture-specific reduction rates
• They do not try to recover the coefficient associated
with word frequency
• This is equivalent to saying the child is modeling
reduction as an overall rate for a phoneme with a
random deviation for each word
Prior evidence: Erker & Guy (2012)
• Pronoun use with Spanish verbs
• Grammatical effects are augmented in high-frequency words
• As you would expect if within-word coefficients for
grammatical predictors are calculated by the learner
Partial pooling and word frequency
Prediction: At late stages of an articulatorily-driven change, exceptionally conservative words
are likely to have intermediate frequency of use.
When high-frequency words come to be associated
with the reduced variant of the phoneme, low-frequency words will be pulled toward it as well.
The U-shaped frequency effect
[Figure panels: Generation 1, Generation 10]
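The predicted U-shape can be reproduced with a toy iterated-learning sketch. Assumptions (all hypothetical, not the talk's actual simulation): each word is used a fixed number of times per generation; each use closes a fraction `step` of the gap to full reduction; the next generation re-estimates each word's rate with the Gelman & Hill partial-pooling formula, using fixed variances and the token-weighted phoneme mean; expected values stand in for random samples. With these settings, the word with 100 tokens per generation should end up the least reduced after ten generations, while the rarest word is pulled up toward the phoneme-level rate.

```python
def simulate(freqs, generations, p0=0.1, step=0.0005,
             sigma_y2=0.25, sigma_alpha2=0.01):
    """Return the learned reduction probability of each word, per generation.

    Toy iterated-learning model; every parameter value is hypothetical.
    """
    rates = [p0] * len(freqs)
    total_tokens = sum(freqs)
    history = []
    for _ in range(generations):
        # 1. Reduction in use: each of a word's f uses closes a fraction
        #    `step` of the gap to full reduction.
        produced = [1.0 - (1.0 - r) * (1.0 - step) ** f
                    for r, f in zip(rates, freqs)]
        # 2. Phoneme-level mean over all tokens (token-weighted), so it is
        #    dominated by the high-frequency words.
        grand = sum(f * p for f, p in zip(freqs, produced)) / total_tokens
        # 3. Learning: the next generation hears each word f times and
        #    shrinks its observed rate toward the phoneme-level mean.
        rates = [(f / sigma_y2 * p + grand / sigma_alpha2)
                 / (f / sigma_y2 + 1.0 / sigma_alpha2)
                 for f, p in zip(freqs, produced)]
        history.append(rates)
    return history


freqs = [1, 10, 100, 1000]  # tokens per generation (hypothetical)
for gen in (1, 10):
    learned = simulate(freqs, generations=gen)[-1]
    print(f"Generation {gen}:", [round(r, 3) for r in learned])
```

Because the phoneme-level mean is token-weighted, it tracks the high-frequency words; rare words inherit it through pooling, and only words frequent enough to resist pooling but not frequent enough to reduce through use lag behind.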
Possible case: Flapping
• /t/ and /d/ become flaps between vowels when the following vowel is unstressed (V_V[-stress])
• The change affects a particular sublexical unit
• Lexical diffusion
• Far gone in American English
• Still variable
• Reading sentences with
– Words found mostly in colloquial speech (N=15)
• I found the bullshitter.
– Words found mostly in formal speech (N=15)
• I found the emitter.
– Non-words (N=15)
• I found the lenitter.
• Formality estimated using BNC genre categories:
– informal: conversation, drama, oral interviews, speech (scripted and unscripted);
– formal: parliament, academic, broadcast news, courtroom, public
– Britishisms eliminated from the colloquial set by comparing SUBTLEX-US &
SUBTLEX-UK frequencies
• Prediction: non-words should fall in between colloquial and formal words
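One way the formality split could be operationalized, as a hypothetical sketch: compare a word's per-million frequency in the informal and formal BNC subcorpora, and flag likely Britishisms by comparing SUBTLEX-US and SUBTLEX-UK per-million frequencies. The 2:1 cut-offs and function names are illustrative assumptions, not the study's actual criteria.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_size


def classify_formality(informal_count: int, informal_size: int,
                       formal_count: int, formal_size: int,
                       ratio_threshold: float = 2.0) -> str:
    """Label a word by the register in which it is relatively more frequent.

    The 2:1 threshold is a hypothetical choice for illustration.
    """
    informal = per_million(informal_count, informal_size)
    formal = per_million(formal_count, formal_size)
    if informal == 0 and formal == 0:
        return "unattested"
    if informal >= ratio_threshold * formal:
        return "colloquial"
    if formal >= ratio_threshold * informal:
        return "formal"
    return "neutral"


def looks_british(us_per_million: float, uk_per_million: float,
                  ratio_threshold: float = 2.0) -> bool:
    """Flag a word that is much more frequent in SUBTLEX-UK than in SUBTLEX-US."""
    return uk_per_million >= ratio_threshold * us_per_million


# Hypothetical counts, not real BNC or SUBTLEX figures.
print(classify_formality(50, 1_000_000, 5, 1_000_000))
print(looks_british(1.0, 5.0))
```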
I would prefer a letter.
I would prefer the latter.
I would prefer a gatter.
She is looking for the butter.
She is looking at the jitter.
She is looking at The Witter.
I found the bullshitter.
I found the emitter.
I found the lenitter.
She is going to get even madder.
She is going to find the highest bidder.
She is going to get even gadder!
The presenter is great at shutting up the
The presenter is great at stating the obvious.
The presenter is great at spating the audience.
That girl is so pretty!
The world needs this treaty!
The murl feeds the dretty.
He always tells dirty jokes!
He always puts duty first!
He always bicks puty off.
He bumped into his daddy.
He's read about that study.
He stood by the Gaddy.
How is everybody doing today?
How is the antibody doing that?
How is Don Abrimody doing now?
He found somebody online.
He came to embody this principle.
He came to Plembody today.
He is really into flirting with Jennifer.
He is really into rating the stimuli.
He is really into brating the blick.
She is really into her knitting.
She is really into the setting.
She is really into her mitting.
He is getting fatter.
They just have to scatter.
They are getting snatter.
Lots of fillers
They were asked to withdraw from Crimea. North Korea is changing. Who amongst
you have lived in Veneta? He had a lot of raw ability but too little patience. The store
is closed today. Your visa is only valid until June. His visa does not allow him to stay.
The Space Needle is Seattle's major landmark. David went to the opera in Seattle.
They won't change the law. The law is changing fast. Jamaica pays close attention
to Cuba's actions. One might observe that Iowa too is next to Illinois. Mike's idea is
excellent but we do not have the resources. South Asia has limited space for its
population. This government sees Canada as a tolerant multilingual society. Don't
touch the chainsaw when it is in operation. China is occupying a strategic location on
the east coast of Asia. China faces a huge task. Austin got a diploma in education.
Who invented the telephone? Mold is found in the basement. In his will, Bill will leave
the wheel to Will. Mood affects sentences we make. Mussels taste well with white
wine sauce. Some people find diamonds in the bushes. Researchers at the
University of Oregon find that saying sentences is healthy. Ben baked cookies. Each
day he wakes up and says a sentence. I have recently read a paper claiming this.
James washed his hands. Hugh could choose the hue of the hoop. You need to buy
the cat some diamonds. My cat talks funny. Why would one want the weasels to
win? Even the biggest elephant is not as big as a whale. Thanksgiving was fun. Blue
benches buzz loudly. Mashed potatoes can be eaten with chili sauce. Ken did not
find the gold. Cats like shiny things. The big bamboo stick is Mike's. Dogs can catch
mice and small marsupials. Can Ken can cantaloupes? The floor can't wash itself!
Participants and measures
• 40 (22 so far) adult native speakers of American English
• Measures:
– Closure duration (relative to preceding vowel)
– Minimum intensity (relative to maximum in following vowel)
– Presence/absence of voicing
– Presence/absence of burst
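• Model (R, lme4):
library(lme4)
# keep tokens with closure durations between 10 and 60, no missing
# measurements, and drop the "bullshitter" triple
flapData <- subset(data, ClosureDur > 10 & ClosureDur < 60 &
                     missing == "none" & Triple != "bullshitter")
model <- lmer(log(ClosureDur) ~ spelling + underlyingPhoneme + Condition +
                (1 + Condition | Triple),
              data = flapData)
summary(model)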
Fixed effect           Estimate     Std. Err.
Spelling (single)      -0.093692
Colloquial words       -0.054557
Formal words
Underlying /t/
Non-words are in between colloquial and formal words
• It’s not that either words change or sounds change: both do.
– Not just words, because of Zipf’s law: most words are too rare for reliable word-specific estimates
• Lexical diffusion of articulatorily-motivated sound
changes is predicted to affect high-frequency words first
• But, once the change has progressed far enough,
– low-frequency words should succumb to it,
– with some mid-frequency words (e.g. ones used in formal
contexts) remaining exceptionally non-reduced
• Some evidence for this from flapping
Baayen, R. Harald. 2001. Word frequency distributions. Kluwer.
Baayen, R. H., D. J. Davidson, & D. M. Bates. 2008. Mixed-effects modeling with crossed random effects
for subjects and items. Journal of Memory & Language, 59, 390-412.
Browman, Catherine P., & Louis Goldstein. 1992. Articulatory phonology: An overview. Phonetica, 49,
Bybee, Joan L. 2001. Frequency and language use. Cambridge University Press.
Bybee, Joan L. 2002. Word frequency and context of use in the lexical diffusion of phonetically
conditioned sound change. Language Variation and Change, 14, 261-90.
Erker, Daniel, & Gregory Guy. 2012. The role of lexical frequency in syntactic variability: Variable subject
personal pronoun use in Spanish. Language, 88, 526-57.
Gelman, Andrew, & Jennifer Hill. 2007. Data analysis using regression and multilevel/hierarchical
models. Cambridge University Press.
Kapatsinski, Vsevolod. 2010. Frequency of use leads to automaticity of production: Evidence from repair
in conversation. Language & Speech, 53, 71-105.
Lindblom, Björn. 1990. Explaining phonetic variation: A sketch of the H&H theory. In Speech production
and speech modeling, ed. by William J. Hardcastle & A. Marchal, 403-39. Kluwer.
Phillips, Betty S. 1984. Word frequency and the actuation of sound change. Language, 60, 320-42.
Phillips, Betty S. 2001. Lexical diffusion, lexical frequency, and lexical analysis. In Frequency and the
emergence of linguistic structure, ed. by Joan Bybee & Paul Hopper, 123-36. John Benjamins.
Zipf, George K. 1949. Human behavior and the principle of least effort: An introduction to human
ecology. Addison-Wesley.
