Intensity variation signals metric structure in child-directed poetic speech 

Ahren Fitzroy & Mara Breen, Mount Holyoke College

Poster presented at Society for Music Perception and Cognition Meeting, San Diego, CA (2017)
(for reprint, contact ahren.fitzroy@gmail.com)

Abstract:
Regular stress in music guides attention to important moments in a manner similar to that observed for word onsets during speech segmentation. Our overarching hypothesis is that metric structure in speech is useful for guiding children’s segmentation during early language learning. We investigated whether intensity variation in child-directed poetic speech signals metric structure as it does in music. We modeled intensity variation in a corpus of productions of The Cat in the Hat (Dr. Seuss, 1957) using a metric accent model derived from music performance (Drake & Palmer, 1993). Using linear mixed-effects regression, we modeled the maximum intensity (dB) of each word as a function of metric strength. To isolate meter from linguistic properties known to affect intensity, we included control parameters for segment number, lexical frequency, repetition, word class, syntactic structure, and capitalization. Intensity increased with fewer phonemes, lower frequency, first mention, open class, syntactic boundary non-alignment, and capitalization. Consistent with the music performance model, metric structure further predicted word intensity: words aligned with beat one in a 6/8 metric structure (e.g., down in (A)*) were produced with the greatest intensity, and words aligned with beat four (e.g., fish) were produced with lower intensity than beat-one words but greater intensity than words on all other beats. Consistent with prior work showing intensity reduction for predictable speech, words aligned with beat four were reduced when they completed a couplet (e.g., fall). The finding that speakers use intensity variation to signal metric structure is novel in the speech production literature and demonstrates strong connections between hierarchical timing processes in speech and music.

(A) “Put₅ me₆ | down₁!” said₂ the₃ fish₄.
“This₅ is₆ | no₁ fun₂ at₃ all₄!
Put₅ me₆ | down₁!” said₂ the₃ fish₄.
“I₅ do₆ | NOT₁ wish₂ to₃ fall₄.”

*Subscripts in (A) indicate beat number in 6/8 meter; pipes indicate measure boundaries.
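
The beat annotation in (A) can be turned into a metric-strength predictor of the kind described in the abstract. The sketch below is illustrative only: the three-level coding (beat one strongest, beat four intermediate, all other beats weak) is an assumption that mirrors the intensity ordering reported above, and is not necessarily the coding used in the analysis or in Drake & Palmer (1993). The names METRIC_STRENGTH, EXCERPT, and words_by_strength are hypothetical.

```python
# Hypothetical three-level coding of metric strength for 6/8 meter.
# Assumption: beat 1 is strongest (2), beat 4 is intermediate (1), and all
# other beats are weak (0), matching the ordering reported in the abstract.
METRIC_STRENGTH = {1: 2, 2: 0, 3: 0, 4: 1, 5: 0, 6: 0}

# Excerpt (A) as (word, beat) pairs, transcribed from the beat numbers.
EXCERPT = [
    [("Put", 5), ("me", 6), ("down", 1), ("said", 2), ("the", 3), ("fish", 4)],
    [("This", 5), ("is", 6), ("no", 1), ("fun", 2), ("at", 3), ("all", 4)],
    [("Put", 5), ("me", 6), ("down", 1), ("said", 2), ("the", 3), ("fish", 4)],
    [("I", 5), ("do", 6), ("NOT", 1), ("wish", 2), ("to", 3), ("fall", 4)],
]


def words_by_strength(lines):
    """Group words by their assumed metric strength level."""
    groups = {}
    for line in lines:
        for word, beat in line:
            groups.setdefault(METRIC_STRENGTH[beat], []).append(word)
    return groups


if __name__ == "__main__":
    groups = words_by_strength(EXCERPT)
    # Print levels from strongest to weakest.
    for level in sorted(groups, reverse=True):
        print(level, groups[level])
```

Under this assumed coding, the strongest level collects the beat-one words (down, no, down, NOT) and the intermediate level collects the beat-four words (fish, all, fish, fall). A per-word strength value of this kind could then serve as a predictor of maximum intensity in a regression model.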

Audio examples:
Synthesized Cat in the Hat excerpt, normal order

Synthesized Cat in the Hat excerpt, random order


References:
Astheimer, L. B., & Sanders, L. D. (2009). Listeners modulate temporally selective attention during natural speech processing. Biological Psychology, 80(1), 23–34. https://doi.org/10.1016/j.biopsycho.2008.01.015

Breen, M. (2017). Word durations in The Cat in the Hat are affected by metrical hierarchy and rhyme predictability. Talk presented at the 30th Annual CUNY Conference on Human Sentence Processing, Boston, MA. http://tedlab.mit.edu/cuny_abstracts/378_Final_Manuscript.pdf

Drake, C., & Palmer, C. (1993). Accent Structures in Music Performance. Music Perception: An Interdisciplinary Journal, 10(3), 343–378. https://doi.org/10.2307/40285574

Dr. Seuss. (1957). The Cat in the Hat. New York, NY: Random House.

Fitzroy, A. B., & Sanders, L. D. (2015). Musical Meter Modulates the Allocation of Attention across Time. Journal of Cognitive Neuroscience, 27(12), 2339–2351. https://doi.org/10.1162/jocn_a_00862

Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The Beginnings of Word Segmentation in English-Learning Infants. Cognitive Psychology, 39(3–4), 159–207. https://doi.org/10.1006/cogp.1999.0716

Leong, V., & Goswami, U. (2015). Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech. PLOS ONE, 10(12), e0144411. https://doi.org/10.1371/journal.pone.0144411

Lieberman, P. (1963). Some Effects of Semantic and Grammatical Context on the Production and Perception of Speech. Language and Speech, 6(3), 172–187. https://doi.org/10.1177/002383096300600306

Todd, N. (1985). A Model of Expressive Timing in Tonal Music. Music Perception: An Interdisciplinary Journal, 3(1), 33–57. https://doi.org/10.2307/40285321