2019
DOI: 10.1038/s41467-019-11605-y

Parallels in the sequential organization of birdsong and human speech

Abstract: Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from …

Cited by 75 publications (151 citation statements)
References 55 publications (90 reference statements)
“…Given the artificial nature of optogenetic stimulation, we wondered whether USVs elicited by optogenetic activation of POA neurons were acoustically similar to the USVs that are normally produced by mice during social interactions. To compare the acoustic features of optogenetically-elicited USVs (opto-USVs) to those of USVs produced spontaneously to a nearby female, we employed a recently described method using variational autoencoders (VAEs) (Goffinet, 2019; Sainburg et al., 2019). Briefly, the VAE is an unsupervised modeling approach that uses spectrograms of vocalizations as inputs and from these data learns a pair of probabilistic maps, an "encoder" and a "decoder," capable of compressing vocalizations into a small number of latent features while attempting to preserve as much information as possible (Fig.…”
Section: Acoustic Characterization of USVs Elicited by Activation of … (mentioning)
confidence: 99%
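The encoder/decoder pair described in this excerpt follows the standard VAE recipe. Below is a minimal sketch in PyTorch, assuming flattened single-channel spectrogram inputs; the fully connected layers, 32-dimensional latent space, and Gaussian reconstruction loss are illustrative choices, not the architectures of the cited papers.

```python
# Minimal VAE sketch for compressing vocalization spectrograms into a small
# number of latent features. Illustrative only; not the exact architecture
# from Goffinet (2019) or Sainburg et al. (2019).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrogramVAE(nn.Module):
    def __init__(self, n_pixels=128 * 128, n_latent=32):
        super().__init__()
        # Encoder: probabilistic map from spectrogram to latent mean/log-variance.
        self.enc = nn.Sequential(nn.Linear(n_pixels, 512), nn.ReLU())
        self.mu = nn.Linear(512, n_latent)
        self.logvar = nn.Linear(512, n_latent)
        # Decoder: probabilistic map from latent features back to a spectrogram.
        self.dec = nn.Sequential(
            nn.Linear(n_latent, 512), nn.ReLU(), nn.Linear(512, n_pixels)
        )

    def forward(self, x):
        h = self.enc(x.flatten(1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample latents while keeping gradients.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term ("preserve as much information as possible")...
    recon = F.mse_loss(x_hat, x.flatten(1), reduction="sum")
    # ...plus a KL term pulling the latent posterior toward a unit Gaussian.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Training then amounts to minimizing `vae_loss` over batches of spectrograms shaped (N, 128, 128); the learned `mu` vectors serve as the per-vocalization latent features compared across conditions.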
“…Since Shannon's original work characterizing the sequential dependencies present in language, the structure underlying long-range information in language has been the subject of a great deal of interest in linguistics, statistical physics, cognitive science, and psychology [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. Long-range information content refers to the dependencies between discrete elements (e.g., units of spoken or written language) that persist over long sequential distances spanning words, phrases, sentences, and discourse.…”
Section: Introduction (mentioning)
confidence: 99%
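Operationally, this long-range information content is usually measured as the mutual information between elements separated by a lag d. A minimal sketch, assuming a sequence of discrete symbols (characters, words, or syllable labels); note that published analyses typically use bias-corrected MI estimators rather than this naive plug-in estimate.

```python
# Plug-in estimate of mutual information between sequence elements at lag d.
# Illustrative sketch; real analyses use bias-corrected estimators.
from collections import Counter
import math

def mutual_information(seq, d):
    pairs = list(zip(seq[:-d], seq[d:]))  # element pairs separated by lag d
    n = len(pairs)
    p_xy = Counter(pairs)                 # joint counts
    p_x = Counter(x for x, _ in pairs)    # marginal counts, first element
    p_y = Counter(y for _, y in pairs)    # marginal counts, second element
    mi = 0.0
    for (x, y), c in p_xy.items():
        p = c / n
        mi += p * math.log2(p * n * n / (p_x[x] * p_y[y]))
    return mi  # bits

# Example: MI decays with distance in a sequence with local structure.
seq = list("abcabcabdabcabeabc" * 50)
for d in (1, 2, 5, 10):
    print(d, round(mutual_information(seq, d), 3))
```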
“…Using the responses of these participants, Shannon derived an upper bound on the information added by including each preceding letter in the sequence. More recent investigations compute statistical dependencies directly from language corpora using either correlation functions [3,4,7,8,10,12,13] or mutual information (MI) functions [2,5,6,14] between elements in a sequence. In both cases, statistical dependencies grow weaker, on average, as the distance between elements increases.…”
Section: Introduction (mentioning)
confidence: 99%
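The distinction between Markovian and hierarchical organization shows up in how these MI functions decay with distance: Markov dynamics predict exponential decay, whereas hierarchical structure predicts a power law. Below is a hedged sketch of such a model comparison using scipy's `curve_fit` on a synthetic MI curve; the two functional forms match the framing above, but the fitting and model-selection details (the cited paper also considers a composite model) are illustrative.

```python
# Compare exponential vs. power-law fits to an MI-by-distance curve.
# Exponential decay suggests Markovian structure; power-law decay suggests
# hierarchical, long-range structure. Synthetic data for illustration.
import numpy as np
from scipy.optimize import curve_fit

def exponential(d, a, b):
    return a * np.exp(-d / b)

def power_law(d, c, alpha):
    return c * d ** (-alpha)

d = np.arange(1, 101, dtype=float)
mi = 0.5 * d ** -0.8 + np.random.normal(0, 0.005, d.size)  # synthetic MI

for name, f, p0 in [("exponential", exponential, (1.0, 10.0)),
                    ("power law", power_law, (1.0, 1.0))]:
    params, _ = curve_fit(f, d, mi, p0=p0, maxfev=10000)
    rss = np.sum((mi - f(d, *params)) ** 2)  # residual sum of squares
    print(f"{name}: params={params}, RSS={rss:.4g}")
```

The better-fitting form (here judged crudely by residual sum of squares; information criteria such as AIC are the more principled choice) indicates which class of sequential dynamics dominates.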
“…This data-driven approach is closely related to previous studies that have applied autoencoding to birdsong to generate spectrograms and interpolate syllables for use in playback experiments [38, 44]. Additionally, dimensionality reduction algorithms such as UMAP [29] and t-SNE [27], which we use here to visualize latent spaces, have previously been applied to raw spectrograms of birdsong syllables to aid in syllable clustering [37] and to visualize juvenile song learning [25]. Here, by contrast, we use the VAE as a general-purpose tool for quantifying vocal behavior, with a focus on cross-species comparisons and assessing variability across groups, individuals, and experimental conditions.…”
Section: Discussion (mentioning)
confidence: 96%
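For the visualization step described in this excerpt, a minimal sketch assuming the umap-learn, scikit-learn, and matplotlib packages; the random `latents` array and syllable `labels` below are placeholders standing in for per-vocalization VAE latent vectors and cluster assignments.

```python
# Project latent features (or flattened spectrograms) to 2-D for inspection.
# Placeholder data; in practice, `latents` would come from the trained VAE.
import numpy as np
import matplotlib.pyplot as plt
import umap
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 32))   # placeholder latent vectors
labels = rng.integers(0, 5, size=500)  # placeholder syllable labels

umap_emb = umap.UMAP(n_components=2, random_state=0).fit_transform(latents)
tsne_emb = TSNE(n_components=2, random_state=0).fit_transform(latents)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in zip(axes, (umap_emb, tsne_emb), ("UMAP", "t-SNE")):
    ax.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap="tab10")
    ax.set_title(title)
plt.show()
```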