A Multimedia Corpus of Child Mandarin: The Tong Corpus

Deng, Xiangjun; Yip, Virginia

doi:10.1353/jcl.2018.0002

Cited by 29 publications

(8 citation statements)

References 43 publications


“…The Adam and Eve sections from the Brown Corpus are then used to evaluate the depth-bounded model defined in Section 4. Transcribed child-directed speech data in Chinese Mandarin (Tong; Deng et al 2018) and German (Leo; Behrens 2006) are also collected from the CHILDES corpus with reference trees automatically generated using the state-of-the-art Kitaev and Klein (2018) supervised parser trained with the Chinese (Xia et al 2000; The Chinese Treebank) and German (Skut et al 1998; NEGRA) treebanks. They are used as held-out data sets for the bounded grammar induction experiments, using cross-linguistic hyperparameters tuned on English.…”

Section: Experiments 3: Evaluation Of Bounded PCFG Induction On Child-… (mentioning)

confidence: 99%

Depth-Bounded Statistical PCFG Induction as a Model of Human Grammar Acquisition

Jin

Schwartz

Doshi‐Velez

et al. 2021

Computational Linguistics


This article describes a simple PCFG induction model with a fixed category domain that predicts a large majority of attested constituent boundaries, and predicts labels consistent with nearly half of attested constituent labels on a standard evaluation data set of child-directed speech. The article then explores the idea that the difference between simple grammars exhibited by child learners and fully recursive grammars exhibited by adult learners may be an effect of increasing working memory capacity, where the shallow grammars are constrained images of the recursive grammars. An implementation of these memory bounds as limits on center embedding in a depth-specific transform of a recursive grammar yields a significant improvement over an equivalent but unbounded baseline, suggesting that this arrangement may indeed confer a learning advantage.


“…The data came from two boys and one girl whose language was regularly recorded, transcribed, and accessed for the present research from the CHILDES repository; naturalistic language data gathered by primary investigators are deposited in CHILDES to enable analysis by other researchers. We included Tong, a boy described by Xiangjun and Yip (2018) in their language acquisition research. Tong was raised in Shenzhen where both Mandarin and Cantonese are spoken.…”

Section: Participants (mentioning)

confidence: 99%

Talking about people who are not there: Children’s early references to absent caregivers and absent friends

Zeng

Harris

2022

First Language


Research on the development of children’s decontextualized language has focused primarily on their references to events displaced in time. Here, we examine children’s early emerging ability to talk about individuals who are elsewhere and therefore not participating in the conversation. We analyzed the references made by three Mandarin-speaking children aged 20–40 months to absent members of their social network. Even in the earliest period of the study (20–26 months), children produced a considerable number of such references, with the majority initiated either fully or partially by the children themselves. Thus, children were not simply echoing references made by their interlocutors. These early references often expressed attachment-related concerns with respect to absent family members. For example, children expressed a desire for the absent family member, called out their name, or asked about their location. Over time, however, they talked about absent individuals, including family members, in a more neutral or reflective fashion, commenting on their characteristics and activities. The findings highlight the early emergence of references that are displaced in space from the utterance context.


“…Finally, to examine whether a similar relationship between morphological typology and induction performance is observed in languages other than English and Korean, the NeuralChar and NeuralWord models were also evaluated on Mandarin Chinese and German child-directed speech corpora from CHILDES. The Chinese corpus consists of 19,541 caregiver utterances from the Tong section (Deng et al, 2018) with a mean sentence length of 5.7 words, which were recorded at ages from 1 year 0 months and 4 years 5 months. The German corpus contains 20,000 child-directed utterances randomly sampled from the Leo section (Behrens, 2006), as the original corpus contained many duplicate utterances in interactions between Leo and his caregivers between ages 1 year 11 months and 4 years 11 months.…”

Section: Replication Using Silver Data (mentioning)

confidence: 99%

Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages

Jin

Oh

Schuler

2021

Findings of the Association for Computational Linguistics: EMNLP 2021


Unsupervised PCFG induction models, which build syntactic structures from raw text, can be used to evaluate the extent to which syntactic knowledge can be acquired from distributional information alone. However, many state-of-the-art PCFG induction models are word-based, meaning that they cannot directly inspect functional affixes, which may provide crucial information for syntactic acquisition in child learners. This work first introduces a neural PCFG induction model that allows a clean ablation of the influence of subword information in grammar induction. Experiments on child-directed speech demonstrate first that the incorporation of subword information results in more accurate grammars with categories that word-based induction models have difficulty finding, and second that this effect is amplified in morphologically richer languages that rely on functional affixes to express grammatical relations. A subsequent evaluation on multilingual treebanks shows that the model with subword information achieves state-of-the-art results on many languages, further supporting a distributional model of syntactic acquisition.
