CHILDES Mandarin Tong Corpus

Xiangjun Deng
Bilingualism and Language Disorders Laboratory
Chinese University of Hong Kong

Virginia Yip
Dept. of Modern Languages & Intercultural Studies
Chinese University of Hong Kong

Participants: 1
Type of Study: longitudinal
Location: China
Media type: audio + video
DOI: doi:10.21415/T5PC7Q

Citation information

Publications using the Tong corpus should cite:

Deng, X. & Yip, V. (2018). A multimedia corpus of child Mandarin: The Tong corpus. Journal of Chinese Linguistics 46 (1): 69-92.

Deng, X. & Yip, V. (2015). A corpus study of the acquisition of ba and bei constructions in Mandarin. Paper presented at The International Symposium on Psycholinguistics of Second Language Acquisition and Bilingualism, Chinese University of Hong Kong.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Tong corpus is part of a larger study covering the naturalistic interactions between a Mandarin-speaking child, Tong, and his caregivers from age 1;0 to 4;5. Tong was raised in Shenzhen, China, where Mandarin is the language of the community. All members of the family spoke Mandarin to the child. He was also exposed to some input in the Rudong dialect, a variety of Jianghuai Mandarin spoken in Jiangsu province, China, as his father and grandmother sometimes spoke the dialect with each other. From 2;5 on, the child also received some Cantonese and English input in his kindergarten. However, the influence of these languages was minimal, and the child spoke only Mandarin at home. From 3;3, he attended a kindergarten in Hong Kong for three hours a day, where Cantonese was the medium of instruction.

Recording and transcription

Spontaneous play sessions were audiotaped for one hour each week from 1;0, and video recording started at 2;3. To improve on published child Mandarin corpora, we constructed an audio- or video-linked longitudinal corpus with denser sampling and documented naturalistic adult-to-child input over the entire period of study. In the initial phase, 22 one-hour recordings at one-month intervals from 1;7 to 3;4 have been released in CHILDES, with details shown in Table 1. They have been transcribed and checked by native speakers of Mandarin with linguistic training. The corpus provides a morphosyntactic tier, which facilitates grammatical analysis of the data. Table 2 summarizes the major parts of speech or syntactic categories used in tagging the corpus.
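Ages in this description use the CHILDES year;month notation (for example, 1;7 is one year and seven months). As a quick check on the sampling described above, the following Python sketch converts such ages to months and counts the monthly sessions from 1;7 to 3;4. The function names are illustrative and not part of any CHILDES tool, and the sketch assumes the plain year;month form used here, without a day component.

```python
def age_to_months(age: str) -> int:
    """Convert a CHILDES-style age such as '1;7' (years;months) to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def monthly_sessions(start: str, end: str) -> int:
    """Count sessions at one-month intervals from start to end, inclusive."""
    return age_to_months(end) - age_to_months(start) + 1


print(age_to_months("1;7"))            # 19
print(age_to_months("3;4"))            # 40
print(monthly_sessions("1;7", "3;4"))  # 22
```

The result, 22, matches the number of one-hour recordings in the initial release.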

Table 1. Information about the Tong corpus released in CHILDES
No.  File name  Age  No. of  No. of

Table 2. Major parts of speech used in the Tong corpus
No.  Part of speech            Tag    Example
1    Adjective                 adj    小 xiao3 ‘small’
2    Adverb                    adv    老 lao3 ‘always’
3    Aspect marker             asp    了 le ‘perfective’
4    Classifier                class  分钟 fen1zhong1 ‘minute’
5    Communicator              co     哎呀 ai1ya1 ‘jeez’
6    Conjunction               conj   不但 bu2dan4 ‘not only’
7    Interjection              int    对不起 dui4bu4qi3 ‘sorry’
8    Noun                      n      鱼 yu2 ‘fish’
9    Negation                  neg    不 bu4 ‘not’
10   Number                    num    八 ba1 ‘eight’
11   Onomatopoeia              on     轰隆 hong1long2 ‘rumble’
12   Postposition              post   后面 hou4mian4 ‘behind’
13   Preposition               prep   从 cong2 ‘from’
14   Pronoun                   pro    我 wo3 ‘I’
15   Quantifier                quant  各 ge4 ‘each’
16   Sentence final particle   sfp    吗 ma ‘question’
17   Small (functional) words  small  的 de
18   Verb                      v      逛 guang4 ‘hang out’
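As an illustration of how the tags in Table 2 might be used, the Python sketch below maps each tag to its category and counts the categories on a single morphosyntactic line. It assumes tokens in the usual CHAT `tag|word` form; the sample line is hypothetical and not taken from the corpus, and the function name is illustrative only.

```python
from collections import Counter

# Tag -> category, transcribed from Table 2.
POS_TAGS = {
    "adj": "Adjective", "adv": "Adverb", "asp": "Aspect marker",
    "class": "Classifier", "co": "Communicator", "conj": "Conjunction",
    "int": "Interjection", "n": "Noun", "neg": "Negation",
    "num": "Number", "on": "Onomatopoeia", "post": "Postposition",
    "prep": "Preposition", "pro": "Pronoun", "quant": "Quantifier",
    "sfp": "Sentence final particle", "small": "Small (functional) words",
    "v": "Verb",
}


def count_tags(mor_line: str) -> Counter:
    """Count Table 2 categories in one line of 'tag|word' tokens."""
    counts = Counter()
    for token in mor_line.split():
        if "|" not in token:
            continue  # skip the tier label and punctuation tokens
        # Keep only the main tag, dropping any ':subcategory' suffix.
        tag = token.split("|", 1)[0].split(":", 1)[0]
        counts[POS_TAGS.get(tag, "Unknown")] += 1
    return counts


# Hypothetical example line (an assumption, not corpus data).
example = "pro|我 v|逛 asp|了 ."
print(count_tags(example))
```

Tags that do not appear in Table 2 are counted under "Unknown" rather than dropped, so unexpected tags remain visible in the totals.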

For more details, please refer to Deng & Yip (2018) or our website.


We would like to express our gratitude to Brian MacWhinney, Director of CHILDES, for his expertise, advice and technical support in constructing the Tong corpus.

Our special thanks go to the transcribers of the Tong corpus: Zhong Jing, Lam Ho Ching, Xie Shanrong, Zhou Jiangling, Lu Yaqiao, Lyu Lu, Yao Yao, Au Chui Yee, and Zhishu Yu. We gratefully acknowledge the support of our lab members, especially Stephen Matthews and Mai Ziyin.

The research was supported by a start-up grant to set up the Bilingualism and Language Disorders Laboratory at the CUHK-Shenzhen Research Institute, CUHK funding for the CUHK-Peking University-University System of Taiwan Joint Research Centre for Language and Human Complexity, a General Research Fund from the Hong Kong Research Grants Council (Project no. 14413514), and the Stella and Leanne Lu Fund.