CHILDES Mandarin Tong Corpus

Xiangjun Deng
Bilingualism and Language Disorders Laboratory
Chinese University of Hong Kong
dengxj98@gmail.com

Virginia Yip
Dept. of Modern Languages & Intercultural Studies
Chinese University of Hong Kong
vyip@humanum.arts.cuhk.edu.hk

Participants:	1
Type of Study:	longitudinal
Location:	China
Media type:	audio + video
DOI:	doi:10.21415/T5PC7Q

Citation information

Publications using the Tong corpus should cite:

Deng, X. & Yip, V. (2018). A multimedia corpus of child Mandarin: The Tong corpus. Journal of Chinese Linguistics 46 (1): 69-92.

Deng, X. & Yip, V. (2015). A corpus study of the acquisition of ba and bei constructions in Mandarin. Paper presented at The International Symposium on Psycholinguistics of Second Language Acquisition and Bilingualism, Chinese University of Hong Kong.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Tong corpus is part of a larger study of the naturalistic interactions between a Mandarin-speaking child, Tong, and his caregivers from age 1;0 to 4;5. Tong was raised in Shenzhen, China, where Mandarin is the language of the community, and all members of his family speak Mandarin to him. He is also exposed to some Rudong dialect, a variety of Jianghuai Mandarin spoken in Jiangsu province, China, because his father and grandmother sometimes speak it with each other. From 2;5 on, he also received some Cantonese and English input at kindergarten. However, the influence of these languages was minimal, and the child spoke only Mandarin at home. From 3;3, he attended a Cantonese-medium kindergarten in Hong Kong for three hours a day.

Recording and transcription

Spontaneous play sessions were audio-recorded for one hour each week from 1;0 onward, and video recording began at 2;3. In an effort to improve on published child Mandarin corpora, we constructed an audio- or video-linked longitudinal corpus with denser sampling and documented naturalistic adult-to-child input over the entire period of study. In the initial phase, 22 one-hour recordings from 1;7 to 3;4, sampled at approximately monthly intervals, have been released in CHILDES; details are given in Table 1. The recordings were transcribed and checked by native speakers of Mandarin with linguistic training. The corpus provides a morphosyntactic tier that facilitates grammatical analysis of the data. Table 2 summarizes the major parts of speech or syntactic categories used in tagging the corpus.

Table 1. Information about the Tong corpus released in CHILDES

No.  File name  Age       Utterances  Words  MLU
1    130202     1;7.18    101         168    1.66
2    130309     1;8.22    428         1020   2.38
3    130405     1;9.19    406         1089   2.68
4    130504     1;10.17   382         1054   2.76
5    130607     1;11.21   445         1265   2.84
6    130705     2;0.19    376         1119   2.98
7    130802     2;1.17    507         1500   2.96
8    130901     2;2.16    294         1094   3.72
9    130929     2;3.14    430         1586   3.69
10   131103     2;4.16    489         1935   3.96
11   131215     2;5.30    395         1222   3.09
12   131229     2;6.13    398         1580   3.97
13   140203     2;7.19    454         1458   3.21
14   140223     2;8.8     451         1613   3.58
15   140323     2;9.6     303         1032   3.41
16   140423     2;10.6    529         1998   3.78
17   140525     2;11.8    454         1724   3.80
18   140628     3;0.12    466         1948   4.18
19   140726     3;1.10    456         1767   3.88
20   140824     3;2.8     409         1091   2.67
21   140921     3;3.6     496         2306   4.65
22   141025     3;4.9     468         1701   3.64
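The MLU column in Table 1 appears to equal the number of words divided by the number of utterances (for example, 168 / 101 ≈ 1.66 for file 130202); this is our reading of the table, not a documented definition. The Python sketch below recomputes MLU on that assumption and converts CHAT-style ages (years;months.days) to approximate months, which is convenient for plotting MLU against age. The Session class, the helper names, and the 30.4375-days-per-month conversion are hypothetical illustrative choices, not part of the corpus distribution.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One row of Table 1 (hypothetical container, not a CHILDES API)."""
    file_name: str
    age: str          # CHAT age string, "years;months.days"
    utterances: int
    words: int

    def mlu(self) -> float:
        # Assumption: MLU = words / utterances, which matches Table 1
        # to two decimal places for the rows checked.
        return self.words / self.utterances


def age_in_months(age: str) -> float:
    """Convert a CHAT-style age such as '1;7.18' to approximate months.

    Days are converted at an average of 30.4375 days per month, so the
    result is an approximation. A missing day field counts as 0.
    """
    years, rest = age.split(";")
    months, _, days = rest.partition(".")
    return int(years) * 12 + int(months) + (int(days) if days else 0) / 30.4375


# A few rows copied from Table 1.
SESSIONS = [
    Session("130202", "1;7.18", 101, 168),
    Session("130309", "1;8.22", 428, 1020),
    Session("140628", "3;0.12", 466, 1948),
]

if __name__ == "__main__":
    for s in SESSIONS:
        print(f"{s.file_name}  {age_in_months(s.age):5.1f} months  MLU={s.mlu():.2f}")
```

Extending SESSIONS with the remaining rows gives an MLU-by-age series for the whole released sample.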

Table 2. Major parts of speech used in the Tong corpus

No.  Category                  Code   Example
1    Adjective                 adj    小 xiao3 ‘small’
2    Adverb                    adv    老 lao3 ‘always’
3    Aspect marker             asp    了 le ‘perfective’
4    Classifier                class  分钟 fen1zhong1 ‘minute’
5    Communicator              co     哎呀 ai1ya1 ‘jeez’
6    Conjunction               conj   不但 bu2dan4 ‘not only’
7    Interjection              int    对不起 dui4bu4qi3 ‘sorry’
8    Noun                      n      鱼 yu2 ‘fish’
9    Negation                  neg    不 bu4 ‘not’
10   Number                    num    八 ba1 ‘eight’
11   Onomatopoeia              on     轰隆 hong1long2 ‘rumble’
12   Postposition              post   后面 hou4mian4 ‘behind’
13   Preposition               prep   从 cong2 ‘from’
14   Pronoun                   pro    我 wo3 ‘I’
15   Quantifier                quant  各 ge4 ‘each’
16   Sentence-final particle   sfp    吗 ma ‘question’
17   Small (functional) words  small  的 de
18   Verb                      v      逛 guang4 ‘hang out’
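As a rough illustration of how the codes in Table 2 might be used, the Python sketch below tallies part-of-speech codes on a single morphosyntactic line. It assumes the general CHAT convention of a "%mor:" tier made of whitespace-separated "code|stem" tokens, with optional ":"-separated subcategories; compounds and clitics are not treated specially. The function name pos_counts and the sample line are hypothetical, made up for illustration, so the actual tier format in the released files should be checked before relying on this.

```python
from collections import Counter

# Part-of-speech codes taken from Table 2.
POS_LABELS = {
    "adj": "Adjective", "adv": "Adverb", "asp": "Aspect marker",
    "class": "Classifier", "co": "Communicator", "conj": "Conjunction",
    "int": "Interjection", "n": "Noun", "neg": "Negation",
    "num": "Number", "on": "Onomatopoeia", "post": "Postposition",
    "prep": "Preposition", "pro": "Pronoun", "quant": "Quantifier",
    "sfp": "Sentence-final particle", "small": "Small (functional) word",
    "v": "Verb",
}


def pos_counts(mor_line: str) -> Counter:
    """Count the major POS code of each token on one %mor line.

    Assumes whitespace-separated 'code|stem' tokens, where the code may
    carry ':'-separated subcategories. Tokens without '|' (e.g. final
    punctuation) are skipped; compounds and clitics are not split.
    """
    counts: Counter = Counter()
    body = mor_line.removeprefix("%mor:").strip()  # Python 3.9+
    for token in body.split():
        if "|" not in token:
            continue
        code = token.split("|", 1)[0].split(":", 1)[0]
        counts[code] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical tier for 我不吃鱼 'I don't eat fish'.
    sample = "%mor:\tpro|wo3 neg|bu4 v|chi1 n|yu2 ."
    for code, n in pos_counts(sample).items():
        print(f"{POS_LABELS.get(code, 'unknown code ' + code)}: {n}")
```

Codes missing from POS_LABELS are reported rather than dropped, which makes it easy to spot subcategories or tags outside the major categories listed in Table 2.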

For more details, please refer to Deng & Yip (2018) or our website http://cbrchk.org/the-tong-corpus/.

Acknowledgements

We would like to express our gratitude to Brian MacWhinney, Director of CHILDES, for his expertise, advice and technical support in constructing the Tong corpus.

Our special thanks go to the transcribers of the Tong corpus: Zhong Jing, Lam Ho Ching, Xie Shanrong, Zhou Jiangling, Lu Yaqiao, Lyu Lu, Yao Yao, Au Chui Yee, and Zhishu Yu. We gratefully acknowledge the support of our lab members, especially Stephen Matthews and Mai Ziyin.

The research was supported by a start-up grant to set up the Bilingualism and Language Disorders Laboratory at the CUHK-Shenzhen Research Institute, CUHK funding for the CUHK-Peking University-University System of Taiwan Joint Research Centre for Language and Human Complexity, a General Research Fund from the Hong Kong Research Grants Council (Project no. 14413514), and the Stella and Leanne Lu Fund.