Xiangjun Deng Bilingualism and Language Disorders Laboratory Chinese University of Hong Kong dengxj98@gmail.com |
Virginia Yip Dept. of Modern Languages & Intercultural Studies Chinese University of Hong Kong vyip@humanum.arts.cuhk.edu.hk |
Participants: | 1 |
Type of Study: | longitudinal |
Location: | China |
Media type: | audio + video |
DOI: | doi:10.21415/T5PC7Q |
Deng, X. & Yip, V. (2018). A multimedia corpus of child Mandarin: The Tong corpus. Journal of Chinese Linguistics 46 (1): 69-92.
Deng, X. & Yip, V. (2015). A corpus study of the acquisition of ba and bei constructions in Mandarin. Paper presented at The International Symposium on Psycholinguistics of Second Language Acquisition and Bilingualism, Chinese University of Hong Kong.
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The Tong corpus is part of a larger study of the naturalistic interactions between a Mandarin-speaking child, Tong, and his caregivers from age 1;0 to 4;5. Tong was raised in Shenzhen, China, where Mandarin is the language of the community. All members of the family speak Mandarin to the child. The child is also exposed to some input in the Rudong dialect, a variety of Jianghuai Mandarin spoken in Jiangsu province, China, because his father and grandmother sometimes speak the dialect to each other. From 2;5 on, the child also received some Cantonese and English input in his kindergarten. However, the influence of these languages was minimal, and the child spoke only Mandarin at home. From 3;3, he attended a kindergarten in Hong Kong for three hours a day, where Cantonese was the medium of instruction.
Table 1. Information about the Tong corpus released in CHILDES
No. | File name | Age | No. of utterances | No. of words | MLU
1 | 130202 | 1;7.18 | 101 | 168 | 1.66
2 | 130309 | 1;8.22 | 428 | 1020 | 2.38
3 | 130405 | 1;9.19 | 406 | 1089 | 2.68
4 | 130504 | 1;10.17 | 382 | 1054 | 2.76
5 | 130607 | 1;11.21 | 445 | 1265 | 2.84
6 | 130705 | 2;0.19 | 376 | 1119 | 2.98
7 | 130802 | 2;1.17 | 507 | 1500 | 2.96
8 | 130901 | 2;2.16 | 294 | 1094 | 3.72
9 | 130929 | 2;3.14 | 430 | 1586 | 3.69
10 | 131103 | 2;4.16 | 489 | 1935 | 3.96
11 | 131215 | 2;5.30 | 395 | 1222 | 3.09
12 | 131229 | 2;6.13 | 398 | 1580 | 3.97
13 | 140203 | 2;7.19 | 454 | 1458 | 3.21
14 | 140223 | 2;8.8 | 451 | 1613 | 3.58
15 | 140323 | 2;9.6 | 303 | 1032 | 3.41
16 | 140423 | 2;10.6 | 529 | 1998 | 3.78
17 | 140525 | 2;11.8 | 454 | 1724 | 3.80
18 | 140628 | 3;0.12 | 466 | 1948 | 4.18
19 | 140726 | 3;1.10 | 456 | 1767 | 3.88
20 | 140824 | 3;2.8 | 409 | 1091 | 2.67
21 | 140921 | 3;3.6 | 496 | 2306 | 4.65
22 | 141025 | 3;4.9 | 468 | 1701 | 3.64
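The MLU column in Table 1 can be related to the two count columns with a short check. The sketch below is illustrative only: it assumes that MLU here is measured in words and approximates the number of words divided by the number of utterances, which the table does not state explicitly. The names `TABLE1_SAMPLE`, `words_per_utterance`, and `max_deviation` are hypothetical, and only a few rows are copied from the table.

```python
# A few rows copied from Table 1: file name -> (utterances, words, reported MLU).
TABLE1_SAMPLE = {
    "130202": (101, 168, 1.66),
    "130309": (428, 1020, 2.38),
    "140921": (496, 2306, 4.65),
    "141025": (468, 1701, 3.64),
}


def words_per_utterance(words: int, utterances: int) -> float:
    """Return the mean number of words per utterance."""
    if utterances <= 0:
        raise ValueError("utterances must be positive")
    return words / utterances


def max_deviation(rows: dict) -> float:
    """Largest absolute gap between the reported MLU and words / utterances."""
    return max(
        abs(words_per_utterance(words, utterances) - reported)
        for utterances, words, reported in rows.values()
    )


if __name__ == "__main__":
    # Print the reported value next to the recomputed ratio for each sampled file.
    for name, (utterances, words, reported) in TABLE1_SAMPLE.items():
        ratio = words_per_utterance(words, utterances)
        print(name, reported, round(ratio, 2))
```

For the sampled rows the recomputed ratio stays within 0.01 of the reported MLU, so the assumption appears reasonable as a sanity check, though it should not be read as a description of how the authors computed the values.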
Table 2. Major parts of speech used in the Tong corpus
No. | Category | Code | Example
1 | Adjective | adj | 小 xiao3 ‘small’
2 | Adverb | adv | 老 lao3 ‘always’
3 | Aspect marker | asp | 了 le ‘perfective’
4 | Classifier | class | 分钟 fen1zhong1 ‘minute’
5 | Communicator | co | 哎呀 ai1ya1 ‘jeez’
6 | Conjunction | conj | 不但 bu2dan4 ‘not only’
7 | Interjection | int | 对不起 dui4bu4qi3 ‘sorry’
8 | Noun | n | 鱼 yu2 ‘fish’
9 | Negation | neg | 不 bu4 ‘not’
10 | Number | num | 八 ba1 ‘eight’
11 | Onomatopoeia | on | 轰隆 hong1long2 ‘rumble’
12 | Postposition | post | 后面 hou4mian4 ‘behind’
13 | Preposition | prep | 从 cong2 ‘from’
14 | Pronoun | pro | 我 wo3 ‘I’
15 | Quantifier | quant | 各 ge4 ‘each’
16 | Sentence final particle | sfp | 吗 ma ‘question’
17 | Small (functional) words | small | 的 de
18 | Verb | v | 逛 guang4 ‘hang out’
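As a rough illustration of how the codes in Table 2 might be used, the sketch below tallies part-of-speech categories in a tagged line. It assumes tokens of the form `code|word`, modeled on CHAT-style morphological tiers; the example line is made up from words in Table 2 and is not taken from the corpus. The names `POS_CODES` and `tally_pos` are hypothetical, and real annotation lines may carry extra markers that this sketch does not handle.

```python
from collections import Counter

# Codes and category names copied from Table 2.
POS_CODES = {
    "adj": "Adjective", "adv": "Adverb", "asp": "Aspect marker",
    "class": "Classifier", "co": "Communicator", "conj": "Conjunction",
    "int": "Interjection", "n": "Noun", "neg": "Negation", "num": "Number",
    "on": "Onomatopoeia", "post": "Postposition", "prep": "Preposition",
    "pro": "Pronoun", "quant": "Quantifier", "sfp": "Sentence final particle",
    "small": "Small (functional) words", "v": "Verb",
}


def tally_pos(tagged_line: str) -> Counter:
    """Count Table 2 categories in a line of whitespace-separated code|word tokens."""
    counts = Counter()
    for token in tagged_line.split():
        if "|" not in token:
            continue  # skip punctuation and other untagged material
        code = token.split("|", 1)[0]
        counts[POS_CODES.get(code, "Other")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical tagged line, assembled from examples in Table 2.
    example = "pro|我 neg|不 v|逛 ."
    for category, count in tally_pos(example).items():
        print(category, count)
```

Codes outside Table 2 are grouped under "Other" rather than raising an error, since the table lists only the major categories.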
For more details, please refer to Deng & Yip (2018) or our website http://cbrchk.org/the-tong-corpus/.
We would like to express our gratitude to Brian MacWhinney, Director of CHILDES, for his expertise, advice, and technical support in constructing the Tong corpus.
Our special thanks go to the transcribers of the Tong corpus: Zhong Jing, Lam Ho Ching, Xie Shanrong, Zhou Jiangling, Lu Yaqiao, Lyu Lu, Yao Yao, Au Chui Yee, and Zhishu Yu. We gratefully acknowledge the support of our lab members, especially Stephen Matthews and Mai Ziyin.
The research was supported by a start-up grant to set up the Bilingualism and Language Disorders Laboratory at the CUHK-Shenzhen Research Institute, CUHK funding for the CUHK-Peking University-University System of Taiwan Joint Research Centre for Language and Human Complexity, a General Research Fund from the Hong Kong Research Grants Council (Project no. 14413514), and the Stella and Leanne Lu Fund.