CHILDES Mandarin-Cantonese-English Leo Corpus

Ziyin Mai
Department of Linguistics and Translation
City University of Hong Kong


Virginia Yip
Linguistics and Modern Languages
Chinese University of Hong Kong


Participants: 1
Type of Study: naturalistic
Location: Hong Kong
Media type: audio
DOI: doi:10.21415/T5V398

Browsable transcripts

Download transcripts

Link to media folder

Citation information

Mai, Z.* & Yip, V. (2022) Caretaker input and trilingual development of Mandarin, Cantonese and English in early childhood (1;6-2;11). International Journal of Bilingual Education and Bilingualism.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Leo Corpus documents the simultaneous development of Mandarin, Cantonese and English in a Hong Kong child from 1;06 to 2;11. Leo was born in 2015 and raised in Hong Kong, where Chinese and English are both official languages. His parents are native speakers of Mandarin (and Cantonese) and second language speakers of English. Before 1;01, the family followed the “one parent-one language” practice: the father and paternal grandmother addressed the child in Mandarin, and the mother consistently addressed him in Cantonese. From 1;01 to 3;04, the family added an innovative “one day-one language” system on top of the “one parent-one language” system: the father and the grandmother continued to speak Mandarin to the child, whereas the mother began to interact with him in English on alternate days of the week and in Cantonese on the remaining days. Leo was also exposed to the three languages through playgroups and nursery schools.

The parents and research assistants began to videotape their interactions with Leo on a weekly basis at 0;06. The current corpus contains monthly audio recordings and corresponding transcripts in the three languages for the 18 months from 1;06 to 2;11 (54 files, 27 hours in total). The recordings feature Leo interacting with his main providers of input in each language: Mandarin from his father and grandmother, Cantonese from his mother, and English from his mother, domestic helper and school teachers who are native speakers of English (represented by an American research assistant in the recordings).

Table 1 lists the files in the Leo Corpus and the main adult interlocutor(s) in each file. File names give the child's age. For more information about Leo, his family background and language exposure, please refer to Mai and Yip (2022). For access to the .wav files and video recordings, please contact the authors.
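The age notation and corpus totals given above can be checked with a short sketch. It assumes the usual CHILDES age format of years;months with an optional .days suffix. The helper name `age_to_months` is hypothetical and not part of any TalkBank tool. The per-month and per-file figures are averages derived from the stated totals, not values reported by the authors.

```python
def age_to_months(age: str) -> int:
    """Convert a CHILDES-style age such as '1;06' or '2;11.15' to whole months.

    Assumes the format years;months[.days]; any days component is ignored.
    """
    years, rest = age.split(";")
    months = rest.split(".")[0]
    return int(years) * 12 + int(months)


# Totals stated in the project description.
TOTAL_FILES = 54
TOTAL_HOURS = 27

start = age_to_months("1;06")
end = age_to_months("2;11")

# Monthly sampling points from 1;06 to 2;11, counting both endpoints.
monthly_points = end - start + 1

# Derived averages (assumptions, not figures reported by the authors).
files_per_month = TOTAL_FILES / monthly_points
minutes_per_file = TOTAL_HOURS * 60 / TOTAL_FILES

print(monthly_points, files_per_month, minutes_per_file)
```

Under these assumptions, the span covers 18 monthly points, which works out to three files per month (consistent with one file per language) and an average of 30 minutes per file.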


We would like to express our gratitude to Brian MacWhinney, Director of CHILDES, for his expertise, advice and technical support in constructing the Leo Corpus.

Our special thanks go to the research assistants and student helpers who contributed to the project in one or more ways (recording the child’s speech, transcribing the data, and tagging and checking the transcripts): Sophia Zishu Yu, Riki Yuqi Wu, Hannah Lam, Joy Jieyu Zhou, Bill Wu, and especially Vaness Tsz Yan Law. We gratefully acknowledge the support and help of our lab members: Stephen Matthews, Xiangjun Deng, Gloria Yanhui Zhang, Elaine Lau, Emily Haoyan Ge and Jiangling Zhou.

The research was supported by a CUHK Direct Grant awarded to Ziyin Mai (“Trilingualism in Early Childhood: Production and Comprehension”), and a start-up grant to set up the University of Cambridge-Chinese University of Hong Kong Joint Laboratory for Bilingualism at CUHK, co-directed by Virginia Yip and Boping Yuan.