CHILDES Cantonese Lee/Wong/Leung Corpus

Thomas Lee
Department of English
Chinese University of Hong Kong

Colleen Wong
Department of English
Hong Kong Polytechnic

Sam Leung
School of General Education and Languages
Technological and Higher Education Institute of Hong Kong

Participants: 8
Type of Study: longitudinal
Location: China
Media type: not available
DOI: doi:10.21415/T57W2Z

Citation information

Lee, T. H. T., Wong, C. H., Leung, S., Man, P., Cheung, A., Szeto, K., and Wong, C. S. P. The Development of Grammatical Competence in Cantonese-speaking Children. Report of RGC earmarked grant 1991-94.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This corpus was collected by Thomas Hun-tak Lee (Chinese University of Hong Kong), Colleen H. Wong (Hong Kong Polytechnic University), and Samuel Leung. It contains longitudinal data on the language of eight Cantonese-speaking children, each recorded for approximately one year.

The children were observed interacting with their caretakers, the investigator, and occasionally other adults who chatted with them during the visits. Three research students carried out the observations and recordings. Patricia Man recorded Bohuen and Gakei; Alice Cheung recorded Bernard, Tsuntsun, and Tinfaan; and Kitty Szeto recorded Johnny, Jenny, and Chunyat. The names of the children and the ages during which they were recorded are as follows:
Child     Code  Sex  Age range          Files
Bohuen    wbh   F    2;03;23 – 3;04;08  27
Gakei     cgk   F    1;11;01 – 2;09;09  19
Bernard   mhz   M    1;07;00 – 2;08;06  26
Tsuntsun  ckt   M    1;05;22 – 2;07;22  25
Tinfaan   ltf   F    2;02;10 – 3;02;18  16
Johnny    hhc   M    2;04;08 – 3;04;14  16
Jenny     lly   F    2;08;10 – 3;08;09  20
Chunyat   ccc   M    1;10;08 – 2;10;27  22
Each file name is made up of the initials of the child (the first three characters) followed by his or her age at the time of recording: year (1 character), month (2 characters), and day (2 characters). For instance, the file wbh20322.cha contains tagged utterances of Bohuen (wbh) when she was 2 years, 3 months, and 22 days old.
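
The naming convention just described can be unpacked mechanically. The following is a minimal sketch, not part of the corpus tooling: the names `RecordingInfo`, `parse_filename`, and `childes_age` are hypothetical, and the pattern assumes a lowercase three-letter code and a `.cha` extension, as in the example file name.

```python
import re
from typing import NamedTuple


class RecordingInfo(NamedTuple):
    """Child code and age decoded from a transcript file name (hypothetical helper type)."""
    child_code: str
    years: int
    months: int
    days: int


# Assumed layout: 3-letter child code, 1-digit year, 2-digit month,
# 2-digit day, then the ".cha" extension (e.g. "wbh20322.cha").
_FILENAME_RE = re.compile(r"^([a-z]{3})(\d)(\d{2})(\d{2})\.cha$")


def parse_filename(name: str) -> RecordingInfo:
    """Split a file name such as 'wbh20322.cha' into child code and age."""
    match = _FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name format: {name!r}")
    code, years, months, days = match.groups()
    return RecordingInfo(code, int(years), int(months), int(days))


def childes_age(info: RecordingInfo) -> str:
    """Format the age in the year;month;day notation used in the table of children."""
    return f"{info.years};{info.months:02d};{info.days:02d}"


info = parse_filename("wbh20322.cha")
print(info.child_code, childes_age(info))  # wbh 2;03;22
```

File names that do not follow the assumed layout raise a `ValueError` rather than being guessed at.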

The Children


Below is a summary list of the syntactic categories used in coding the corpus. The romanizations are based on the Cantonese romanization scheme of the Linguistic Society of Hong Kong (LSHK) (Matthews & Yip, 1994, pp. 400-401).
1. adj = adjective: hung4
2. adv = focus adverb: zung6, dou1, jau6, zoi3
3. advi = adverb of intensity: hou3, gei2, gam3, zan1
4. advm = adverb of manner: maan6maan6dei2, ma4ma4dei2
5. advs = sentential adverb: bat1jyu4, gam2(joeng2), jat1cai4
6. asp = aspectual marker: zo2, zyu6, gan2, gwo3, hoi1
7. aux = auxiliary / modal verb: jing1goi1, hang2, ho2ji5, wui, sai2
8. cl = classifier: go3, zek3, bun2, bui1, di1
9. com = comparative morpheme: gwo3 (as in dai6 gwo3), di1 (as in hung4 di1)
10. conj = connective: dan6hai6, tung4maai4, waak6ze2
11. corr = correlative: jut6...jut6, jau6...jau6, gam2...gam3, jat1...jat1
12. ctc = clitic: dak1, dou3
13. det = determiner: nei1, go2, dai6
14. dir = directional verb: lok6, soeng5, ceot1, jap6, lai4
15. ex = expressive utterance: baai1baai3, zou2san4
16. gen = genitive marker: ge3
17. ins = emphatic inserted marker: gwai2 (as in hou3 gwai2 leng3)
18. nn = noun: ping4gwo2, ba4ba1
19. nnloc = locative noun phrase: soeng6mien6, leoi3mien6
20. nnpr = pronoun: ngo5, nei3, keoi3
21. nnpp = proper name: tin1faan4, zeon3zeon3
22. neg = negative morpheme: m4, mai6, mou5
23. prt = postverbal particle: faan1, sai3, can1, maai4, gwo3, ha2
24. prep = preposition: tung4maai2, hai2, bei2
25. q = quantifier: jat1, saam1, sap6, gei2, mui5
26. rfl = reflexive pronoun: zi6gei2
27. sfp = sentence final particle: &la3, &ga1 &ma3, &ge3 &le1


The creation of this corpus was made possible by a three-year grant (RGC earmarked grant CUHK 2/91) to Thomas Hun-tak Lee of the Chinese University of Hong Kong, Colleen H. Wong of the Hong Kong Polytechnic University, and Samuel Leung of the University of Hong Kong. The project was supported by two studentships from the Hong Kong Polytechnic awarded to Patricia Man and Alice Cheung, and a studentship from the University of Hong Kong awarded to Kitty Szeto. In addition, funding for the later stages of the project was provided by a direct grant from the Faculty of Arts, Chinese University of Hong Kong, a grant from the Freemason’s Fund for East Asian Studies, and research assistantships from the Hong Kong Polytechnic University. The support of these funding agencies is hereby acknowledged. Further details are given in the report listed under "Citation information" above.