CHILDES Cantonese HKU-70 Corpus

Paul Fletcher
Speech and Hearing
University College Cork
p.fletcher@ucc.ie

Stephanie Stokes
Division of Speech and Hearing Sciences
University of Hong Kong
sstokes@hku.hk

Zehava Weizman


Participants:	70
Type of Study:	interview
Location:	China
Media type:	audio
DOI:	doi:10.21415/T5259T

Citation information

Research using these data should cite these sources:

Fletcher, P., Lee, T. H-T., Leung, S., and Stokes, S. Milestones in the learning of spoken Cantonese by pre-school children. Language Fund, Hong Kong. (1996-1999).
Fletcher, P., Leung, S. C-S., Stokes, S. F., & Weizman, Z. O. (2000). Cantonese pre-school language development: A guide. Hong Kong: Department of Speech and Hearing Sciences.
Weizman, Z. O. and Fletcher, P. A comparative study of language development: English and Cantonese pre-schoolers in Hong Kong. Committee on Research and Conference Grants, University of Hong Kong. (2000).

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This corpus consists of 70 transcripts of audio-recorded interviews from a cross-sectional study of 70 Cantonese-speaking children aged 2;6 to 5;6. The data were collected and processed at the University of Hong Kong by Zehava Weizman, Paul Fletcher, and Emily Ma. The naturalistic spoken language data come from 10 children, five boys and five girls, at each 6-month interval between 2.5 and 5.5 years of age, covering the whole preschool range. The children were recruited from a Cantonese-speaking preschool in Hong Kong. Although socioeconomic status was not taken into account during recruitment, the children were predominantly middle-class.

Each child in the sample was prescreened, pretested using the Reynell Developmental Language Scales (RDLS Cantonese Version), and audiotaped in conversation for a total of about one hour. The adult-child language sampling was carried out in the child’s preschool. A warm-up task was conducted at the beginning of the session to ensure that the child was comfortable with the investigator and the task. The language sample aimed to elicit a minimum of 100 utterances, which was usually achieved within 20 minutes. To ensure sufficient opportunities for verbalization and a diversity of syntactic and lexical forms, while keeping the samples as comparable as possible across children, the conversation was organized around familiar bath/dress/feed/sleep routines.

The children’s dates of birth and ages are available in the headers of each transcript. Digitized audio is also available, although it is not yet linked to the files.
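
The sampling design described above (10 children at each 6-month step from 2;6 to 5;6) can be checked with a short sketch. This is only an illustration: the helper names `age_to_months` and `months_to_age` are hypothetical, and the assumption that ages are written as years;months, optionally followed by .days, should be verified against the actual transcript headers.

```python
def age_to_months(age: str) -> int:
    """Convert an age written as 'years;months' (e.g. '2;6') to whole months.

    Assumption: an optional '.days' suffix (e.g. '4;06.15') may follow the
    months and is ignored here.
    """
    years, _, rest = age.partition(";")
    months = rest.split(".")[0] or 0
    return int(years) * 12 + int(months)


def months_to_age(months: int) -> str:
    """Convert whole months back to 'years;months' notation."""
    return f"{months // 12};{months % 12}"


# Age groups at 6-month steps from 2;6 to 5;6, inclusive.
age_groups = [
    months_to_age(m)
    for m in range(age_to_months("2;6"), age_to_months("5;6") + 1, 6)
]

CHILDREN_PER_GROUP = 10  # five boys and five girls per group
total_children = len(age_groups) * CHILDREN_PER_GROUP

print(age_groups)
print(total_children)
```

Seven age groups of 10 children each account for the 70 participants listed for the corpus.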