CHILDES Mandarin Beijing Corpus

Twila Tardif
Department of Psychology
University of Michigan

Participants: 10
Type of Study: longitudinal
Location: China
Media type: audio
DOI: doi:10.21415/T5MK5D

Citation information

Publications using these data should cite:

Tardif, T. (1993). Adult-to-child speech and language acquisition in Mandarin Chinese. Unpublished doctoral dissertation, Yale University.

Tardif, T. (1996). Nouns are not always learned before verbs: Evidence from Mandarin speakers’ early vocabularies. Developmental Psychology, 32, 492–504.

Additional references include:

Chao, Y. R. (1968). A grammar of spoken Chinese. Berkeley, CA: University of California Press.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

These data were recorded in Beijing between August 1991 and January 1992 and were analyzed in Tardif (1993). The data were collected from 10 families and their toddlers who were selected from immunization records. The criteria were that: (1) the children should be between 20 and 22 months of age at the beginning of the study; (2) their parents should be native speakers of Mandarin and, preferably, native to the city of Beijing; and (3) both parents should have received formal schooling which was either high school level or below for the “workers” group and college level or above for the “intellectuals” group. All children were firstborn and only children. This was a necessary consequence of China’s one-child policy and not explicitly a feature of the design. An effort was also made to equate the age and gender distribution of the participants in each of the social class groups. Thus, each group had a total of five children with four males and one female, and the average age of the two groups differed by only 3 days. Overall, the children’s mean age was 21 months, 24 days at the time of the first visit.

Ages at visit 1 and visit 5
Child    Gender    Visit 1    Visit 5    Social Class

The caregivers were not always the children’s parents, nor were they always the same from one visit to the next. In general, caregiving in China is unlike what one might find for many Anglo-European families in the United States. Rather than having a single caregiver who stays at home with the child during the day, Chinese children are exposed to multiple caregivers who each play significant and overlapping roles in the child’s daily life. Thus, caregivers may include not only the children’s mothers and fathers, but also grandparents or great-grandparents, live-in nannies, aunts who came to the house every day for lunch or dinner, neighbors, or any adult who felt it necessary to intervene in the child’s activities. The definition of a “caregiver” used in this project was anyone of school-age or above who addressed at least five utterances to the target child in a single visit or who performed caregiving activities such as feeding, dressing, bathing, and playing with the child on a regular basis. In further analyses of adult–child speech with this corpus, I would suggest pooling across all active caregivers in each visit.
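The utterance-count part of the caregiver definition above can be sketched as follows. This is only an illustration, not the procedure used in the original study: the function name, the (speaker, listener) tuple representation, and the two-letter codes such as "MO", "GM", and "CH" are hypothetical, and the sketch ignores the school-age requirement and the alternative criterion based on caregiving activities.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Threshold stated in the corpus description: at least five utterances
# addressed to the target child in a single visit.
MIN_UTTERANCES = 5


def active_caregivers(
    utterances: Iterable[Tuple[str, str]], target: str = "CH"
) -> Set[str]:
    """Return speakers who addressed >= MIN_UTTERANCES utterances to the target.

    `utterances` is a sequence of (speaker, listener) pairs for ONE visit.
    The code "CH" for the target child is an assumption for illustration.
    """
    counts = Counter(
        speaker
        for speaker, listener in utterances
        if listener == target and speaker != target
    )
    return {speaker for speaker, n in counts.items() if n >= MIN_UTTERANCES}


# Hypothetical visit: mother addresses the child 5 times, grandmother 4 times.
visit = [("MO", "CH")] * 5 + [("GM", "CH")] * 4 + [("MO", "GM")] * 3 + [("CH", "MO")] * 2
print(active_caregivers(visit))
```

Pooling across caregivers, as suggested above, would then amount to treating the utterances of every speaker in the returned set as a single adult-to-child sample for that visit.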

All visits were conducted by Twila Tardif (Chinese name: XiaLing), a nonnative but fluent speaker of Mandarin, who was accompanied on at least two of the early visits to each family by a native Beijing research assistant who helped explain the purposes of the study and ensure that everything would run smoothly. The families were told that the researcher was interested in children’s language development and wanted to collect data that were as naturalistic as possible by recording the children in interaction with whomever they normally interacted. They were not told until the very end of the study that the study also looked at the effects of adult speech on children’s language learning.

Each visit was scheduled at the convenience of the child’s family, with the only conditions being that the visits were spaced about 2 weeks apart and that the family did whatever they normally did at that time of day. The actual activities that the families participated in varied dramatically, but included the usual range of activities that we would consider “normal” for a two-year-old child and his or her caregiver(s): indoor toy play, watching television, cleaning up, feeding, talking and playing with neighbors, and even a trip to a local amusement park. In all cases, the researcher asked the families not to interact with her during the recording time and to try to ignore her presence as she stayed off to the side taking notes on the context of the interactions. In practice, interactions between the researcher and the family frequently occurred, particularly towards the end of the study when she was a familiar presence to not only the children and their families, but also to their immediate neighbors.

Visit 1, or the 22-month visit, was the second or third visit made to each of the families whereas Visit 5/6, or the 26-month visit, was the eleventh or twelfth recording session. Families were paid a total of 200 yuan (at the time, the exchange rate was 6 yuan to 1 United States dollar), approximately one month’s salary, at the completion of their participation in the 6-month longitudinal study.

The tapes from each visit were first transcribed into the pinyin system of romanized Chinese spelling by trained undergraduate and graduate students (all native speakers of Mandarin, most also Beijing natives) from one of three Beijing universities. The transcribers were asked not only to write down the words that they heard but also to pay close attention to who the speaker and intended listener for an utterance were, as well as to utterance boundaries, changes in loudness, and any errors, mispronunciations, or dialect words that occurred. After initial transcription, the tapes were then listened to by the researcher and entered into the computer for analysis. Any disagreements between the researcher and the student transcribers were resolved by playing the segment to at least one other native Mandarin speaker and entering the form that was agreed upon by at least two of the listeners. If no agreement could be reached, the segment was deemed uninterpretable.

Words were coded according to the parts of speech described by Chao (1968). Speaker code IDs have the format *SP-LI, where the first two letters refer to the speaker of an utterance and LI refers to the intended listener.
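As an illustration of the *SP-LI format described above, the following minimal sketch splits a speaker code into its speaker and listener parts. It assumes that both parts are exactly two letters and that a trailing colon may follow the code on a transcript line; the function name and the example code "*MO-CH" are hypothetical and are not taken from the transcripts.

```python
import re
from typing import NamedTuple, Optional


class SpeakerCode(NamedTuple):
    speaker: str   # who produced the utterance
    listener: str  # intended listener of the utterance


# Assumed shape: '*', two letters, '-', two letters, optional trailing ':'.
_CODE_PATTERN = re.compile(r"^\*([A-Za-z]{2})-([A-Za-z]{2}):?$")


def parse_speaker_code(code: str) -> Optional[SpeakerCode]:
    """Parse a code such as '*MO-CH'; return None if it does not fit the pattern."""
    match = _CODE_PATTERN.match(code.strip())
    if match is None:
        return None
    return SpeakerCode(speaker=match.group(1).upper(), listener=match.group(2).upper())


# Hypothetical code: "MO" speaking to "CH".
print(parse_speaker_code("*MO-CH"))
```

Returning None for codes that do not match keeps the sketch conservative: any line whose code departs from the assumed shape is flagged rather than silently misread.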

The main findings that have been gathered from these transcripts thus far focus on these Mandarin-speaking children’s early vocabularies. Specifically, the children in this sample do not demonstrate a noun bias in their productive speech, but instead show more verb types and tokens than noun types and tokens (Tardif, 1993; Tardif, 1996). Examination of the caregivers’ speech has shown that Mandarin-speaking caregivers also use more verb types and tokens in their ongoing speech (Tardif, 1996) and that they use a much higher proportion of verb types and tokens in their speech than do Italian- or English-speaking caregivers. Moreover, verbs tend to appear in the highly salient utterance-initial and utterance-final positions in Mandarin adult-to-child speech, a pattern that also differs from the one shown for English and other languages. Ongoing analyses of these data include an examination of adult use of pronouns and other address terms when speaking with their children and an examination of the discourse of negotiation in parent–child conflicts. In addition, Anat Ninio and her colleagues at Hebrew University are coding the children’s utterances in order to examine issues in the theory of dependency grammar.
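The distinction between types and tokens used in these findings can be made concrete with a small sketch: tokens count every occurrence of a word in a category, while types count distinct words. This is not the analysis code from Tardif (1993, 1996); the function name, the part-of-speech labels "v" and "n", and the sample words are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def type_token_counts(
    tagged_words: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[int, int]]:
    """Map each part-of-speech label to (number of types, number of tokens).

    `tagged_words` is a sequence of (word, pos) pairs; the labels are
    whatever coding scheme the caller uses (hypothetical "v"/"n" here).
    """
    tokens: Counter = Counter()
    types = defaultdict(set)
    for word, pos in tagged_words:
        tokens[pos] += 1      # every occurrence is a token
        types[pos].add(word)  # distinct words are types
    return {pos: (len(types[pos]), tokens[pos]) for pos in tokens}


# Hypothetical sample: three verb tokens (two types), one noun token (one type).
sample = [("chi", "v"), ("chi", "v"), ("na", "v"), ("qiu", "n")]
print(type_token_counts(sample))
```

In this toy sample the verb category has more types and more tokens than the noun category, which is the kind of comparison the findings above report for the children and their caregivers.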