CHILDES English New England Corpus

Barbara Alexander Pan (1951-2011)
Graduate School of Education
Harvard University

Catherine Snow
Graduate School of Education
Harvard University


Participants: 52
Type of Study: naturalistic
Location: USA
Media type: audio
DOI: doi:10.21415/T52P6V

Citation information

Ninio, A., Snow, C., Pan, B., & Rollins, P. (1994). Classifying communicative acts in children’s interactions. Journal of Communication Disorders, 27, 157–188.

Additional relevant references are:

Dale, P., Bates, E., Reznick, S., & Morisset, C. (1989). The validity of a parent report instrument. Journal of Child Language, 16, 239–249.

Ninio, A., & Wheeler, P. (1984). A manual for classifying verbal communicative functions in mother-infant interaction. Working Papers in Developmental Psychology, No. 1. Jerusalem: The Martin and Vivian Levin Center, Hebrew University.

Snow, C. E. (1989). Imitativeness: a trait or a skill? In G. Speidel & K. Nelson (Eds.), The many faces of imitation. New York: Reidel.

Snow, C., Pan, B., Imbens-Bailey, A., & Herman, J. (1996). Learning how to say what one means: A longitudinal study of children’s speech act use. Social Development, 5, 56–84.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description


This directory contains longitudinal data on 52 children whose language development was studied by Catherine Snow, Barbara Pan, and colleagues as part of the project “Foundations for Language Assessment in Spontaneous Speech,” funded by the National Institutes of Health. Participants were chosen from a larger sample of 100 children on whom language and other data were available from the MacArthur Individual Differences Project. A description of participant solicitation and other information about the original sample can be found in Snow (1989) and Dale, Bates, Reznick, and Morisset (1989). The present sample of 52 children from English-speaking families was chosen to include half girls and half boys, and equal proportions of children from families of lower-middle and upper-middle socioeconomic status. Children with indications of medical or other developmental problems were excluded.


Each mother–child dyad was brought to the laboratory at three ages: 14 months, 20 months, and again between 27 and 32 months. Transcripts at 14 and 20 months reflect spontaneous language data collected during a 5-minute warm-up and several subsequent activities, each of which is described briefly below.
  1. Warm-up. For the warm-up period, the mother and child were left alone in a small room with some toys, and the mother was instructed to take a few minutes to let her child become accustomed to the setting.
  2. Toy play. Next, there was a 5-minute period during which the child was given a variety of small toys to play with (Small-Scale Activity) while the mother filled out a form at a nearby table. Because the mother was instructed not to initiate interaction with the child during this period, this portion of the videotaped protocol was not transcribed.
  3. Forbidden object. In the next task, the mother was seated beside the child at the table and instructed to try to keep the child from touching an attractive, moving object (Forbidden Object). Users of these transcripts should be aware that this part of the transcribed data involved some triadic (examiner–parent–child) interaction and thus, for certain analyses, may not be comparable to the dyadic (parent–child) interaction that makes up the rest of the transcript.
  4. Boxes. Finally, the mother was asked to spend about 10 minutes playing with her child using the contents of four successive boxes. She was not told how long to spend on each box, but was asked to try to get to all four and to have only one box open at a time. The boxes contained, in order, a ball, a cloth for peekaboo, paper and crayons, and a book. The entire transcribed parent–child interaction averaged 20 to 25 minutes in duration.

The protocol for parent–child interaction at the third data point (age 27–32 months) involved only the four boxes (no warm-up or forbidden object), and two substitutions were made to make the activities more age-appropriate: hand puppets and a Fisher-Price™ toy house replaced the ball and the peekaboo cloth. Parent and child were videotaped with a camera that was either mounted at ceiling level in one corner of the room and operated by remote control, or placed behind a one-way mirror.

Gems (labeled segments of a transcript) are marked as follows. For 20 and 32 months, the first gem is “Mother Freeplay.” This is followed by one to four “10 minutes” gems and, finally, a “Book” gem. For 32 months, the order is “Mother Freeplay,” “Book,” and then “10 minutes.”
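As an illustration only, the gem sequence described above could be recovered from a transcript with a few lines of Python. This sketch assumes that each gem begins with a CHAT @G: header line; the description above does not say which gem header form these files use, so files with paired @Bg:/@Eg: markers would need different handling. The function name split_gems and the sample lines are hypothetical. Gems are returned as a list of (label, utterances) pairs rather than a dictionary, because a label such as “10 minutes” can occur several times in one session.

```python
def split_gems(lines):
    """Group main-tier utterances by the most recent @G: gem header.

    Hypothetical helper: assumes gems are introduced by CHAT '@G:' lines.
    Utterances that appear before the first gem are grouped under None.
    """
    gems = []                  # (label, [utterance lines]) in transcript order
    label, current = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("@G:"):
            if label is not None or current:
                gems.append((label, current))      # close the previous gem
            label, current = line[len("@G:"):].strip(), []
        elif line.startswith("*"):                 # main-tier utterance
            current.append(line)
    if label is not None or current:
        gems.append((label, current))              # close the final gem
    return gems


# Invented sample lines, for illustration only.
sample = [
    "@Begin",
    "@G:\tMother Freeplay",
    "*MOT:\tlook at this .",
    "*CHI:\tba@b .",
    "@G:\t10 minutes",
    "*MOT:\twhat's in the box ?",
    "@End",
]
for label, utts in split_gems(sample):
    print(label, len(utts))
```

Header lines such as @Begin and @End are simply ignored, so only utterances are assigned to gems.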

Transcription and Coding

The transcripts in this corpus were prepared from the videotaped parent–child interaction by transcribers trained in the CHAT conventions. Users should note several specific transcription guidelines that were followed. Utterance boundaries were based on intonation contour. No attempt was made to distinguish the number of unintelligible words in a string; therefore, xxx and yyy (rather than xx and yy) are used throughout. Where the phonological form could be represented, yyy was used on the main tier and a UNIBET transcription was given on a following %pho tier. Other nonverbal vocalizations were represented as 0 [=! vocalizes]. The audio quality of the videotapes did not permit phonetic transcription. In general, no attempt was made to represent possible word omissions or to distinguish among child-invented forms, family-specific forms, and phonologically consistent forms; rather, the generic @ marker was used for all three. Pauses were transcribed as either # or #long rather than in terms of precise duration. Words on the main tier were morphemicized so that MLU could be computed automatically in morphemes, and so that inflected forms of nouns and verbs would be counted not as separate word types but as tokens of the uninflected stem.
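Because the main tier is morphemicized, MLU in morphemes and stem-type counts reduce to simple token arithmetic. The following minimal Python sketch assumes that inflections are attached with a hyphen (as in dog-s), that the @ marker ends a special form, and that utterances consisting only of xxx, yyy, 0, or pauses should be skipped. It ignores other CHAT symbols (retracings, overlaps, compounds) that a real analysis must handle, and all names in it are hypothetical; for actual analyses, the CLAN MLU command is the usual tool.

```python
import re

# Tokens excluded from morpheme counts under the conventions described above:
# unintelligible strings, the nonverbal placeholder 0, pauses, terminators.
NON_WORDS = {"xxx", "yyy", "0", "#", "#long", ".", "?", "!"}
BRACKETED = re.compile(r"\[[^\]]*\]")   # bracketed codes such as [=! vocalizes]


def countable_words(text):
    """Return the countable word tokens of one main-tier utterance."""
    return [t for t in BRACKETED.sub(" ", text).split() if t not in NON_WORDS]


def stem_and_morphemes(word):
    """Split a word into (stem, morpheme count).

    Assumes inflections are split off with '-' (e.g. dog-s, go-ing) and that
    an @ special-form marker (e.g. ba@b) counts as a single morpheme.
    """
    parts = [p for p in word.split("@")[0].split("-") if p]
    if not parts:
        return word, 1
    return parts[0], len(parts)


def mlu_and_stem_types(utterances):
    """Mean length of utterance in morphemes, plus the set of stem types.

    Utterances with no countable words (e.g. 'xxx .') are skipped.
    """
    lengths, stems = [], set()
    for text in utterances:
        words = countable_words(text)
        if not words:
            continue
        total = 0
        for w in words:
            stem, n = stem_and_morphemes(w)
            stems.add(stem)          # inflected forms collapse onto the stem
            total += n
        lengths.append(total)
    mlu = sum(lengths) / len(lengths) if lengths else 0.0
    return mlu, stems


child = ["the dog-s run .", "xxx .", "0 [=! vocalizes] .", "go-ing # home ?"]
mlu, stems = mlu_and_stem_types(child)
print(round(mlu, 2), sorted(stems))
```

In this invented sample, only the first and last utterances are counted; they contribute 4 and 3 morphemes, so the MLU is 3.5, and go-ing is counted as a token of the stem go.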

Because it was anticipated that looking behaviors, especially in the 14-month-olds, would often be used to direct the adult’s attention and would therefore be important to consider in coding infants’ nonverbal communicative acts, all looking behaviors (as well as points, head nods, and so forth) were recorded on %gpx tiers. The time at the beginning of each activity and the passage of each subsequent full minute were recorded on %tim tiers.
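Since the gestural (%gpx) and timing (%tim) information lives on dependent tiers, an analysis of nonverbal acts typically starts by pairing each utterance with its dependent tiers. The following is a minimal Python sketch that relies only on the general CHAT layout: main tiers begin with *, dependent tiers begin with %, and continuation lines begin with a tab. The content format of the %gpx and %tim tiers in this corpus is not specified here, so the sample tier values are invented placeholders, and the function names are hypothetical.

```python
def join_continuations(lines):
    """Merge tab-initial continuation lines into the line they continue."""
    joined = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("\t") and joined:
            joined[-1] += " " + line.strip()
        else:
            joined.append(line)
    return joined


def attach_dependent_tiers(lines, wanted=("%gpx", "%tim")):
    """Pair each main-tier utterance with the selected dependent tiers.

    Follows the general CHAT layout: a '*SPK:' line starts a main tier and
    the '%tier:' lines after it belong to that utterance.
    """
    records = []
    for line in join_continuations(lines):
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            records.append({"speaker": speaker, "text": text.strip(), "tiers": {}})
        elif line.startswith("%") and records:
            name, _, value = line.partition(":")   # split at the first colon
            if name in wanted:
                records[-1]["tiers"][name] = value.strip()
    return records


# Invented sample lines; tier contents are placeholders, not corpus data.
sample = [
    "@G:\tMother Freeplay",
    "*MOT:\tlook !",
    "%gpx:\tpoints to ball",
    "%tim:\t00:00",
    "*CHI:\t0 [=! vocalizes] .",
    "%gpx:\tlooks at MOT",
    "\tthen at ball",
]
for rec in attach_dependent_tiers(sample):
    print(rec["speaker"], rec["tiers"])
```

Passing wanted=("%pho",) would retrieve the UNIBET transcriptions that accompany yyy in the same way.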

Codes on the %spa tier are based on the Inventory of Communicative Acts Abridged (INCA-A), a shortened and modified version of the system developed by Ninio and Wheeler (1984). For fuller discussions of this coding scheme, see Ninio, Snow, Pan, & Rollins (1994) and Snow, Pan, Imbens-Bailey, & Herman (1996).
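Tallying communicative acts by speaker is a common first step in working with the %spa tier. The following minimal Python sketch assumes that one %spa tier follows each coded utterance and that its codes are separated by whitespace. The actual form of the INCA-A codes is documented in Ninio, Snow, Pan, & Rollins (1994). The codes $ACT1 and $ACT2 and the function name tally_spa_codes are invented for illustration.

```python
from collections import Counter, defaultdict


def tally_spa_codes(lines):
    """Count %spa codes separately for each speaker.

    Assumes each %spa tier follows the main tier it codes and contains
    whitespace-separated codes; continuation lines are not handled.
    """
    counts = defaultdict(Counter)
    speaker = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("*"):
            speaker = line[1:].partition(":")[0]    # e.g. 'MOT' or 'CHI'
        elif line.startswith("%spa:") and speaker is not None:
            counts[speaker].update(line[len("%spa:"):].split())
    return counts


# $ACT1 and $ACT2 are made-up placeholders, not actual INCA-A codes.
sample = [
    "*MOT:\tlook !",
    "%spa:\t$ACT1",
    "*CHI:\tba@b .",
    "%spa:\t$ACT2 $ACT1",
    "*MOT:\tyes .",
    "%spa:\t$ACT1",
]
for speaker, codes in tally_spa_codes(sample).items():
    print(speaker, codes.most_common())
```

Keeping a separate Counter per speaker makes it straightforward to compare the mother's and the child's distributions of communicative acts within a session.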


Funding for this project was provided by the National Institutes of Health.