CHILDES English Hall Corpus

William Hall (1935-2011)
Department of Psychology
University of Maryland

Participants: 39
Type of Study: naturalistic
Location: USA
Media type: audio, unlinked
DOI: doi:10.21415/T5ZK5H

Citation information

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This directory contains a large database of conversational interactions from 39 children aged 4;6 to 5;0. The files were computerized by Bill Hall of the University of Maryland and donated to CHILDES in 1984. They were first placed into CHAT format in 1987, but this work was redone from the originals in 1991 to clear up additional problems. Although the conversion to CHAT was generally straightforward, it was not possible to code overlaps by pairs. Instead, each overlap was coded using the [%^] sign, without any attempt to indicate the matches between overlap markers.

The corpus was collected with the purpose of providing a solid basis for comparing vocabulary usage in different socioeconomic and ethnic groups. This section describes how the corpus was collected in a way that ensured that spontaneous speech would be recorded in a variety of natural situations. These situations are characterized in as much detail as possible, to provide users of these data with an accurate picture of the conditions under which they were collected.

Participants were 39 preschool children (4;6 to 5;0) divided approximately equally according to race and socioeconomic status (SES) as follows: middle-class Black, middle-class White, working-class Black, and working-class White. The working-class children in our sample were attending federally funded preschools; the middle-class children were in private preschools. The working-class Black children were in all-Black classes, with Black teachers, whereas the middle-class Black children were in interracial classes with both Black and White teachers. None of the Black target children were in the same classes as any of the White target children in our sample.
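The ages above are written in years;months notation, so 4;6 means four years and six months. For users who need ages in months, the conversion can be sketched as follows. This is a minimal illustration, not part of CLAN: the helper name age_to_months is hypothetical, and the handling of an optional ".days" suffix is an assumption about how ages may appear in file headers.

```python
def age_to_months(age: str) -> int:
    """Convert a 'years;months' age string such as '4;6' to total months."""
    years, months = age.split(";")
    # Assumption: an optional '.days' suffix (e.g. '4;6.15') may follow the
    # months; days are ignored in this sketch.
    months = months.split(".")[0]
    return int(years) * 12 + int(months or 0)


youngest = age_to_months("4;6")
oldest = age_to_months("5;0")
print(youngest, oldest, oldest - youngest)  # 54 60 6
```

The age range of the target children therefore spans six months.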

Language samples were collected over 2 consecutive days for each child. On each day, an average of about 150 minutes of conversation was recorded, distributed among different situations. Most importantly, the situations in which the data were gathered were both natural and varied. Conversations were taped in a variety of situations at home, at school, and en route between the two. The children and their families were aware that they were being taped, but this seems to have caused little if any disruption of normal activities. There are occasional references (although relatively few) to the fact that the tape recorder is on, but the conversations are natural. Reading the transcripts, one can clearly sense that the families are not “putting on an act” for the tape recorder; they tend to ignore it almost completely. Even if the presence of the tape recorder does exert some effect, the fact remains that there is no other method, apart from deception, that would offer a less obtrusive way of obtaining natural data.

The taping equipment was also chosen to minimize any disruption of normal conversation or activities. The children wore vests with wireless microphones sewn in; their movements were not restricted, and they seemed quickly to forget about having them on. The use of wireless microphones made possible the inclusion of speech by the target children that might not otherwise have been recorded, including monologues spoken while the child was out of the hearing of any visible listener. Field workers clipped microphones to their ties. Although other adults and nontarget children in the study did not wear microphones, the two microphones used were, in general, sensitive enough to pick up significant verbal interaction with the children in the study. Portable tape recorders enabled data collection in a number of different settings, for example, in homes, shops, moving cars, and on sidewalks. The mobility achieved in this way would not have been possible with videotapes; although videotapes would provide more complete data in some respects, their use would have been far more disruptive.

The effects of the experimenter’s race were minimized by using a Black field worker with Black families and a White field worker with White families. In the collection of data, the field workers tried to be as unobtrusive as possible. They rarely initiated conversations, but if spoken to, attempted to respond naturally. One of the field workers’ responsibilities was to provide a verbal description of the context. For the purposes of this research, the context included where the recording took place, where the participant was, who the interactants were, and what they were doing. Descriptions of context often included what happened prior and subsequent to, as well as simultaneous with, the verbal interaction.

In order to sample situational variations in language, each child was recorded in a series of 10 temporal situations, which can be grouped into three basic categories: Home, School, and Transition. The Home data consists of tapes made in the following situations: prior to school in the morning, arriving home from school, before dinner, during dinner, and before bed. Each of these took place in or near the child’s home, and includes approximately 30 minutes of conversation (15 minutes on each of the days taping was done).

Particular segments of activities are missing from particular files. In BOO, the dinner segment was taped outside on the street. In TOS there is no dinner segment. In JAF and ANC there are no directed-activity segments.

The target children were between 4;6 and 5;0 during the taping. In each of the four groups, there were more male than female target children. The makeup of the families differs somewhat from one group to another. The social class and ethnic status of the children are as follows. Asterisks mark children for whom we have audio data, but no transcript.

Hall Children

Class/Status | Target Children

Participants GRC, LEA, and GAS in the White Professional group have tape recordings, but no transcripts yet. A speaker is considered present if there are more than 100 words spoken at home by a speaker or speakers in that category. For example, there was a brother of the target child present in 5 of the 11 Black middle-class families in our sample, that is, in 45% of these families, but only in 3 (that is, 33%) of the White working-class families. In all the families in our sample, both the experimenter and the target child spoke more than 100 words at home.
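The presence criterion and the percentages quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that percentages are rounded to the nearest whole number, which matches the two figures given in the text.

```python
PRESENCE_THRESHOLD = 100  # words spoken at home by a speaker category


def is_present(words_spoken: int) -> bool:
    """A speaker category counts as present only above the threshold."""
    return words_spoken > PRESENCE_THRESHOLD


def presence_percent(families_with_speaker: int, families_in_group: int) -> int:
    """Share of families in a group where the speaker category was present."""
    return round(100 * families_with_speaker / families_in_group)


print(presence_percent(5, 11))  # brothers, Black middle-class families: 45
print(presence_percent(3, 9))   # brothers, White working-class families: 33
```

Note that a category with exactly 100 words does not count as present, since the criterion is "more than 100 words".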

Other Hall Participants

Speaker      | Black Middle Class (N=11) | White Middle Class (N=9) | Black Working Class (N=10) | White Working Class (N=9)
Brother      | 5 (45)                    | 4 (44)                   | 6 (60)                     | 3 (33)
Sister       | 4 (36)                    | 0 (0)                    | 6 (60)                     | 1 (11)
Male Child   | 3 (27)                    | 1 (11)                   | 4 (40)                     | 5 (56)
Female Child | 3 (27)                    | 0 (0)                    | 3 (30)                     | 2 (22)
Mother       | 10 (91)                   | 9 (100)                  | 10 (100)                   | 8 (89)
Father       | 8 (73)                    | 6 (67)                   | 3 (30)                     | 4 (44)
Grandmother  | 0 (0)                     | 1 (11)                   | 4 (40)                     | 1 (11)
Grandfather  | 0 (0)                     | 1 (11)                   | 0 (0)                      | 0 (0)
Male Adult   | 1 (9)                     | 1 (11)                   | 4 (40)                     | 3 (33)
Female Adult |                           | 2 (22)                   | 3 (30)                     | 5 (56)

The make-up of the conversations at home was also investigated by doing race x SES ANOVAs on the percentage of the total conversation spoken by each category of speaker.

There were no main effects or interactions of race or SES in the percentage of the conversation spoken by the target children. Thus, in all the groups in the study, the target children had a roughly equivalent share of the conversation. An analysis on the absolute number of turns spoken by the target children also showed no main effects of race or SES.

The experimenter took a significantly larger proportion of turns with black families, and with lower SES families. Conversely, black mothers and working-class mothers took a smaller proportion of turns. These two facts are presumably related; the fact that the experimenter takes a larger proportion of turns in a given group means that some other speakers must necessarily get a smaller share. It is interesting to note that increased talk by the experimenter encroaches not on the speech of the target children, but on that of the parents. The reason for the experimenter taking more turns in the tapes of the black families is clear: the black families almost without exception invited the experimenter to eat dinner with them. The experimenter working with the white families, on the other hand, was only occasionally invited for dinner, and was able to consistently decline. It is not clear, however, why the experimenter would take a larger proportion of turns with the working-class families than with the middle-class families, especially since each of the two experimenters worked with both the working-class and middle-class families.

The only other significant difference is that there was a higher percentage of unrelated children present in the working-class families than in the middle-class families. (The effect is significant for all unrelated children, and for male unrelated children, but not for female unrelated children.)

The school data includes conversations taped during the following situations: arriving at school, snack time, free play, and teacher-directed activity. “Arriving at school” picks up where the Transition situation leaves off, that is, at the point where the target child enters the preschool classroom. There are some noticeable differences between the four groups in terms of what went on during the teacher-directed activity segment (see the following table).

Activities in this segment can be divided into three types on the basis of the roles played by teacher and children. In Adult-Directed activities, it is largely the teacher who determines what is done and how it is done. Discussion activities allow for more input from the children, and in the Creative activities, there is a minimum of direction from the teacher; the teacher may specify what activity is to be done, but it is up to the children how to carry it out. Adult-Directed activities can be further subdivided according to content. Academic activities are those that relate to the normal sphere of classroom learning, including any task focusing on letters, words, or numbers. Basic activities are those oriented toward information that could be considered more practical than academic, for example, learning the days of the week or the children’s home addresses. The Other Directed activities are more varied. In content they are most similar to the Creative activities, but are different from these in that they involve a greater degree of directiveness on the part of the teacher. The following table gives the percentage of on-task time spent in each group on each of these categories of activity. Time is measured in terms of the percent of each group’s total on-task turns that relate to a given type of activity.

The groups differed only slightly in terms of how much time in the teacher-directed activity segment was actually spent on task. The table above indicates the way and extent to which classroom conversation was related to the classroom activities. The numbers in the table represent the percent of each group’s turns of conversation in each of several categories. On-task turns are those that refer directly to the task that the child is engaged in. Procedural turns are those related to the process of preparing for or ending a task, for example, obtaining materials and setting up, cleaning up afterwards, or putting on coats before going outside. Tangential turns are those referring indirectly to the task at hand. For example, in the case of children making cookies, discussion of the children’s mothers making cookies would be counted as tangential. Extraneous turns were those used in play, arguments, or conversations not related to the task, and not in any of the other categories. Discipline turns were those used by the teacher in disciplining the children. Tape turns were those referring to the taping procedure, or part of conversation with the experimenter.

The totals for some groups in the table above do not add up quite to 100%, because in a few cases, a few turns were spoken by other teachers or adults not part of the target child’s class. Such turns never amounted to more than 1.5% of the total turns.
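The way such percentages are derived, and why a group's total can fall slightly short of 100%, can be sketched as follows. The turn labels below are hypothetical and chosen only for illustration; they are not data from the corpus, and the function name is an assumption.

```python
from collections import Counter

# The six turn categories described above.
CATEGORIES = ["on task", "procedural", "tangential",
              "extraneous", "discipline", "tape"]


def category_percentages(turn_labels):
    """Return each category's share of all turns, in percent (one decimal)."""
    total = len(turn_labels)
    if total == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    counts = Counter(turn_labels)
    return {cat: round(100 * counts[cat] / total, 1) for cat in CATEGORIES}


# Hypothetical labels for one group: 20 turns, one of them spoken by an
# adult who is not part of the target child's class.
labels = (["on task"] * 12 + ["procedural"] * 4 + ["tangential"] * 2
          + ["tape"] + ["other adult"])
shares = category_percentages(labels)
print(shares)
print(round(sum(shares.values()), 1))  # 95.0, because the "other adult" turn is not tabulated
```

Because turns by outside adults are counted in the total but belong to none of the six categories, the tabulated shares sum to less than 100%.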

In the classrooms of the white middle-class and white working-class children, almost all the talk related to the situation at hand. Among the black working-class children, on the other hand, there is more reference to the past, and especially to past events involving the home or family. There are also references to the future, not found in the other three groups.  

In general, the teachers of the black middle-class children emphasized reading: both the teacher reading stories to the children, and the children reading with the teacher’s assistance. About half of the activities in this group were done as a whole class.

The white working-class preschool activities were more loosely organized, with the children largely choosing for themselves what they would do. Activities tended to be creative and art-oriented, such as play with modelling clay, and were done either individually, or in small groups. The black working-class children spent much of their time in activities involving the whole class as a group. There was discussion of children’s past experience, for example, what they had done over the weekend or trips they had taken, and some work on basic skills, for example, the alphabet, counting activities, including taking attendance, learning the days of the week, and learning their home addresses.

The white working-class children spent their time in small group activities (1-5 children), with some emphasis on basic skills: simple math, learning colors and names of animals, and some reading. The teachers of the black target children spent more time dealing with the children as a group. Conversely, the black target children received less individual attention than did the white target children.

Only the black children, and especially the black working-class children, spent any appreciable time talking about anything outside of the classroom context. This primarily involved talk about the children’s homes, family life, and recent experiences. The conversation of the white target children and their teachers was focused almost exclusively on the present. Race x SES ANOVAs were performed on both the number of turns of conversation taken by the target children, and on the percentage of turns taken by the target children. In both cases, there were neither main effects of race or SES, nor any significant interactions. Thus, the target children from the four groups spoke essentially the same amount, both in absolute terms and in terms of the proportion of the total conversation that their speech constituted.

The transition situation represents that part of the corpus that was taped while the target child was on the way to school. It starts with the point at which the target child goes out the door of the house or apartment, and ends when he or she enters the preschool classroom. A typical transition situation goes as follows: the target child and a parent, accompanied by the experimenter, leave the apartment and take either the elevator or stairs down to the ground floor. They then walk either all the way to school, or to some vehicle (car, city bus, school bus, or subway). When they get to the school building, they go up either by stairs or elevator to the classroom. Several types of variations in this pattern occurred. First, the length of the situation varies. For some families, the walk to school is very short; on the other hand, in a few cases the time it takes to get from home to school is so long that only a portion of the situation was recorded. There is some variety in the mode of transportation, as can be seen from Table 23. There is also some variation in who is present during the situation. In the case of the white middle-class families, there was always at least one parent going with the child on the way to school. The same holds for the black middle-class families, with three exceptions: two target children took a school bus to the preschool, and in one case, on one of the two days taped, the experimenter took the target child to school. (On the other day, the mother came along.) A parent always went along with the white working-class target children, except in one case where a baby-sitter (an adult female) went along instead of the mother. It was generally the mother who accompanied the target child to school. Three fathers were present among the white middle-class families, four among the black middle-class families, one in the white working-class families, and none in the black working-class families.

In the case of the black working-class families, the experimenter usually walked the target child to school. In two families, the mother did come along, one of them on only one of the two days taped. This is probably largely due to the fact that the black working-class families lived only a short walk away from the preschool.

The speech recorded in the Transition situation consists mostly of conversation between the target child and the accompanying parent. There were several types of exceptions to this generalization:

  1. In those cases where the experimenter alone walked the target child to school, the conversation was between the target child and experimenter.
  2. There were one or two instances where the experimenter got into a conversation with the parent.
  3. There were two children, both black middle-class, who took a school bus to school. In these cases, the conversation was almost entirely among the children in the school bus.
  4. In three cases, the target child and parent were accompanied by others on the way to school. In the case of one white middle-class family, a group of parents and children walked to school together. In one white middle-class family and one black middle-class family, the target child and parent drove to school and took along one or two other mothers with their children.
  5. There is also some conversation with people met along the way: with neighbors and friends around the apartment while leaving, occasionally people greeted on the street, with people in stores, and with others in the school building as they enter and go up to the target child’s classroom.

There are only five cases in which the Transition situation includes a stop at a store: two in the white middle-class, one in the black middle-class, and two in the white working-class. The stop never took much time, nor was there extended conversation between the target child and people in the store.

The topic of conversation was frequently (but not exclusively) about things seen en route: cars, fire engines, garbage trucks, people on the sidewalk, branches left by workmen pruning trees, the weather (it seemed to snow or rain quite a bit on the days when taping was done), and the temperature and mittens or hats the children should have worn. There was also a fair amount of talk about things not present: about scuba diving, projects and field trips at school, episodes of “The Electric Company” (a children’s TV show), and incidents seen in past walks, such as a lady who yelled at a man for not stopping at a stop sign. There was some discipline-oriented speech by the mother (and sometimes the experimenter), for example, telling the target child not to walk in puddles. The target children were usually talking with the mother or experimenter, but they also often spent some time singing or talking to themselves.

These transcripts were not coded originally in the CHAT format. Please use caution when using the CLAN programs, because it is possible that some divergences from CHAT could lead to inaccuracies in certain analyses. A special code, [*] [new text], is used to indicate any of three types of structures in this corpus:

  1. errors
  2. preferred speech (i.e., standard English for nonstandard forms)
  3. estimated intent

When reformatting the data into CHAT, it was impossible to distinguish these three types of codings from one another. It is clear that many of these notations in the corpus refer to alternatives rather than errors. In addition, the phonological transcriptions have not yet been changed to UNIBET. Overlaps were marked in the original, but the direction of the overlap was not marked. We have used the CHAT symbol [<>] for these overlaps. In this particular corpus, it means unclear overlap and not “overlap both precedes and follows”.
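Before running an analysis, it can be useful to see how often these two codes occur in a file. The sketch below counts the literal strings [*] and [<>] on speaker lines. It is only an illustration: the sample lines are invented, not taken from the corpus, the function name is hypothetical, and the sketch assumes that speaker lines begin with an asterisk, as in CHAT main tiers.

```python
from collections import Counter

# Codes described above, matched as literal strings.
MARKERS = {"replacement": "[*]", "overlap": "[<>]"}


def count_markers(lines):
    """Count each marker on speaker lines (those beginning with '*')."""
    counts = Counter()
    for line in lines:
        if not line.startswith("*"):
            continue  # skip header and dependent lines
        for name, marker in MARKERS.items():
            counts[name] += line.count(marker)
    return counts


# Invented sample lines for illustration only.
sample = [
    "*CHI:\the go [*] [goes] there [<>] .",
    "*MOT:\tyes [<>] .",
    "%com:\tnot a speaker line [<>]",
]
counts = count_markers(sample)
print(counts["replacement"], counts["overlap"])  # 1 2
```

Since a [*] code may mark an error, a preferred form, or an estimated intent, a count of this kind says how many such notations there are, not how many errors.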


Kim Roth reformatted this corpus to accord with current versions of CHAT.