CHILDES English Braunwald Corpus

Susan R. Braunwald, PhD
19191 Harvard Avenue #102 E
Irvine, CA 92612-4653

Participants: 1
Type of Study: case study
Location: USA
Media type: audio
DOI: doi:10.21415/T5D89Z

Citation information

Publications using these data should cite any of the references below.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

The Braunwald Corpus (Journals 2-8) can now be linked to The Susan R. Braunwald Language Acquisition Diaries (2015). This version is a redacted PDF of the original handwritten diary that contains Journals 1 and 9 and has been archived until 2071. The content of the numbered entries varies as a function of L’s development, but the basic format is a speech event and a description of the situational context. The subsequent dated annotations on the PDF pages are formatted to preserve the integrity of the original diary data exactly as they were entered. There are two informational guides to the database: 1) an Introduction to the Collection, a concise description of the scope and content of the data, and 2) an Introductory Volume, an extensive and varied source of the information that a parent-diarist knows.

Project Description

Susan R. Braunwald
Computation of Language Laboratory
Department of Cognitive Sciences
3151 Social Science Plaza
University of California
Irvine, CA 92697-5100

As the name of this corpus implies, the Braunwald-Max Planck contribution to CHILDES represents a collaborative endeavor between Susan R. Braunwald, the parent-diarist who collected the data, and researchers at the Max Planck Institute for Evolutionary Anthropology (MPI-EVAN), who transcribed, proofread, and created the readme files of the electronic version of the diary study on Laura Braunwald (henceforth L), which includes a handwritten daily diary and audio recordings. First, I will discuss the purpose and scope of the original diary study. Then, I will describe the subset of the data included in the Braunwald-Max Planck corpus. The full set of diary notes is archived in PDF format at the UC Irvine Library and can be downloaded from this link:

The Original Diary Study

Goal of the Diary Study: The theoretical purpose of the diary study on L was to address the following question: How do children acquire the ability to speak a native language? My goal was to document the overt process of a single child’s language acquisition in a naturalistic, real-world environment and in relation to the developmental sea change between late infancy and early childhood. I hoped that a careful, ecologically valid longitudinal description of L’s language production would suggest interesting hypotheses about the longitudinal progression of a covert mental process and thereby provide some insight into the organization of the biological substrate of the human ability to acquire language. I kept the diary study on L from 1971 to 1975 to investigate the same theoretical question that Deb Roy at M.I.T. later addressed in the continuous video sample of his son’s language acquisition at home. Although none of today’s advanced technology existed in 1971, my theoretical reason for creating the diary study on L was fundamentally comparable to Roy’s reason for collecting and analyzing a massive, technologically sophisticated longitudinal study of a single subject. In fact, the technological simplicity of a handwritten diary even had one potentially interesting advantage: I was able to keep a record of L’s language wherever and whenever I noticed an example that met the criteria for entering data into the diary. As a result, the handwritten diary on L contains examples of her emergent language in many different cultural contexts outside her immediate home environment.

Scope of the Diary Study: I kept systematic observations of L’s communicative behaviors during her late infancy and made daily diary entries between her first and fourth birthdays. The entire diary study consists of 9 volumes of handwritten diary entries and approximately 60 hours of audio recordings.
As an experienced parent-diarist, I was aware of the potential methodological shortcomings of handwritten diary data. I, therefore, carefully planned the diary study on L to compensate for these problems. (See Braunwald & Brislin, 1979a, for a complete methodological description of the diary study on L.)

The Handwritten Diary

The criteria for entering the handwritten diary data changed longitudinally as a function of L’s language development. Between the ages of 0;08.00 and 2;03.21, emergent language was the only criterion for entering data into the diary. I used the term language loosely, as shorthand for any observable form of behavior that struck me as intentional and language-like in relation to its context of use. This criterion cast a wide net that enriched the description of a process that included false starts, informative errors, idiosyncratic strategies, and developmental shifts in the linguistic interface among pragmatics, semantics, and syntax. To summarize, this portion of the diary data is theoretically neutral and free of a priori expectations about either the process of language acquisition or the linguistic organization of early child language. On June 10, 1973, when L was 2;03.21, I could no longer keep track of all her emergent language. To continue the diary, I narrowed my focus to three general categories intended to describe the relation between language and thought from different but complementary perspectives: causality, time, and single sentences that contained two verbs, regardless of the verbs’ linguistic function. I maintained the basic methodological principle of entering emergent examples into the diary. I entered any of L’s recognizable attempts to talk about causality or time, whether explicitly or implicitly, and regardless of the linguistic complexity of her language. I also consciously monitored L’s speech for utterances with two verbs and entered all that I noticed into the diary. These new criteria for data entry, while modest in comparison with all emergent language, still cast a wide net. (See Braunwald, 1995, 1997, for research based on these criteria.)

The Audio-Recorded Data

The audio-recorded samples were a planned and systematic attempt to compensate for the intrinsic methodological problem of verifying the accuracy of a handwritten parental diary. Audiotapes 1-18 can be used to check the reliability of my observations in Journals 1-6. By about age 3, L spoke English fluently, and I found it hard to write down her complex language as well as the many conversational turns in a single speech event. I had either to resort to a “catch-as-catch-can” diary record or to end the project. As my handwritten entries became less precise and more anecdotal, I made supplementary audio-recorded samples at irregular intervals to illustrate and augment the content of Journals 7-9.

The Control Data

After trying to record tapes in various contexts, I selected mealtimes as a longitudinally stable and recurrent context for the following reasons: 1) the context of a mealtime was basically limited and known; 2) L’s mobility was limited, so she remained within range of the microphone; 3) the social interaction at a meal required many different linguistic skills; 4) the context perforce provided examples of L’s participation with different configurations of interlocutors (e.g., from L and her mother alone to social situations with guests present); and 5) the longitudinal advances in L’s language development would affect the quality of her participation. These advantages outweighed the obvious disadvantages of unwanted background noise and cross-conversations that excluded L.

The Supplementary Data

By her third birthday, L’s emergent language had become prominent and interesting because it functioned as a social tool that altered the experiential quality of her life. My observations in Journals 7-9 were influenced to an unknown degree by my interest in the relation between language and thought. I deliberately recorded samples of L’s language in contexts that elicited egocentric speech as described by either Piaget or Vygotsky. I also recorded conversations with me, play with her older sister, and mini-experiments in which L was pragmatically superior to a pretend listener. Although the theoretical purpose of these samples was deliberate, they were unplanned insofar as the intervals between them were inconsistent. To summarize, the control data function as a systematic means to compensate for a fundamental methodological criticism of a handwritten diary (Journals 2-6 and tapes 1-18). Once L mastered sufficient English, the impromptu supplementary tapes illustrate how language enabled her to enrich her understanding of, and self-expression in, social relationships and to participate in an environment created through the shared knowledge of a common language (Journals 7-9 and portions of tapes 19-33).

Summary of the Diary Study

The purpose of the diary study on L was to describe the longitudinal process of a single subject’s language acquisition in the naturalistic context of her daily life in a real-world environment. The diary study, which ultimately included 9 volumes of handwritten data and about 60 hours of audio recordings, was the best method available to monitor L’s language development as continuously as possible. The diary study on L situates language acquisition within the realistic developmental context of a child’s life. Consequently, the distinction between decontextualization (the use of known language in a novel linguistic context) and displacement (the symbolic potential of language to create shared knowledge of an otherwise unknowable personal experience or thought) can be studied in the data on L.

The Braunwald-Max Planck Corpus

The Braunwald-Max Planck corpus is a substantial digitized subset of Braunwald’s original diary study on L. This version of the data makes it possible to access the diary study electronically, using programs such as CLAN. Researchers can now use these data to investigate theoretical questions that can be addressed quantitatively with the codes in CHILDES. These data can also be used as input to computational models of a specifically defined process in the acquisition of language.

Electronic version of The Handwritten Diary

This corpus contains Journals 2-8 of the daily diary data. It covers 31 months of daily diary data on L, between 15 and 46 months of age. Each CHAT file contains the separate speech events that were entered in the diary on a given day. Each speech event includes the relevant discourse turns of the participants who interacted with L. The %sit line links each speech event to its location in the original handwritten diary and to my extensive context notes and any added methodological clarification of an entry. Although this information can be retrieved easily, doing so requires access to the original handwritten diary.
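For researchers who want to pair each speech event with its %sit line outside CLAN, a minimal Python sketch is shown below. It assumes the usual CHAT layout, in which main tiers begin with `*`, dependent tiers such as `%sit:` begin with `%`, and a line starting with a tab continues the previous tier; it does not confirm how the %sit tier is attached in every file of this corpus, so check the actual .cha files first. The function name `sit_tiers` and the sample lines are hypothetical and are not taken from the corpus.

```python
from typing import Iterable, List, Optional, Tuple


def sit_tiers(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (main tier, %sit text) pairs from CHAT-formatted lines.

    Assumption: standard CHAT layout, i.e. main tiers start with '*',
    dependent tiers with '%', and a tab-initial line continues the
    tier above it. Hypothetical helper, not part of CLAN.
    """
    results: List[Tuple[str, str]] = []
    current_main = ""
    current_tier: Optional[str] = None  # tier that a continuation line extends

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("\t") and current_tier is not None:
            text = line.strip()
            if current_tier == "main":
                current_main += " " + text
            elif current_tier == "sit":
                main, sit = results[-1]
                results[-1] = (main, sit + " " + text)
            continue
        if line.startswith("*"):
            current_main = line          # new utterance starts a new speech turn
            current_tier = "main"
        elif line.startswith("%sit:"):
            results.append((current_main, line[len("%sit:"):].strip()))
            current_tier = "sit"
        else:
            current_tier = None          # headers and other dependent tiers are skipped
    return results
```

Because the function accepts any iterable of lines, an open file object can be passed directly, e.g. `sit_tiers(open(path, encoding="utf-8"))`, where `path` is a placeholder for a transcript file.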

The Transcriptions of the Audio Data

MPI-EVAN digitized and transcribed the majority of my original cassette recordings and several reel-to-reel tapes. The CHAT transcriptions contain all of L’s speech and any speech events in which she participated. These transcriptions constitute an important contribution to the diary study. They complement the handwritten diary data at an important developmental transition when L, at 17 months of age, was just starting a vocabulary spurt. Although I recorded the samples, MPI-EVAN deserves full credit for transcribing these samples into CHAT files and making them available to CHILDES. Moreover, these transcriptions are free of any identifying information other than L’s name. L’s name can be changed easily so that this database can be separated from the handwritten diary and used anonymously.

Summary and Privacy

The Braunwald-Max Planck corpus lacks the detailed information in the original diary study that describes the transitional period from late infancy to the sustained onset of the production of a child-language version of English. Nevertheless, this corpus is a major contribution: a large longitudinal database of daily handwritten diary data and complementary audio recordings from a single subject. The database is exceptionally rich from the age of 17 months onward because there are two sources of data that can be cross-checked. Any researcher interested in linguistic topics related to the longitudinal development of language production would find valuable information in this corpus. In conclusion, the Braunwald-Max Planck contribution to CHILDES is a valuable corpus of longitudinal data on the process of language development as it was observed in the same child in the actual cultural contexts of a toddler’s life 40 years ago. This corpus contains naturalistic data on a developing child who led an active, real life that included many people, as well as some spontaneous family interactions that lacked any social desirability effects.

Except in child language research, I have always used pseudonyms when writing on topics that draw on these data. Moreover, many of the children mentioned in this corpus are now 40-year-old adults. I never requested their parents’ permission to record their language and behavior as data in a diary study. Please respect their privacy: submit to Susan R. Braunwald any examples from the handwritten diary, or from the sound files linked to the transcriptions, that you intend to use publicly in any form (a presentation, an online publication, or a print publication). Researchers who are interested in the transitional period, which is not included in this corpus, or who need contextual information from the original diary data can contact Susan R. Braunwald.

Max Planck Transcriptions of the Audio Recordings

Below you will find some additional details relating to the transcriptions of the audio recordings made by the Max Planck team. Please email Elena Lieven ( ) or Jeannine Goh ( ) if you require further clarification. The Laura data consist of 33 audiotapes spanning ages 1-05-09 to 7-00-14. The data start off quite dense, but between the ages of five and seven there are only four tapes. Laura’s mother often recorded four or five days on one tape; these segments were split into separate CHAT files during transcription and then labeled by Laura’s exact age. The transcript files include an @Comment line that identifies which of the original 33 audiotapes they come from.

Notes about the labeling of the data
• If a date is estimated, this is noted. Most of the dates were clear from the recordings and the mother’s notes.
• If two recordings were made on the same day, they are labeled a and b, e.g., 1-06-00a and 1-06-00b.
• The date could not be estimated for tapes 14:1 and 14:2, which are labeled 2-03-XXa and 2-03-XXb; they are likely to be around 2-03-08.
• The date could not be estimated for tape 30, which is labeled xx-Dec-75 4-10-xx.
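The labeling conventions in these notes (years-months-days, an optional a/b suffix for same-day recordings, and XX for a day that could not be estimated) are regular enough to parse when sorting or filtering files by age. The Python sketch below is illustrative only: the names `AgeLabel`, `parse_label`, and `sort_key` are hypothetical, the pattern assumes labels look exactly like the examples given above, and a prefix such as xx-Dec-75 would need to be stripped before parsing the 4-10-xx part.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for labels such as "1-06-00a", "2-03-XXb" or "4-10-xx":
# years-months-days, where "XX"/"xx" marks a day that could not be estimated,
# followed by an optional same-day suffix "a" or "b".
LABEL_RE = re.compile(r"^(\d+)-(\d{2})-(\d{2}|[Xx]{2})([ab]?)$")


class AgeLabel(NamedTuple):
    years: int
    months: int
    days: Optional[int]  # None when the day is unknown ("XX")
    session: str         # "" for a single recording, else "a" or "b"


def parse_label(label: str) -> AgeLabel:
    """Split an age label into its parts; raise ValueError if it does not match."""
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognized age label: {label!r}")
    years, months, days, session = m.groups()
    return AgeLabel(
        years=int(years),
        months=int(months),
        days=None if days.lower() == "xx" else int(days),
        session=session,
    )


def sort_key(label: AgeLabel) -> tuple:
    """Order labels by age; an unknown day sorts after every known day of its month."""
    day = label.days if label.days is not None else 99
    return (label.years, label.months, day, label.session)


if __name__ == "__main__":
    labels = ["1-06-00b", "2-03-XXa", "1-05-09", "1-06-00a", "4-10-xx"]
    for lab in sorted((parse_label(s) for s in labels), key=sort_key):
        print(lab)
```

Placing unknown days at the end of their month is an arbitrary choice made for this sketch; given the note that tapes 14:1 and 14:2 are likely around 2-03-08, a researcher might prefer to substitute that estimate instead.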

General notes
• Although Laura’s name has been kept in the transcripts, any material that could identify the family has been coded, for example, Jwww [% sister].
• Although every effort was made to distinguish Laura from her sister, on the odd occasion this was difficult. In these instances, the utterance was listened to by a second transcriber, who checked that the differentiation between Laura and her sister was consistent across all the transcribers’ work.
• In some tapes, owing to the age of the recordings, the speech is unclear or there is too much echo.