Deuchar Bilingual Corpus

Margaret Deuchar
Department of Linguistics
University of Wales


Participants: 1
Type of Study: case study
Location: England
Media type: audio
DOI: doi:10.21415/T50C7H

Project Description

This corpus is a study of M., a girl born in Brighton, England on 24-JUN-1985. M. lived in Brighton and was an only child during the period under investigation. Her mother, Margaret Deuchar, the investigator, is a linguist. Her father is a civil engineer. Her mother was born and brought up in England, speaking English, and learned Spanish in early adulthood. Her English was standard with an RP accent slightly modified by southern English features (most of her childhood was spent in Hampshire). Her father was born in Cuba, where he lived until age 7, after which he lived mostly in the Dominican Republic and Panama, most of that time being spent in the latter until early adulthood, when he moved to England. He was brought up by Cuban parents speaking Cuban Spanish; his Spanish was also influenced by that spoken in Panama, where he spent his middle and later childhood. He learned English as a second language, starting in secondary school. From the time of their marriage (four years before M.’s birth) the parents spoke Spanish with one another. During the period of data collection, M. was exposed to Spanish from both parents in the home. She was exposed to English from caretakers in the creche and from her maternal grandmother, who spent one day per week with her. The grandmother spoke standard English with a fairly conservative RP accent. At age 1;3, M. heard, on average, English 48% of the time and Spanish 52% of the time (calculated on the basis of 12 waking hours per day, 7 days per week).
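The exposure figures can be illustrated with a short calculation. This is only a sketch: the percentages and the 12-hour, 7-day basis come from the description above, but the weekly hour counts are back-computed from those percentages for illustration and are not figures reported by the project. The function name exposure_shares is hypothetical.

```python
# Basis stated in the project description.
WAKING_HOURS_PER_DAY = 12
DAYS_PER_WEEK = 7


def exposure_shares(english_hours, spanish_hours):
    """Return the (English, Spanish) shares of total exposure time."""
    total = english_hours + spanish_hours
    if total <= 0:
        raise ValueError("total exposure time must be positive")
    return english_hours / total, spanish_hours / total


# Total waking hours per week on the stated basis.
weekly_waking_hours = WAKING_HOURS_PER_DAY * DAYS_PER_WEEK

# Assumption: hours are derived backward from the reported 48% / 52%
# split; the original hour counts are not given in the description.
english_hours = 0.48 * weekly_waking_hours
spanish_hours = 0.52 * weekly_waking_hours

english_share, spanish_share = exposure_shares(english_hours, spanish_hours)
print(f"English: {english_hours:.1f} h/week ({english_share:.0%})")
print(f"Spanish: {spanish_hours:.1f} h/week ({spanish_share:.0%})")
```

On this basis the split corresponds to roughly 40 hours of English and 44 hours of Spanish per week.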

The major goal was to determine, by means of a case study of an infant acquiring English and Spanish simultaneously between the ages of 1;3 and 3;3, whether the child had an initial linguistic system which subsequently divided into two, or whether a division corresponding to the two sources of linguistic input could be ascertained from the beginning of linguistic production.

Collection & transcription

Video and audio recordings were made weekly of spontaneous interactions between M. and her Spanish-speaking father on the one hand, and M. and her English-speaking grandmother on the other. M.’s mother was also present at some of the recordings in both languages. In addition to the weekly recordings, studio-quality audio recordings were made at age 1;11 and monthly from age 2;3 onwards in order to obtain elicited data of sufficient quality for the voicing study. Daily diary records were also kept by M.’s mother when interacting with M. and were supplemented by observations in the creche attended by M. Most of the recordings took place at home in a rectangular room, half of which served as the living room and half as the dining room. There was no partition separating the two areas.

The corpus here represents only a small sample of the recordings made, of which there are in total 95 made with an English-speaking interlocutor, mostly the maternal grandmother, and 125 with a Spanish-speaking interlocutor, mostly the father. These recordings were made weekly over a 2-year period from age 1;3 to 3;3. Many of the recordings have not yet been transcribed; others, although transcribed, do not yet meet the CHAT conventions. Diary and creche records are also not yet available in the CHAT format.

Recorded data were transcribed using phonetic transcription in PHONASCII for the child utterances. The transcriber was competent in English, Spanish, and phonetic transcription, and was trained in the CHAT conventions. Transcriptions were typed directly into computer files while videotapes were viewed and audiotapes listened to. The transcriber operated the computer, video recorder, and audio recorder at the same time while doing transcription. The %pho tier was the only one recorded for each utterance by M.; other tiers, such as those coding nonverbal or situational information, were included when the transcriber judged that they gave useful additional information. Prosody was not transcribed. In the Spanish transcriptions, each utterance was given a tier with a translation into English. Random spot reliability checks were done, but these covered only a small portion of the data. The only file checked and corrected in exhaustive detail is 861002eg.cha. No project-specific codes were used.


The labels “s” or “e” in the file names refer to the language spoken by the adult in the recording session. Copies of articles that make use of the data should be sent to Margaret Deuchar. This project was supported by grants from the Economic and Social Research Council (ref. no. C00232393) and the British Academy.
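As a rough illustration of the file-name labels, the sketch below reads the adult's language from a file name such as 861002eg.cha. It rests on assumptions not stated in this description: that the leading six digits are a YYMMDD recording date in the 1900s, and that the "s" or "e" label is the first letter after those digits. Any letters after the label are returned uninterpreted. The function and pattern names are hypothetical.

```python
import re
from datetime import date

# Assumed layout: six digits (YYMMDD), then "e" or "s", then optional
# further letters whose meaning is not documented here, then ".cha".
FILENAME_PATTERN = re.compile(r"^(\d{2})(\d{2})(\d{2})([es])([a-z]*)\.cha$")

# Language spoken by the adult in the session, per the label.
LANGUAGES = {"e": "English", "s": "Spanish"}


def parse_session_filename(name):
    """Split a transcript file name into date, adult language, and suffix."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    yy, mm, dd, label, suffix = match.groups()
    return {
        # Assumption: two-digit years belong to the 1900s.
        "date": date(1900 + int(yy), int(mm), int(dd)),
        "adult_language": LANGUAGES[label],
        "suffix": suffix,  # left uninterpreted
    }


info = parse_session_filename("861002eg.cha")
print(info["date"].isoformat(), info["adult_language"])
```

Under these assumptions, 861002eg.cha would be an English-language session dated 2 October 1986, which falls at about age 1;3 given M.'s date of birth.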

