GNP (Genesee-Nicoladis-Paradis) Bilingual Corpus


Fred Genesee
Department of Linguistics
McGill University

Elena Nicoladis
Department of Linguistics
University of Alberta

Johanne Paradis
Department of Linguistics
University of Alberta

Participants: 5
Type of Study: naturalistic
Location: Canada
Media type: video
DOI: doi:10.21415/T5R61B

Citation information

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The children and their families all lived in Montreal or surrounding communities. Metropolitan Montreal has a population of approximately 2.5 million people. It is a bilingual community in which many individuals are bilingual in French and English and use both languages daily. Both languages are also present in the media (there are French and English TV stations, newspapers, magazines, etc.), on the street (in the form of signs and announcements), and in stores (staff in most medium-sized to large stores can provide service in English and French). It is common to hear English and French spoken by people on the street, on buses, in stores, and elsewhere.

The children were being raised in homes where both languages were used regularly; usually, each parent spoke predominantly one of the two languages. The children and their parents were recorded in the children's homes, often in the living room, playroom, or kitchen. The recordings were made by an assistant or graduate student who was otherwise uninvolved in the interactions. The parents were asked to interact and talk with their children as they normally would, using whichever language(s) they would normally use, and to ignore the assistant as much as possible.

Twenty to thirty minutes of each session with each child were transcribed using the CHAT transcription system. Transcription began after the first five minutes of the session, to allow the children to become comfortable with the taping equipment. If a child had produced fewer than 100 intelligible utterances in the initial 20 minutes of data, transcription continued until at least 100 intelligible utterances had been recorded.

Each utterance was coded for addressee (parent, or other, e.g., a toy dog) and for language (French only, English only, mixed, neutral, or unintelligible). Mixed utterances contained both English and French; for example, the utterance “ça go pas là” (that doesn’t go there) was considered an instance of intra-utterance mixing. A neutral utterance was one that could belong to either language, such as proper names, “ah”, and “oh”. Animal sounds that are similar in English and French (e.g., “meow”) and the word “okay” were also coded as neutral, since it is impossible to determine the language in which these words were produced. However, when a neutral word appeared in an utterance otherwise in only one language, the entire utterance was coded as being in that language. For instance, “oh a truck” would be coded as English, whereas “oh un camion” would be coded as French. Finally, incomprehensible utterances were classified as unintelligible; these were sometimes transcribed phonetically, but no orthographic transcription was possible, and they were often dropped from further analyses.

Each transcript was reviewed by one of two bilingual assistants, namely the one who was a native speaker of the primary language of the session. Any discrepancies were resolved by discussion. The Social Sciences and Humanities Research Council, Ottawa, funded this work through a grant to Fred Genesee.
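The rule for how much of each session was transcribed can be restated as a short selection procedure. The Python sketch below is illustrative only and rests on assumptions not in the description: each utterance is represented as a hypothetical (onset in seconds from the start of the session, intelligible or not) pair, the "initial 20 minutes of data" are taken to start at the five-minute mark, and the choice between 20 and 30 minutes for a given session is not modelled. The function `select_utterances` and its constants are hypothetical and are not part of CHAT or the corpus tools.

```python
from typing import List, Tuple

# Hypothetical constants restating the rule in the project description.
START_OFFSET_S = 5 * 60   # transcription begins after the first five minutes
BASE_WINDOW_S = 20 * 60   # assumed baseline window: 20 minutes after the offset
MIN_INTELLIGIBLE = 100    # minimum number of intelligible utterances


def select_utterances(utterances: List[Tuple[float, bool]]) -> List[Tuple[float, bool]]:
    """Return the utterances that fall inside the transcription window.

    `utterances` is a time-ordered list of (onset in seconds from the start
    of the session, is_intelligible) pairs.
    """
    selected: List[Tuple[float, bool]] = []
    intelligible = 0
    window_end = START_OFFSET_S + BASE_WINDOW_S
    for onset, is_intelligible in utterances:
        if onset < START_OFFSET_S:
            continue  # warm-up period: not transcribed
        if onset >= window_end and intelligible >= MIN_INTELLIGIBLE:
            break  # baseline window is over and the minimum has been reached
        selected.append((onset, is_intelligible))
        if is_intelligible:
            intelligible += 1
    return selected


if __name__ == "__main__":
    dense = [(float(t), True) for t in range(0, 3600, 6)]    # one utterance every 6 s
    sparse = [(float(t), True) for t in range(0, 7200, 30)]  # one utterance every 30 s
    print(len(select_utterances(dense)))   # 200: the 20-minute window suffices
    print(len(select_utterances(sparse)))  # 100: transcription extends past 20 minutes
```

In the sparse example the child produces only 40 utterances in the baseline window, so the sketch keeps going until the 100th intelligible utterance and then stops.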
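The language-coding rules amount to a small decision procedure: ignore neutral words, then check which languages remain. The Python sketch below restates those rules under stated assumptions. The toy lexicon (`LEXICON`), the neutral-word list (`NEUTRAL`), and the function `code_utterance` are hypothetical illustrations, not part of the corpus or the CHAT tools; the sketch also assumes the CHAT codes `xxx` and `yyy` mark unintelligible material and treats an utterance as unintelligible only when it contains no intelligible words.

```python
import re
import unicodedata

# Toy bilingual lexicon (hypothetical): maps word forms to a language code.
# A real coder would need a full lexicon, and some forms are ambiguous
# across the two languages (e.g. "a" is also a French verb form).
LEXICON = {
    "a": "en", "truck": "en", "go": "en", "dog": "en",
    "un": "fr", "camion": "fr", "ça": "fr", "pas": "fr", "là": "fr",
}

# Words that could belong to either language: interjections, animal sounds
# shared by English and French, and "okay". Proper names would go here too.
NEUTRAL = {"ah", "oh", "okay", "meow"}

# CHAT codes for unintelligible material (assumed representation).
UNINTELLIGIBLE = {"xxx", "yyy"}


def code_utterance(utterance: str) -> str:
    """Return 'French only', 'English only', 'mixed', 'neutral' or 'unintelligible'."""
    text = unicodedata.normalize("NFC", utterance.lower())
    tokens = re.findall(r"\w+", text)
    intelligible = [t for t in tokens if t not in UNINTELLIGIBLE]
    if not intelligible:
        return "unintelligible"
    languages = set()
    for token in intelligible:
        if token in NEUTRAL:
            continue  # neutral words take the language of the rest of the utterance
        if token not in LEXICON:
            raise ValueError(f"word not in toy lexicon: {token!r}")
        languages.add(LEXICON[token])
    if not languages:
        return "neutral"  # only neutral words were found
    if languages == {"en", "fr"}:
        return "mixed"  # intra-utterance mixing
    return "English only" if languages == {"en"} else "French only"


if __name__ == "__main__":
    for u in ["ça go pas là", "oh a truck", "oh un camion", "okay", "xxx"]:
        print(f"{u!r}: {code_utterance(u)}")
```

With this toy lexicon, the examples from the description come out as mixed, English only, French only, neutral, and unintelligible, respectively. Raising an error on unknown words, rather than silently skipping them, keeps a gap in the lexicon from turning a mixed utterance into a single-language one.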

More detailed coding and analyses were undertaken according to the specific objectives of each study. The following publications are based, in part or in whole, on these transcripts: