CHILDES English-L2 Paradis Corpus

Johanne Paradis
Department of Linguistics
University of Alberta


Participants: 25
Type of Study: longitudinal, naturalistic
Location: Canada
Media type: not available
DOI: doi:10.21415/T5XK6K

Citation information

Publications based on the use of this corpus should include a citation of this reference:

Other references include:

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Paradis corpus (Paradis, 2005) consists of naturalistic language samples from 25 children learning English as a second language (English language learners or learners of English as an additional language). The corpus is longitudinal, with five rounds of data collection spanning a two-year period.

Transcription is in English orthography only; phonetic transcription was not included in this research. The design of the study was longitudinal, but only 19 of the 25 participants have data for all five rounds. Any real names of people or places in the transcripts have been replaced with pseudonyms. The participants are identified with four-letter codes.


The data in this corpus were collected from 2002 to 2005 in Edmonton, Canada. The goal of the project was to collect longitudinal data from children learning English as a second language in order to determine the similarities and differences between these children’s acquisition patterns and those of monolingual English-speaking children with typical development and with specific language impairment. Data for this research project consisted of a battery of standardized language assessments in addition to naturalistic language samples. The transcripts of the language samples constitute the Paradis corpus donated to CHILDES.

Children were video-taped in conversation with a student research assistant in their homes for approximately 45 minutes. During this time, the research assistant had a list of “interview” questions to ask (see the list of questions below). The aim of having these questions was to get the conversation going and to introduce new topics, if necessary. If the child introduced his or her own topics and the conversation moved forward, the questions were not asked. Therefore, not all children have answers to all the questions at each round of data collection. The video-tapes were captured on Macintosh computers using Final Cut Pro or iMovie, and were then transcribed in CHAT using the CLAN software. Only the speaker tiers and necessary dependent tiers, such as %situation and %comment, are included in the transcripts for CHILDES.
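As a rough illustration of the tier structure described above, the following Python sketch separates a CHAT transcript into header lines, speaker tiers, and the dependent tiers attached to each utterance. It assumes the usual CHAT line prefixes (`@` for headers, `*` for speaker tiers, `%` for dependent tiers, a leading tab for continuation lines). The function name `split_chat_tiers`, the sample text, and the short tier code `%com` are illustrative assumptions, not material taken from the corpus; for real analyses the CLAN programs remain the reference tools.

```python
def split_chat_tiers(text):
    """Split CHAT text into header lines and utterances with dependent tiers.

    Hypothetical helper for illustration; it does not validate CHAT syntax.
    """
    headers = []
    utterances = []  # each item: {"speaker", "text", "dependent": {tier: text}}
    last = None      # (container, key) of the most recent tier, for continuations
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@"):
            headers.append(line)
            last = None  # header continuation lines are ignored in this sketch
        elif line.startswith("*"):
            speaker, _, content = line[1:].partition(":")
            utterances.append(
                {"speaker": speaker, "text": content.strip(), "dependent": {}}
            )
            last = (utterances[-1], "text")
        elif line.startswith("%") and utterances:
            tier, _, content = line[1:].partition(":")
            utterances[-1]["dependent"][tier] = content.strip()
            last = (utterances[-1]["dependent"], tier)
        elif line.startswith("\t") and last is not None:
            container, key = last
            container[key] += " " + line.strip()
    return headers, utterances


# Invented sample, not an excerpt from the Paradis corpus.
sample = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, EXP Investigator\n"
    "*EXP:\thow old are you ?\n"
    "*CHI:\tI am six .\n"
    "%com:\tholds up six fingers\n"
    "@End\n"
)
headers, utterances = split_chat_tiers(sample)
for utt in utterances:
    print(utt["speaker"], utt["text"], utt["dependent"])
```

Keeping dependent tiers nested under the utterance they follow mirrors how they are laid out in a transcript, so a comment or situation note stays tied to the speaker turn it describes.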

Two or three language samples at each of the five rounds were selected to be transcribed independently by a separate research assistant (total = 18). The original and second transcripts were compared on the basis of word-for-word and utterance-boundary agreement between the transcriptions. The mean percentage of word agreement between the transcripts was 95.85%. The mean percentage of utterance boundary agreement was 83.13%. Because of the lower mean for utterance boundary agreement, transcripts with agreement percentages below the mean were examined by the two assistants together, and final decisions about utterance boundaries were made by consensus. Furthermore, some systematic differences in determination of utterance boundaries were documented through this process, and one research assistant reviewed all the transcripts in the corpus and made changes to improve consistency across them.
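The exact agreement formula used in the project is not stated here, so the following Python sketch shows only one plausible way to compute a word-agreement percentage and to flag transcripts that fall below the mean, as in the review step described above. It assumes agreement is the number of words matched in order between the two transcriptions divided by the length of the longer one. The function names, transcript labels, and scores are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def word_agreement(original, second):
    """Percentage of words matched in order between two transcriptions.

    Assumed definition: matched words / words in the longer transcription.
    """
    a, b = original.split(), second.split()
    if not a and not b:
        return 100.0
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / max(len(a), len(b))


def below_mean(scores):
    """Return the mean score and the names of transcripts scoring below it."""
    average = mean(scores.values())
    flagged = sorted(name for name, score in scores.items() if score < average)
    return average, flagged


# Invented utterances: four of five words match.
print(word_agreement("I want the red ball", "I want a red ball"))

# Hypothetical per-transcript agreement percentages.
average, flagged = below_mean({"t1": 90.0, "t2": 80.0, "t3": 70.0})
print(average, flagged)
```

The same `below_mean` step would apply to utterance-boundary agreement once a per-transcript boundary score has been computed under whatever definition is adopted.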

Biographical data

Participants in this study were children from newcomer (immigrant and refugee) families to Canada. The children started to learn English as a second language (L2) after their first language (L1) had been established, at 4;11 on average (range = 3;3-7;5). Thus, these children are sequential and not simultaneous bilinguals. All the parents of the children were foreign born, but some of the children were Canadian born. The Canadian-born children, according to parent report, were functionally monolingual in their L1 until they entered an English language preschool or school program. In the table below, “AOA” refers to the “age of arrival” of the child when the family immigrated. The number “1” indicates children who were Canadian born. The column “AOE” refers to the age of onset of English acquisition. All ages are in months. Each child’s L1 and gender are also listed in the table below.

When the study started, the children were, on average, aged 5;6, with a mean of 9.5 months of exposure to their English L2 in a preschool or school program. Data were collected approximately every 6 months for 5 rounds. Children’s ages and length of exposure to English in months are given in the table below for each round of data collection.
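Ages in the running text use the years;months notation (for example, 4;11), while the tables report ages in months. A minimal Python sketch of the conversion in both directions is given here. The function names are hypothetical, and the sketch handles only the years;months form used in this description, not ages that also carry days.

```python
def age_to_months(age):
    """Convert a years;months string such as "4;11" to total months."""
    years, _, months = age.partition(";")
    return int(years) * 12 + int(months or 0)


def months_to_age(total_months):
    """Convert total months back to years;months notation."""
    return f"{total_months // 12};{total_months % 12}"


print(age_to_months("4;11"))  # mean age of onset of English
print(age_to_months("3;3"), age_to_months("7;5"))  # range of onset ages
print(months_to_age(66))
```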

For more information about the participants and procedures in this research, see the following:

List of questions used in the conversations with the children

NB: Not all children were asked all the questions.
1. How old are you?
2. When is your birthday? (If child doesn’t know, ask them what time of year).
3. Did you / are you going to have a birthday party?
4. What happens at a birthday party?
5. Do you go to school?
6. If yes, what grade are you in? Who’s your teacher?
7. What do you do at school?
8. What do you do at recess?
9. What’s your favorite subject? Why?
10. What are the other kids at school like? (Tell me about the kids in your class)
11. Did you live somewhere else before you moved to Edmonton/Canada? If yes, can you tell me about it?
12. What is your favorite food? Can you tell me how to make it? (if no: What food do you know how to make? Do you know how to make a sandwich?)
13. Do you know what Halloween (or closest holiday) is? What are you going to be/were you for Halloween? What are you going to/did you do?
14. What would you like to be when you grow up? Why? Tell me what you’re going to do when you’re a ____________________.
15. Do you know what a fairy godmother is? What three things would you wish for if you had a fairy godmother? Why?
16. What games and toys do you like the best? Why? Tell me how to play 
17. What was the last movie/video/TV program that you saw? Tell me what 
18. What did you do on the weekend/ yesterday after school?
19. What are you going to do tonight? What are you going to do tomorrow after 
20. Do you know what the four seasons are? What’s your favourite season? Why? What can/can’t you do in that season?


The Social Sciences and Humanities Research Council of Canada and the Alberta Heritage Foundation for Medical Research funded the project.