CHILDES Dutch Asymmetries Corpus

Petra Hendriks
University of Groningen


Charlotte Koster
University of Groningen


Sanne Kuijper
Social Sciences
University of Groningen


Participants: CK: 31 (4;3–6;5); SK: 121 (6;1–12;10)
Type of Study: structured storytelling
Location: the Netherlands
Media type: audio, adult data in CABank
DOI: doi:10.21415/T5SW2X


Note: SK corpus data from children with ASD are currently indexed at this page.

Citation information

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Asymmetries Project collection contains Dutch language productions gathered in Groningen and neighboring towns in the northern Netherlands, between 2007 and 2012. The research was carried out by members of the NWO/Vici project “Asymmetries in Grammar” at the University of Groningen. This project investigates asymmetries between production and comprehension in unimpaired children, in young and elderly adults, and in autistic and ADHD children and adolescents.


All participants are native speakers of Dutch. The participants in the CK sub-corpus have no history of language problems. The CK sub-corpus includes 31 typically developing children (4;3–6;5, mean 5;6), 20 young adults (18–35 years, mean 26;2), and 20 elderly adults (69–87 years, mean 78;8). The groups are balanced for sex. The SK sub-corpus includes 46 children with Autism Spectrum Disorder (ASD) (6;1–12;10, mean 9;3), 37 children with Attention Deficit Hyperactivity Disorder (ADHD) (6;1–11;11, mean 8;9), and 38 typically developing (TD) children (6;2–12;1, mean 9;0). Most children in the SK sub-corpus are boys (ASD: 87%, ADHD: 84%, TD: 66%).


CK sub-corpus

The children in the CK sub-corpus were tested individually in a quiet room at their primary school in Stadskanaal (45 km southeast of Groningen) in the winter of 2007–2008. In the spring of 2010, the two adult groups participated in the same experiment. The young adults, both students and non-students, were tested in their homes or at the university; the elderly adults were all tested in their homes. The elderly participants were socioeconomically representative of their generation’s middle class, and all still lived independently, with a minimum of assistance. All the adults lived in the greater Groningen area. Transcripts of the two adult groups are available in CABank.

The language production data of the CK sub-corpus were collected in a structured storytelling task, which was the first part of a larger experimental session that also included a memory test and a language comprehension test. A child was first shown an introductory page with pictures of all the storybook figures, such as a princess, witch, ballerina, nurse, pirate, knight, cowboy, Indian, etc. The figures were depicted in stereotypical color drawings. The child was asked to name the figures and was helped if needed. Then the child was told that she would see picture books and should tell what was happening on their pages. Investigator-1 explained that investigator-2 wanted to listen in but, because she was sitting further away behind a computer screen, could not see the pictures, so the child should explain as clearly as possible what was happening in the picture books. After a practice session, the child saw a picture book with six pictures, one per page (see storybook pictures below). She described the activity on each page as she looked at it and could see only one page at a time. The child saw four picture books in total. After each picture book, the child was rewarded with a sticker and was intermittently reminded that investigator-2 could not see the pictures. The child’s descriptions were essentially monologues. Investigator-1 prompted the child if necessary and often gave slight encouragement while turning the page, such as “yes”, “good job”, or some other short, “empty” supporting comment.

The total production session usually took 7–15 minutes, including the instructions, introductory page, practice stories, four test stories, and rewards between stories. The children told all four stories in about 3–7 minutes (including reward time between stories). The young adults were quite efficient, with a total storytelling time of about 3–5 minutes. The elderly adults talked more, or more slowly, and completed the four stories in about 5–10 minutes.

Testing of the two groups of adult participants followed the same procedure, with a few minor differences. The adults received no “rewards” between storybooks, although they were given a small present of chocolates at the end of their session. They were tested by only one investigator, who explained that the recordings would be processed further by someone else. They were told that this second investigator would not see the pictures, so they had to describe clearly what was happening. The adults were also asked to limit the length of their answers.

SK sub-corpus


The language production data of the SK sub-corpus were collected with the same structured storytelling task as in the CK sub-corpus, following the same procedure as with the CK children. The task was part of a larger experimental session that took several hours. All children in the SK sub-corpus were tested individually at the Eye Lab of the Faculty of Arts, University of Groningen, between May 2009 and August 2011.

To confirm the clinical ASD diagnoses (made according to DSM-IV-TR guidelines), the Autism Diagnostic Interview-Revised (ADI-R) (Rutter, Le Couteur, & Lord, 2003) and the Autism Diagnostic Observation Schedule (ADOS) (Lord, Rutter, DiLavore, & Risi, 1999) were administered for all children participating in the study (including the TD and ADHD children), either to the children themselves or to their parents. Likewise, to confirm the clinical ADHD diagnoses (made according to DSM-IV-TR guidelines), the Parent Interview for Child Symptoms (PICS) (Ickowicz et al., 2006) and the Teacher Telephone Interview-IV (TTI-IV) (Tannock, Hum, Masellis, Humphries, & Schachar, 2002) were administered for all children participating in the study (including the TD and ASD children).

Each child was tested individually on a single day, with two experimenters present. In addition to the structured storytelling task, children’s comprehension of pronominal binding was assessed with a controlled reference comprehension task. Next, to obtain Theory of Mind, working memory, and response inhibition scores, a verbal and a low-verbal False Belief task (adapted from Hollebrandse, Van Hout, & Hendriks, 2014), an n-back task (cf. Owen, McMillan, Laird, & Bullmore, 2005), and a stop-signal task (adapted from Van den Wildenberg & Christoffels, 2010) were used. Furthermore, to estimate children’s IQ and general verbal ability, two subtests (Vocabulary and Block Design) of the WISC-III-NL (Kort et al., 2002) and the PPVT-III-NL (Schlichting, 2005) were administered.

Children visited the lab together with their parents. While the children were tested, their parents were interviewed in an adjacent room (ADI-R and PICS). Furthermore, parents filled in the Dutch version of the Children’s Communication Checklist (CCC-2-NL) (Bishop, 2003; Dutch translation: Geurts, 2007).

Due to problems with the voice recorder, stories of two children with ASD were not recorded. Furthermore, only two of the four stories of two children (1 ASD and 1 TD) and three out of four stories of one child (ADHD) were recorded. Stories that were not recorded are left out of the Asymmetries Corpus.

Transcriptions, recordings, and use of data

The recorded sessions were first transcribed in Word by the investigator who had been present at the test session. All participant productions and all investigator productions above a whisper were transcribed. For the children, non-storytelling productions during the reward moments between the four picture books were not transcribed. The transcripts were then coded in CHAT, and a different investigator checked the CHAT files and sound files for coding consistency. Productions relating to each picture of each story were transcribed orthographically on the main tier and separated by gems (@G). A second tier was used only occasionally, as %com: for exceptional situations. Pauses were noted as (.) for a clear pause shorter than 2 seconds, (..) for 2–4 seconds, and (...) for more than 4 seconds. Researchers specifically interested in pauses, stuttering, mispronunciations, or contrastive stress should consult the sound files for exact measurements.
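For readers processing the transcripts programmatically, the two conventions above (one gem per picture, three pause symbols) can be read with a few lines of Python. This is a minimal sketch, not part of any TalkBank tool: it assumes gem headers of the form "@G:<tab>label" and main-tier lines starting with "*", and it ignores continuation lines and dependent tiers. The names split_gems and count_pauses, the gem labels, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Pause symbols as defined above: (.) < 2 s, (..) 2-4 s, (...) > 4 s.
# The closing parenthesis must follow the dots directly, so "(...)" is
# counted once as a long pause, never as a short one.
PAUSE_RE = re.compile(r"\((\.{1,3})\)")
PAUSE_LABELS = {".": "short", "..": "medium", "...": "long"}


def split_gems(lines):
    """Group main-tier lines (starting with '*') under the latest @G label.

    Lines before the first @G header are collected under the key None.
    Continuation lines and dependent tiers (e.g. %com:) are ignored.
    """
    gems = {}
    label = None
    for line in lines:
        if line.startswith("@G:"):
            label = line.split(":", 1)[1].strip()
            gems.setdefault(label, [])
        elif line.startswith("*"):
            gems.setdefault(label, []).append(line)
    return gems


def count_pauses(utterance):
    """Count (.), (..) and (...) pause markers in one main-tier line."""
    # Also accept a typographic ellipsis "(…)" as a long pause.
    text = utterance.replace("(\u2026)", "(...)")
    return Counter(PAUSE_LABELS[dots] for dots in PAUSE_RE.findall(text))


# Hypothetical input; real gem labels and utterances will differ.
sample = [
    "@G:\tpicture 1",
    "*CHI:\tfirst utterance (.) with (...) two pauses .",
    "@G:\tpicture 2",
    "*CHI:\tsecond utterance (..) here .",
]
for label, utterances in split_gems(sample).items():
    print(label, sum((count_pauses(u) for u in utterances), Counter()))
```

Counting symbols this way only recovers the three coarse duration bands; as noted above, exact pause lengths have to be measured in the sound files.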

The participants are registered anonymously under a code number, as can be seen in the first header line (see example below). For the CK sub-corpus, the 31 children are coded CKc01–CKc33; participants CKc24 and CKc25 have been excluded from the set. The 20 young adults are coded CKa01–CKa21; young adult CKa15 has been excluded. The 20 elderly adults are coded CKe26–CKe47; elderly participants CKe29 and CKe31 have been excluded. For the SK sub-corpus, the 46 children with Autism Spectrum Disorder are coded SKasd1–SKasd46, the 37 children with Attention Deficit Hyperactivity Disorder SKadhd1–SKadhd37, and the 38 typically developing children SKtd1–SKtd38.
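The code ranges and exclusions above can be cross-checked against a downloaded set of files with a short Python sketch. It assumes that codes are written exactly as in the text, with two-digit padding for the CK codes and no padding for the SK codes; expected_codes and group_of are hypothetical helper names, not part of any TalkBank tool.

```python
import re

# Prefix of each participant code -> group, as listed in the text above.
GROUPS = {
    "CKc": "CK children",
    "CKa": "CK young adults",
    "CKe": "CK elderly",
    "SKasd": "SK ASD",
    "SKadhd": "SK ADHD",
    "SKtd": "SK TD",
}
CODE_RE = re.compile(r"(CKc|CKa|CKe|SKasd|SKadhd|SKtd)(\d+)")


def expected_codes():
    """Build the expected code list per group, leaving out excluded codes."""
    return {
        # Assumption: CK codes are zero-padded to two digits (CKc01 ...).
        "CK children": [f"CKc{n:02d}" for n in range(1, 34) if n not in {24, 25}],
        "CK young adults": [f"CKa{n:02d}" for n in range(1, 22) if n != 15],
        "CK elderly": [f"CKe{n:02d}" for n in range(26, 48) if n not in {29, 31}],
        # Assumption: SK codes are not padded (SKasd1 ...).
        "SK ASD": [f"SKasd{n}" for n in range(1, 47)],
        "SK ADHD": [f"SKadhd{n}" for n in range(1, 38)],
        "SK TD": [f"SKtd{n}" for n in range(1, 39)],
    }


def group_of(code):
    """Map a participant code such as 'CKc11' to its group name."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized participant code: {code!r}")
    return GROUPS[match.group(1)]


if __name__ == "__main__":
    for group, codes in expected_codes().items():
        print(group, len(codes))
```

Running it prints group sizes of 31, 20, 20, 46, 37 and 38, matching the numbers stated above; comparing expected_codes() with the actual file names would reveal any missing or extra transcripts.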

Several investigators are identified by first name (coincidentally, several of them share the same first name!). The @ID lines give the speaker’s language, age, sex and group. The groups are identified as SAchildren, SAadults and SAelderly, with SA standing for Subject Anaphora (see Research Goals below).
Example of the relevant initial header lines:
@Participants: CHI CKc11 Target_Child, INV Sanne3 Investigator
@ID: nld|asymmetry|CHI|6;2.|female|SAchildren| |Target_Child| |
@ID: nld|asymmetry|INV| |female|SAchildren| |Investigator| |
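The example header lines can be parsed with a small Python sketch. It assumes the general @ID field order described in the CHAT manual (language|corpus|code|age|sex|group|SES|role|education|custom|) and CHAT's years;months.days age format; parse_id and parse_age are hypothetical names, not part of the CLAN programs.

```python
# Field order assumed from the CHAT manual's description of @ID headers.
ID_FIELDS = ["language", "corpus", "code", "age", "sex", "group",
             "ses", "role", "education", "custom"]


def parse_id(line):
    """Turn an '@ID:' header line into a dict keyed by field name."""
    if not line.startswith("@ID:"):
        raise ValueError(f"not an @ID header: {line!r}")
    values = [v.strip() for v in line[len("@ID:"):].strip().split("|")]
    # zip() stops at the shorter sequence, so surplus empty elements
    # produced by the trailing "|" are dropped once all fields are filled.
    return dict(zip(ID_FIELDS, values))


def parse_age(age):
    """Split a CHAT age like '6;2.' or '6;2.15' into (years, months, days).

    Returns None for an empty age field (as for the investigator).
    Missing months or days come back as None.
    """
    if not age:
        return None
    years, _, rest = age.partition(";")
    months, _, days = rest.partition(".")
    return (int(years), int(months) if months else None, int(days) if days else None)


child = parse_id("@ID: nld|asymmetry|CHI|6;2.|female|SAchildren| |Target_Child| |")
print(child["code"], child["group"], child["role"], parse_age(child["age"]))
```

For the child header above this prints "CHI SAchildren Target_Child (6, 2, None)"; the group field (SAchildren, SAadults, SAelderly) is the one to filter on when separating the three CK participant groups.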

The sound files were recorded on an Olympus voice recorder as Windows Media Audio (WMA) files. The names of the sound files match the transcript file names.

No restrictions are placed on the use of either the transcribed data or the sound files of the children, young adults and elderly adults in the CK sub-corpus or the SK sub-corpus.

Research Goals

These productions are being used in several ongoing investigations. The main research goal for which the picture books were designed was to study subject anaphora in relation to discourse topic and topic shift: specifically, when a speaker uses a full noun phrase and when she uses a pronoun.

Picture Storybooks

The original picture books were six A4 pages long, with one picture on each page, labeled as below. In addition to the four picture books shown below, there were also two short practice picture books (consisting of two pages each) and an introductory page including pictures of all the storybook figures. Whereas the CK sub-corpus used the original Indian story, the SK sub-corpus used a revised version of the Indian story, which has a positive ending that was thought to encourage children to produce a topic shift.

If anyone wishes to use this material, they must first contact Petra Hendriks for permission and the original files.

Ballerina story

Pirate story

Princess story

Indian story


This project was funded by a grant from the Netherlands Organization for Scientific Research (NWO) awarded to Petra Hendriks (grant no. 277-70-005).