Participants: 4 (plus siblings and parents)
Type of Study: naturalistic
Location: Antwerp, Belgium
Media type: audio
DOI: doi:10.21415/T58C8C

Project Description

This corpus of Dutch child language and child-directed speech was collected in Antwerp, Belgium.

The corpus consists of 15 recordings transcribed orthographically and phonetically. Some transcripts also contain variety codes, speaker codes, addressee codes and utterance numbers (see further below). The participants are four children between the ages of ca. 4;9 and 5;0 (two boys, Dieter and Michiel, and two girls, Kim and Katrien) and their families, with other persons occasionally present as well. The families are lower-middle to middle-middle class. All children are addressed in some form of Dutch common around the city of Antwerp and attend school full-time (second year of nursery school). They are being raised monolingually. The interactions are mostly free and spontaneous, but also include some structured interactions in which the mother or father talked with the 4-year-old about the past day at school, or prompted the child to describe a picture and tell a picture book story.
File                 Utterances  Sex     Age      Birth Order
KIM Saturday         902         female  4;11.03  middle of three
KIM Friday           954         female  4;11.02  middle of three
KIM Tuesday          382         female  4;10.30  middle of three
DIETER Saturday      697         male    4;11.29  older of two
DIETER Tuesday       210         male    4;11.25  older of two
DIETER Wednesday     457         male    4;11.26  older of two
DIETER Thursday      132         male    4;11.27  older of two
KATRIEN Wednesday    2156        female  4;08.25  younger of two
KATRIEN SatAft       2148        female  4;08.28  younger of two
KATRIEN SatMorning   1931        female  4;08.28  younger of two
KATRIEN Tuesday      268         female  4;08.24  younger of two
MICHIEL Saturday     1037        male    4;08.22  younger of two
MICHIEL Wednesday    1062        male    4;08.26  younger of two
MICHIEL Monday       1135        male    4;08.24  younger of two
MICHIEL Tuesday      131         male    4;08.25  younger of two
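
The utterance counts in the table above can be tallied per child and checked against the corpus total. The following is a minimal sketch in Python, not part of the corpus distribution: the values are copied by hand from the table, the function names are illustrative, and the age strings are assumed to follow the years;months.days notation.

```python
import re
from collections import defaultdict

# (file, utterances, age) copied by hand from the table above.
RECORDINGS = [
    ("KIM Saturday", 902, "4;11.03"),
    ("KIM Friday", 954, "4;11.02"),
    ("KIM Tuesday", 382, "4;10.30"),
    ("DIETER Saturday", 697, "4;11.29"),
    ("DIETER Tuesday", 210, "4;11.25"),
    ("DIETER Wednesday", 457, "4;11.26"),
    ("DIETER Thursday", 132, "4;11.27"),
    ("KATRIEN Wednesday", 2156, "4;08.25"),
    ("KATRIEN SatAft", 2148, "4;08.28"),
    ("KATRIEN SatMorning", 1931, "4;08.28"),
    ("KATRIEN Tuesday", 268, "4;08.24"),
    ("MICHIEL Saturday", 1037, "4;08.22"),
    ("MICHIEL Wednesday", 1062, "4;08.26"),
    ("MICHIEL Monday", 1135, "4;08.24"),
    ("MICHIEL Tuesday", 131, "4;08.25"),
]

# Assumed notation: years;months.days, e.g. "4;11.03".
AGE_PATTERN = re.compile(r"^(\d+);(\d{1,2})\.(\d{1,2})$")


def parse_age(age):
    """Split an age string into (years, months, days) as integers."""
    match = AGE_PATTERN.match(age)
    if match is None:
        raise ValueError(f"unrecognised age format: {age!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def utterances_per_child(recordings):
    """Sum utterance counts, keyed by the child name that starts each file name."""
    totals = defaultdict(int)
    for file_name, count, _age in recordings:
        child = file_name.split()[0]
        totals[child] += count
    return dict(totals)


totals = utterances_per_child(RECORDINGS)
print(totals)                # per-child totals
print(sum(totals.values()))  # 13602, the corpus total
```

Keeping the age as a tuple of integers makes it straightforward to sort recordings chronologically within each child.
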
The transcripts contain 13,602 utterances in total (children and adults combined). Both adult and child utterances were transcribed phonetically and orthographically by three separate coders: the first two each made a transcript from scratch, and the third resolved any differences between the two. For each transcript there was at least one coder from the Antwerp area and one coder not from the Antwerp region.

Phonetic transcription was originally carried out in Dutch UNIBET as developed by Steven Gillis, and is fairly narrow, especially as regards vowel sounds. Prosody, however, was not transcribed. As most recently described in Nuyts (1989), Antwerp vowel phonemes differ quite substantially from standard Dutch phonemes, both in their type and in their distribution. The Dutch UNIBET system first used for the phonological transcription could not handle all the phonemes. Rather than developing a new system, the transcribers used approximations where necessary, with an explanation in a following %exp line of how a particular phoneme symbol is best interpreted.

The UNIBET symbols were converted to Unicode, but researchers who prefer to work with the original UNIBET files are welcome to contact the author of the data for more information. The Unicode files still contain 0Xfa symbols for sounds that could not be approximated with the UNIBET symbols. Finally, the files for the child MICHIEL may contain some inaccuracies on the %pho line with regard to the long low open vowel phoneme used in Antwerp renderings of HIJ, MIJN and the like. Researchers wanting to work with these data are welcome to contact the author of the data to resolve these problems.

While Dutch standard spelling was generally used, the orthographic transcript stays as close to the phonetic transcript as possible, and indicates missing initial and final sounds between brackets.
Where this is not the case, and there seems to be a mismatch between the phonetic and orthographic transcript lines, the phonetic line should be taken as most closely resembling the original utterance.

Utterance lines may be followed by comment lines, which are in Dutch. For 10 of the 15 data files there is an additional coding line for each utterance (5 of these are complete and double-checked; the other 5 are provisional). This line includes the following:
- an utterance number followed by a slash
- a three-letter code, where the first letter refers to the speaker, the second letter refers to the kind of Dutch that is being used (variety-neutral, or 'local', meaning that the utterance contained a form typical of Antwerp dialect), and the third letter refers to the addressee.

More information on these codes can be found in De Houwer, 2003 (reference below), or can be obtained directly from the author of these data. If the coding line indicates that the utterance contained material coded as 'local', an explanation line follows to identify what exactly in the utterance led to that coding decision (e.g., a particular dialect phoneme, use of a dialect pronoun, use of specific dialect vocabulary, etc.; see De Houwer 2003).

The data show the following distinctions in usage: 'local' utterances containing dialect elements tend to be used when older children and adults in the family address each other. 'Neutral' forms that are common all over Flanders may also be used, while 'distal' features, which are clear 'imports' from a Dutch variety outside Flanders, are avoided. However, when older children and adults address the younger members of the family, they increase their use of neutral forms, substantially reduce their use of local forms, and occasionally use distal forms. The younger children use mainly utterances categorized as neutral, depending on whom they are addressing. Implications of this variation across family members for language change are discussed.

(Reference: Nuyts, Jan. (1989). Het Antwerps vokaalsysteem: een synchronische en diachronische schets. Taal en tongval 41(1-2): 22-48.)
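
The coding line described above (an utterance number, a slash, then a three-letter code for speaker, variety and addressee) lends itself to simple parsing. The following Python sketch is an illustration only. It assumes the tier label has already been stripped and that only the line content is passed in. The sample string "17/MLK" and its letters are hypothetical placeholders, since the actual letter inventory is documented in De Houwer (2003) and is not reproduced here.

```python
import re
from typing import NamedTuple


class CodingLine(NamedTuple):
    utterance_number: int
    speaker: str    # first letter of the code
    variety: str    # second letter: kind of Dutch used
    addressee: str  # third letter of the code


# Utterance number, a slash, then exactly three letters.
# Tolerating whitespace around the slash is an assumption, not a corpus rule.
CODING_PATTERN = re.compile(
    r"^\s*(\d+)\s*/\s*([A-Za-z])([A-Za-z])([A-Za-z])\s*$"
)


def parse_coding_line(text):
    """Parse the content of one coding line into its four fields."""
    match = CODING_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognisable coding line: {text!r}")
    number, speaker, variety, addressee = match.groups()
    return CodingLine(int(number), speaker, variety, addressee)


# Hypothetical content; the letters M, L and K are placeholders.
example = parse_coding_line("17/MLK")
print(example.utterance_number, example.speaker, example.variety, example.addressee)
```

Rejecting malformed lines with an exception, rather than skipping them silently, seems the safer choice for the five provisionally coded files, where irregular entries may still occur.
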


Transcription and coding of the Antwerp Dutch corpus were made possible through grants to the author from the Belgian Science Foundation and the University of Antwerp.