ArabicDutch-AarssenBos Corpus


Jeroen Aarssen
SARDES, Unit Taal



Petra Bos

Free University of Amsterdam


Participants: 175 transcripts from children ages 4, 5, 7, 9, 10
Type of Study: narrative
Location: the Netherlands
Media type: audio
DOI: doi:10.21415/T5CG7Z

Browsable transcripts

Download transcripts

Media folder

Citation information

This research resulted in the two doctoral theses cited below.

Publications using these data should cite:

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This database contains 1021 transcripts collected in the Netherlands, Turkey, and Morocco by Jeroen Aarssen and Petra Bos at Tilburg University. Bilingual data (either Turkish-Dutch or Moroccan Arabic-Dutch) were collected within the framework of a longitudinal study of the development of bilingualism among Turkish and Moroccan children in the Netherlands.

In 2003, work began at two sites on digitizing the audio files for this corpus. The Turkish data were being digitized by Ad Backus (a.m.backus@uvt.nl) and the Moroccan data by Louis Boumans at Nijmegen University.

The age range of the bilingual informants was from 4 to 10. The design of the study is pseudo-longitudinal with two consecutive cohorts of 25 informants. The younger cohort was followed for four rounds (from age 4 to age 7) and the older cohort for three rounds (from 8 to 10). The first round of data collection took place in 1991, and data collection was repeated in 1992, 1993, and 1994. The interval between subsequent rounds of data collection was about 1 year.
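As a quick arithmetic check on this design, the following Python sketch counts the recording sessions per language. It assumes that every informant completed every round, which the description does not state, and the function name is purely illustrative.

```python
from typing import Sequence


def expected_sessions(cohort_size: int, rounds_per_cohort: Sequence[int]) -> int:
    """Count sessions per language, assuming no informant missed a round."""
    return sum(cohort_size * rounds for rounds in rounds_per_cohort)


# Two cohorts of 25 informants: four rounds (younger) and three rounds (older).
per_language = expected_sessions(25, [4, 3])
print(per_language)  # 175
```

Under that assumption the design yields 175 sessions per language, which agrees with the 175 DutchMono files listed in the directory overview. Twice that figure agrees with the 350 files in each bilingual directory, which would fit each bilingual child being recorded in both languages in every round; that reading is an inference from the counts, not something the description states.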

Turkish, Moroccan Arabic, and Dutch monolingual control data were also collected, in Turkey, Morocco, and the Netherlands, respectively. The Dutch control data were collected according to the same pseudo-longitudinal design as described above. The Turkish and Moroccan control data, however, were collected cross-sectionally from three different age groups (ages 5, 7, and 9).

Each transcript contains retellings of six short six-picture stories and the frog story (Mayer, 1969). The six short stories were constructed according to the following set-up: two stories with a clearly identifiable main character; two with two equivalent main characters; and two without a clearly identifiable main character.

The file names use the following code. First comes the child’s pseudonym. Next comes a number for the child’s age group. These numbers are often off by a year, so please rely on the ages as given inside the files. Then comes a letter for the language of the interaction (t=Turkish, m=Moroccan Arabic, n=Dutch). For the monolingual children, no letter is given. The files are structured into five directories:

ArabBiling: 350 files, ages 4–10
ArabMono: 71 files, ages 5, 7, 9
DutchMono: 175 files, ages 4–10
TurkBiling: 350 files, ages 4–10
TurkMono: 75 files, ages 5, 7, 9
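The file name code described above can be decoded with a short Python sketch. The exact layout is an assumption: the description gives the order of the parts (pseudonym, age-group number, optional language letter) but not the separators, so the pattern below assumes they are simply concatenated and followed by an extension. The file names used here, such as ali4m.cha, are hypothetical.

```python
import re
from pathlib import PurePath
from typing import NamedTuple, Optional

# Language letters as given in the corpus description.
LANGUAGES = {"t": "Turkish", "m": "Moroccan Arabic", "n": "Dutch"}

# ASSUMED layout: <pseudonym><age-group digits><optional language letter>.
NAME_PATTERN = re.compile(r"^(?P<pseudonym>[a-z]+)(?P<age_group>\d+)(?P<lang>[tmn])?$")


class FileCode(NamedTuple):
    pseudonym: str
    age_group: int            # nominal group; may be off by a year
    language: Optional[str]   # None when no language letter is present


def parse_file_name(name: str) -> FileCode:
    """Split a transcript file name into pseudonym, age group, and language."""
    stem = PurePath(name).stem.lower()
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"file name does not fit the assumed pattern: {name!r}")
    letter = match.group("lang")
    language = LANGUAGES[letter] if letter else None
    return FileCode(match.group("pseudonym"), int(match.group("age_group")), language)


print(parse_file_name("ali4m.cha"))
print(parse_file_name("ayse9.cha"))
```

Because the age-group number is often off by a year, the parsed value is best treated as a grouping label only; the child's actual age should be read from inside the transcript.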

Some adjustments were made in order to represent certain Moroccan Arabic characters in Roman script:

Character        Code  Character          Code  Unicode
ch of "loch"     x     pharyngealized h   h2    bar h
j                j     emphatic t         t2    t glottal
uvular r         gh    emphatic s         s2    s glottal
ayn              c     emphatic d         d2    d glottal

Please see this link for a general description of the Frog Story methods.

Acknowledgements

This research was supported by the Linguistic Research Foundation (Grant No. 300-172-002), which is funded by the Netherlands Organization for Scientific Research, NWO.