CHILDES German Koch Corpus
Nikolas Koch
Institut für Deutsch als Fremdsprache
Ludwig-Maximilians-Universität München
koch@daf.lmu.de
Participants: 4
Type of Study: longitudinal
Location: Germany
Media type: audio
DOI: 10.21415/EBDP-TD19
Koch, Nikolas. 2019. Schemata im Erstspracherwerb. Eine Traceback-Studie
für das Deutsche. Berlin/New York: De Gruyter.
Articles using these data must cite this book.
Additional citations:
Koch, Nikolas, Stefan Hartmann & Antje Endesfelder Quick. 2020. The
traceback method and the early constructicon: theoretical and
methodological considerations. Corpus Linguistics and Linguistic Theory.
doi: 10.1515/cllt-2020-0045
Koch, Nikolas, Antje Endesfelder Quick & Stefan Hartmann. 2021.
Individual Differences in Discourse Priming: A Traceback Approach.
Belgian Journal of Linguistics 34, 186–198.
doi: 10.1075/bjl.00045.koc
Hartmann, Stefan, Nikolas Koch & Antje Endesfelder Quick. 2021. The
traceback method in child language acquisition research: identifying
patterns in early speech. Language and Cognition 13(2), 227–253. doi:
10.1017/langcog.2021.1
Project Description
The data comprise four longitudinal child language corpora, three
monolingual and one bilingual, that were collected at the
Ludwig-Maximilians-University of Munich. Nikolas Koch was responsible
for the coordination of the recordings, the transcription guidelines and
the reliability of the transcriptions. Adriano Sabini, Julia Schiffer,
Katharina Scholtz, and Asude Türkmen assisted with the transcriptions of
the data.
Monolingual German Corpora
Two girls, Marieke and Merit, and one boy, Simon, grew up in
monolingual German families in a medium-sized city in North
Rhine-Westphalia, Germany. Whereas Merit was the only child in her
family, Marieke and Simon had older siblings aged 4 at the time of the
recordings. All children attended a German daycare facility. Merit
attended a small facility three days per week for six hours per day.
Marieke spent three hours per day at daycare on three days per week.
Simon was at the daycare center five days per week for three hours per
day.
For all children, the primary caregiver during the recordings was the
mother. However, both siblings and fathers participated in some of the
recordings. For Merit, there are four recordings which were conducted by
the father alone.
The parents of the three monolingual children have "Fachabitur" or
"Abitur" as their highest school degree. At least one of the parents of
each child has a university or technical college degree. All fathers
were employed full-time during the period of data collection. Merit's
and Simon's mothers worked part-time whereas Marieke's mother was on
parental leave at the time of the recordings. All parents spoke
dialect-free, clearly articulated standard High German. The children can
all be assigned to an educated middle class.
Bilingual German – Turkish Corpus
Jan, a boy, grew up in a bilingual German and Turkish family in a large
city in Bavaria, Germany. The mother was a native speaker of Turkish and
the father was a native speaker of German. Jan was the only child
during the recording time.
The recordings were primarily made by the German-speaking father. The
Turkish-speaking mother and the German-speaking grandparents also took
part in some of the recordings.
The parents of the bilingual child both hold doctoral degrees. Both
parents worked part-time during the recordings. Jan's parents followed a
"one-person-one-language" (OPOL) strategy. Jan's father spoke
dialect-free, clearly articulated standard German. Jan's mother spoke
dialect-free, clearly articulated standard Turkish to him. German served
as the family language for the most part, with English used in some
cases.
Filenames and Metadata
The file names indicate the children's ages. In the speaker ID tiers,
the children's ages are computed to the day for each recording, whereas
the adults' ages are held constant and give their average age during the
recording period. SES is indicated by the highest degree earned in the
German system (e.g., university = university degree).
Sampling
Data collection for each corpus took place over a period of seven weeks
in the children's homes. The recordings were made by their caregivers,
without the project leader being present. Three weeks before the
beginning of the recordings, Marieke, Merit and Simon, the monolingual
children, were tested with a language development test for two-year-old
children (SETK-2; Grimm 2000) in order to rule out a language
developmental disorder. For this purpose, the production and reception
of words and sentences was tested in four subtests. All three children
exceeded the DAWAKRIT value according to their age group and were thus
in the normal range. Marieke's recordings started at age 2;02.22
(years;months.days) and ended at 2;04.10. Merit was 2;00.21 at the
beginning of the recordings and 2;02.07 at the end. Simon's language
development was recorded from 2;04.23 to 2;06.18. Jan's recordings ran
from 2;08.01 to 2;09.24. Each week, about five recordings were made in
typical play interactions (painting, doing handicrafts, looking at
picture books), each lasting one hour. No guidelines were given regarding
the use of a particular toy or other material. In some cases, the
recordings were divided into two or three sessions per day. This was
done either because of time constraints or when it became apparent that
it was becoming too exhausting for the children to keep up a
conversation or play situation for a period of an hour. Each of the four
corpora collected thus has a total length of approximately 35 hours.
Indirect input in the form of television or smartphone apps did not
play a role for any of the children, although reading aloud and looking
at books together were regular parts of everyday life.
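The age ranges and the sampling plan above can be checked with a short script. The following is a minimal sketch, not part of the corpus tooling: it assumes ages are written as years;months.days, as in the text, and it approximates every month as 30 days, which is a simplifying assumption made here for illustration. The names parse_age, approx_days and span_in_days are hypothetical helpers.

```python
import re
from typing import Tuple

# Ages in the text follow the pattern years;months.days, e.g. "2;02.22".
AGE_PATTERN = re.compile(r"^(\d+);(\d{1,2})\.(\d{1,2})$")


def parse_age(age: str) -> Tuple[int, int, int]:
    """Split an age string such as '2;02.22' into (years, months, days)."""
    match = AGE_PATTERN.match(age.strip())
    if match is None:
        raise ValueError(f"not a years;months.days age: {age!r}")
    years, months, days = (int(group) for group in match.groups())
    return years, months, days


def approx_days(age: str, days_per_month: int = 30) -> int:
    """Convert an age to days, ASSUMING every month has 30 days."""
    years, months, days = parse_age(age)
    return (years * 12 + months) * days_per_month + days


def span_in_days(start: str, end: str) -> int:
    """Approximate number of days between two ages."""
    return approx_days(end) - approx_days(start)


# First and last recording ages as stated in the Sampling section.
RECORDING_SPANS = {
    "Marieke": ("2;02.22", "2;04.10"),
    "Merit": ("2;00.21", "2;02.07"),
    "Simon": ("2;04.23", "2;06.18"),
    "Jan": ("2;08.01", "2;09.24"),
}

# Planned volume per corpus: 7 weeks x about 5 recordings x 1 hour each.
planned_hours = 7 * 5 * 1

if __name__ == "__main__":
    for child, (start, end) in RECORDING_SPANS.items():
        print(child, span_in_days(start, end), "days (approx.)")
    print("planned hours per corpus:", planned_hours)
```

Under the 30-day approximation, the spans come out at 48 days for Marieke, 46 for Merit, 55 for Simon and 53 for Jan, and the planned volume of 7 × 5 × 1 hours matches the reported total of approximately 35 hours per corpus.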
Transcription guidelines
The recordings were made using a TASCAM DR-100MKII recorder and were
transcribed in Sonic-CHAT format (MacWhinney 2016). The data were
transcribed by a total of four research assistants at the
Ludwig-Maximilians-University Munich under the supervision of Nikolas
Koch. For this purpose, Nikolas Koch first compiled a transcription
manual for German based on the CHAT conventions. In several team
meetings, the transcription guidelines were discussed and refined. The
reliability of the transcription was checked by initially transcribing
the same passages, comparing them, discussing them, and thus eliminating
differences in the transcription until a high degree of reliability was
achieved. If ambiguities continued to appear during the transcriptions,
they were marked and another person was consulted at a later time. In
general, unclear sections were not listened to more than three to five
times in order to avoid overinterpretation. The %com line was used to
note contextual information or other clues to help classify and
interpret statements that were difficult to understand.