CHILDES German Koch Corpus


Nikolas Koch
Institut für Deutsch als Fremdsprache
Ludwig-Maximilians-Universität München

website

Participants: 4
Type of Study: longitudinal
Location: Germany
Media type: audio
DOI: doi:10.21415/EBDP-TD19

Browsable transcripts

Download transcripts

Link to media folder

Citation information

Koch, Nikolas. 2019. Schemata im Erstspracherwerb. Eine Traceback-Studie für das Deutsche. Berlin/New York: De Gruyter.

Articles using these data must cite this book.

Additional citations:

Koch, Nikolas, Stefan Hartmann & Antje Endesfelder Quick. 2020. The traceback method and the early constructicon: theoretical and methodological considerations. Corpus Linguistics and Linguistic Theory. doi: 10.1515/cllt-2020-0045

Koch, Nikolas, Antje Endesfelder Quick & Stefan Hartmann. 2021. Individual Differences in Discourse Priming: A Traceback Approach. Belgian Journal of Linguistics 34, 186–198. doi: 10.1075/bjl.00045.koc

Hartmann, Stefan, Nikolas Koch & Antje Endesfelder Quick. 2021. The traceback method in child language acquisition research: identifying patterns in early speech. Language and Cognition 13(2), 227–253. doi: 10.1017/langcog.2021.1

Project Description

The data comprise four longitudinal child language corpora, three monolingual and one bilingual, collected at the Ludwig-Maximilians-University of Munich. Nikolas Koch was responsible for coordinating the recordings, for the transcription guidelines, and for the reliability of the transcriptions. Adriano Sabini, Julia Schiffer, Katharina Scholtz, and Asude Türkmen assisted with the transcription of the data.

Monolingual German Corpora

Two girls, Marieke and Merit, and one boy, Simon, each grew up in a monolingual German family in a medium-sized city in North Rhine-Westphalia, Germany. Whereas Merit was the only child in her family, Marieke and Simon had older siblings aged 4 at the time of the recordings. All children attended a German daycare facility. Merit attended a small facility three days per week for six hours per day. Marieke attended daycare three days per week for three hours per day. Simon attended a daycare center five days per week for three hours per day.

For all children, the primary caregiver during the recordings was the mother. However, both siblings and fathers participated in some of the recordings. For Merit, there are four recordings that were conducted by the father alone.

The parents of the three monolingual children have "Fachabitur" or "Abitur" as their highest school degree. At least one of the parents of each child has a university or technical college degree. All fathers were employed full-time during the period of data collection. Merit's and Simon's mothers worked part-time whereas Marieke's mother was on parental leave at the time of the recordings. All parents spoke dialect-free, clearly articulated standard High German. The children can all be assigned to an educated middle class.

Bilingual German – Turkish Corpus

Jan, a boy, grew up in a bilingual German and Turkish family in a large city in Bavaria, Germany. The mother was a native speaker of Turkish and the father was a native speaker of German. Jan was the only child during the recording time.

The recordings were primarily made by the German-speaking father. The Turkish-speaking mother and the German-speaking grandparents also took part in some of the recordings.

The parents of the bilingual child both hold doctoral degrees. Both parents worked part-time during the recordings. Jan's parents followed a "one-person-one-language" (OPOL) strategy. Jan's father spoke dialect-free, clearly articulated standard German to him. Jan's mother spoke dialect-free, clearly articulated standard Turkish to him. German served as the family language for the most part, and English in some cases.

Filenames and Metadata

The file names indicate the children’s age. In the speaker-ID tiers, the children's ages are computed to the day for each recording, whereas the adults' ages are held constant and give their average age during the recording period. SES is indicated by the highest degree earned in the German system (e.g., university = university degree).

Sampling

Data collection for each corpus took place over a period of seven weeks in the children's homes. The recordings were made by the caregivers, without the project leader being present. Three weeks before the recordings began, the monolingual children (Marieke, Merit, and Simon) were tested with a language development test for two-year-old children (SETK-2; Grimm 2000) in order to rule out a developmental language disorder. For this purpose, the production and comprehension of words and sentences were tested in four subtests. All three children exceeded the DAWAKRIT value for their age group and were thus in the normal range.

Ages are given in the format years;months.days. Marieke's recordings started at the age of 2;02.22 and ended at 2;04.10. Merit was 2;00.21 at the beginning of the recordings and 2;02.07 at the end. Simon's language development was recorded from 2;04.23 to 2;06.18. Jan's recordings ran from 2;08.01 to 2;09.24.

Each week, about five recordings of one hour each were made during typical play interactions (painting, doing handicrafts, looking at picture books). No guidelines were given regarding the use of a particular toy or other material. In some cases, the recordings were divided into two or three sessions per day. This was done either because of time constraints or when it became apparent that keeping up a conversation or play situation for a full hour was too exhausting for the child. Each of the four corpora thus has a total length of approximately 35 hours. Indirect input in the form of television or smartphone apps did not play a role for any of the children, although reading aloud and looking at books together were regular parts of everyday life.
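The age notation and the sampling arithmetic above can be checked with a short script. The following is a minimal Python sketch, not part of the corpus tooling: the names `ChatAge`, `parse_chat_age`, `approx_age_in_days`, and `window_in_weeks` are hypothetical, and the conversion to days assumes an average month length of 30.4375 days, so the resulting spans are only approximate.

```python
import re
from typing import NamedTuple


class ChatAge(NamedTuple):
    years: int
    months: int
    days: int


# Ages in this documentation look like "2;02.22" (years;months.days).
AGE_PATTERN = re.compile(r"(\d+);(\d{1,2})\.(\d{1,2})")


def parse_chat_age(text: str) -> ChatAge:
    """Split an age string such as '2;02.22' into years, months, days."""
    match = AGE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a years;months.days age: {text!r}")
    years, months, days = (int(part) for part in match.groups())
    return ChatAge(years, months, days)


def approx_age_in_days(age: ChatAge, days_per_month: float = 30.4375) -> float:
    """Approximate age in days; the month length is an assumption."""
    return (age.years * 12 + age.months) * days_per_month + age.days


def window_in_weeks(first: str, last: str) -> float:
    """Approximate length of a recording window in weeks."""
    start = approx_age_in_days(parse_chat_age(first))
    end = approx_age_in_days(parse_chat_age(last))
    return (end - start) / 7


# First and last recording ages as stated in the Sampling section.
RECORDING_WINDOWS = {
    "Marieke": ("2;02.22", "2;04.10"),
    "Merit": ("2;00.21", "2;02.07"),
    "Simon": ("2;04.23", "2;06.18"),
    "Jan": ("2;08.01", "2;09.24"),
}

for child, (first, last) in RECORDING_WINDOWS.items():
    print(f"{child}: about {window_in_weeks(first, last):.1f} weeks")

# Seven weeks, about five one-hour recordings per week.
planned_hours = 7 * 5 * 1
print(f"Planned recording time per corpus: about {planned_hours} hours")
```

Because the day counts are approximate, the computed windows only need to land near the stated seven weeks; small deviations reflect the month-length assumption and the exact recording dates.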

Transcription guidelines

The recordings were made using a TASCAM DR-100MKII recorder and were transcribed in Sonic-CHAT format (MacWhinney 2016). The data were transcribed by a total of four research assistants at the Ludwig-Maximilians-University Munich under the supervision of Nikolas Koch. For this purpose, Nikolas Koch first compiled a transcription manual for German based on the CHAT conventions. The transcription guidelines were then discussed and refined in several team meetings. The reliability of the transcription was checked by having the transcribers initially transcribe the same passages, compare and discuss the results, and resolve the differences until a high degree of reliability was achieved. If ambiguities still arose during transcription, they were marked and another person was consulted at a later time. In general, unclear sections were listened to no more than three to five times in order to avoid overinterpretation. The %com line was used to note contextual information or other clues that help classify and interpret utterances that were difficult to understand.
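For readers who want to inspect these contextual notes, the following minimal Python sketch shows one way to pair each %com line with the utterance it follows. It relies only on the general CHAT layout (main tiers start with `*`, dependent tiers with `%`, headers with `@`, continuation lines with a tab). The function name `extract_comments` is hypothetical, and the sample transcript is invented for illustration and is not taken from the corpus.

```python
from typing import List, Optional, Tuple


def extract_comments(chat_text: str) -> List[Tuple[str, str, str]]:
    """Return (speaker, utterance, comment) for every %com tier."""
    rows: List[List[str]] = []
    speaker: Optional[str] = None
    utterance = ""
    last_tier = None  # "main", "com", "other", or None after a header

    for line in chat_text.splitlines():
        if line.startswith("*"):
            # Main tier, e.g. "*CHI:\tda Ball ."
            head, _, body = line[1:].partition(":")
            speaker, utterance = head, body.strip()
            last_tier = "main"
        elif line.startswith("%"):
            # Dependent tier, e.g. "%com:\t..."
            head, _, body = line[1:].partition(":")
            if head == "com" and speaker is not None:
                rows.append([speaker, utterance, body.strip()])
                last_tier = "com"
            else:
                last_tier = "other"
        elif line.startswith("\t") and last_tier == "com":
            # Continuation of a wrapped %com line.
            rows[-1][2] += " " + line.strip()
        elif line.startswith("\t") and last_tier == "main":
            # Continuation of a wrapped utterance.
            utterance += " " + line.strip()
        elif line.startswith("@"):
            last_tier = None

    return [(r[0], r[1], r[2]) for r in rows]


# Invented sample in CHAT layout (not from the Koch corpus).
sample = (
    "@Begin\n"
    "*CHI:\tda Ball .\n"
    "%com:\tCHI points at a ball\n"
    "\tunder the table\n"
    "*MOT:\tja , ein Ball .\n"
    "*CHI:\txxx .\n"
    "%com:\tunintelligible , background noise\n"
    "@End\n"
)

for spk, utt, com in extract_comments(sample):
    print(f"{spk}: {utt}  [{com}]")
```

The sketch deliberately ignores all other dependent tiers and does not validate the transcript. For serious work, the CLAN tools distributed with CHILDES are the more robust choice.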