CHILDES Bulgarian LabLing Corpus

Velka Popova
Laboratory of Applied Linguistics
University of Shumen
v.popova@shu.bg

Dmitar Popov
Laboratory of Applied Linguistics
University of Shumen
labling@shu.bg

Participants:	5, 50, 71
Type of Study:	naturalistic, narrative
Location:	Bulgaria
Media type:	audio
DOI:	doi:10.21415/PHWH-J834

Citation information

Project Description

The main focus of the LabLing research program is the creation of a Bulgarian child language corpus as part of the CHILDES database. LabLing is a member of the consortium for the Bulgarian national research infrastructure for resources and technologies for linguistic, cultural and historical heritage, integrated within CLARIN EU and DARIAH EU (CLaDA-BG, https://clada-bg.eu/en). The data will be important for building a national interdisciplinary electronic infrastructure as electronic resources for Bulgarian are integrated and developed. The construction of the LabLing corpus is therefore a priority task of the CLaDA-BG consortium.

The Cyrillic letters Я, Ю, Ъ, Ч, Ш, Щ, Ж, Ц, Й, Х are assigned the following Latin correspondences: Я – ja, Ю – ju, Ъ – y, Ч – ch, Ш – sh, Щ – sht, Ж – zh, Ц – c, Й – j, Х – x.
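The letter correspondences listed above can be written as a small lookup table. The following is a minimal sketch, not the project's own tooling: the names `LABLING_MAP` and `transliterate_listed` are hypothetical, and mapping both upper- and lowercase Cyrillic to the lowercase Latin forms is an assumption. The description gives correspondences only for these ten letters, so all other characters are passed through unchanged.

```python
# Correspondences stated in the corpus description (Cyrillic -> Latin).
# Keys are lowercase; treating uppercase input the same way is an assumption.
LABLING_MAP = {
    "я": "ja",
    "ю": "ju",
    "ъ": "y",
    "ч": "ch",
    "ш": "sh",
    "щ": "sht",
    "ж": "zh",
    "ц": "c",
    "й": "j",
    "х": "x",
}


def transliterate_listed(text: str) -> str:
    """Replace only the ten listed letters; leave every other character as is."""
    out = []
    for ch in text:
        latin = LABLING_MAP.get(ch.lower())
        out.append(latin if latin is not None else ch)
    return "".join(out)
```

Because the remaining letters of the alphabet are not covered by the list, the output of this sketch is mixed-script for ordinary words; a complete scheme would need the correspondences for the other letters from the corpus maintainers.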

Longitudinal Corpus

The LabLing corpus includes two segments: the longitudinal corpus and the narrative corpus. The longitudinal corpus contains the transcribed data of 5 Bulgarian girls – ALE, TEF, BOG, SIM, and ELI. ALE was born 29-JAN-1989, BOG was born 23-JUN-2000, ELI was born 12-APR-2004, SIM was born 19-DEC-2018, and TEF was born 29-NOV-2000.

The children were born and live in the northeastern part of Bulgaria (Shumen and Varna). They were recorded in everyday situations (games, dressing, eating, going to sleep, looking through children’s picture books, free play with the mother, the father, or other children, reading a book, and others) during their daily interaction with their relatives. All individuals entered in the database as participants in the dialogues are monolingual native speakers of Bulgarian. The adults around them have secondary or higher (university) education. The audio recordings of two of the children (ALE and TEF) were made by the LabLing research team, and those of BOG, SIM, and ELI by their mothers. The digitization and transcription of the material were done by members of the research team.

Narrative Corpus

The narrative corpus consists of two segments. The first uses the fox and cat stories and the second uses the birds and dogs stories.

Fox-Cat Collection

The fox-cat collection contains 91 transcripts of children's narratives elicited from 50 monolingual children (native speakers of Bulgarian). The narratives were recorded with an audio recorder in several kindergartens in Shumen and Varna (northeastern Bulgaria) and, in a few cases, at home or in the street. The children are divided into 3 age groups:

The first group includes 21 children aged 3-4 years – 36 narratives (21 of which without audio, 15 with both audio and transcripts);
The second group includes 23 children aged 4-5 years – 43 narratives (10 of which without audio, 33 with both audio and transcripts);
The third group includes 6 children aged 5-6 years – 12 narratives (with both audio and transcripts).

The corpus is based on 2 picture stories, each consisting of 6 black-and-white illustrations without text: the Cat Story (Hickmann 2002) and the Fox Story (developed by the ZAS Berlin research team headed by D. Bittner and first published in Gülzow & Gagarina 2007). Future work will use the Baby Birds Story and the Dogs Story from the ZAS MAIN study in the CHILDES Biling folder.

Dog-Birds Collection

The second collection uses the Baby Birds Story and the Dogs Story from the ZAS MAIN study. It contains narratives from 71 children, recorded with an audio recorder in kindergarten, at home, and in the street in Shumen, Razgrad, Varna, Loznitsa, and Burgas. The children are divided into 5 age groups:

The first group includes 6 children aged 3-4 years – 12 narratives (with both audio and transcripts);
The second group includes 4 children aged 4-5 years – 8 narratives (with both audio and transcripts);
The third group includes 21 children aged 5-6 years – 42 narratives (with both audio and transcripts);
The fourth group includes 27 children aged 6-7 years – 54 narratives (with both audio and transcripts);
The fifth group includes 13 children aged 7-8 years – 26 narratives (1 of which without audio, 25 with both audio and transcripts).
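
The per-group figures in the two narrative collections can be cross-checked against the stated totals (50 children and 91 narratives for fox-cat, 71 children for dogs-birds). The sketch below simply re-enters the numbers from the group lists; the names `FOX_CAT`, `DOGS_BIRDS`, and `totals` are hypothetical, and the dogs-birds narrative total is derived here rather than stated in the description.

```python
# (children, narratives, narratives without audio) per age group,
# copied from the group lists in this description.
FOX_CAT = {
    "3-4": (21, 36, 21),
    "4-5": (23, 43, 10),
    "5-6": (6, 12, 0),
}
DOGS_BIRDS = {
    "3-4": (6, 12, 0),
    "4-5": (4, 8, 0),
    "5-6": (21, 42, 0),
    "6-7": (27, 54, 0),
    "7-8": (13, 26, 1),
}


def totals(groups):
    """Sum children, narratives, and audio-less narratives over all age groups."""
    children = sum(c for c, _, _ in groups.values())
    narratives = sum(n for _, n, _ in groups.values())
    without_audio = sum(a for _, _, a in groups.values())
    return children, narratives, without_audio


print(totals(FOX_CAT))     # (50, 91, 31)
print(totals(DOGS_BIRDS))  # (71, 142, 1)
```

The fox-cat sums match the stated 50 children and 91 narratives, and the dogs-birds groups add up to the stated 71 children.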