Bulgarian LabLing Corpus


Velka Popova
Laboratory of Applied Linguistics
University of Shumen

website

Dimitar Popov
Laboratory of Applied Linguistics
University of Shumen

website

Participants: 4 (longitudinal), 50 (narrative)
Type of Study: naturalistic, narrative
Location: Bulgaria
Media type: audio
DOI: xxx

Browsable transcripts

Download transcripts

Link to media folder

Citation information

--

Project Description

The main focus of the LabLing research program is the creation of a corpus of Bulgarian child language as part of the CHILDES database. LabLing is a member of the consortium of the Bulgarian national research infrastructure for resources and technologies for linguistic, cultural and historical heritage, integrated within CLARIN EU and DARIAH EU (CLaDA-BG, https://clada-bg.eu/en). The data will be of great importance for building a national interdisciplinary electronic infrastructure as electronic resources for Bulgarian are integrated and developed. The construction of the LabLing corpus is therefore a priority task of the CLaDA-BG consortium.

The Cyrillic letters Я, Ю, Ъ, Ч, Ш, Щ, Ж, Ц, Й, Х are assigned the following Latin correspondences: Я – ja, Ю – ju, Ъ – y, Ч – ch, Ш – sh, Щ – sht, Ж – zh, Ц – c, Й – j, Х – x.
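The letter correspondences listed above can be sketched as a small lookup table. The following Python snippet is only an illustration: the function name `transliterate_special` is hypothetical, the last entry is assumed to refer to Cyrillic Х, and the capitalization of upper-case letters is also an assumption. Letters that the list does not mention are left unchanged, because their Latin correspondences are not stated here.

```python
# Latin correspondences for the letters listed in the corpus description
# (lower-case forms). The last entry assumes the listed "X" is Cyrillic Х.
SPECIAL = {
    "я": "ja", "ю": "ju", "ъ": "y", "ч": "ch", "ш": "sh",
    "щ": "sht", "ж": "zh", "ц": "c", "й": "j", "х": "x",
}


def transliterate_special(text: str) -> str:
    """Replace only the listed Cyrillic letters; leave everything else as is."""
    out = []
    for ch in text:
        latin = SPECIAL.get(ch.lower())
        if latin is None:
            # Not covered by the list above: keep the character unchanged.
            out.append(ch)
        elif ch.isupper():
            # Assumption: an upper-case letter yields a capitalized sequence.
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)
```

For example, `transliterate_special("щ")` gives `sht` and `transliterate_special("Юй")` gives `Juj`.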

Longitudinal Corpus

The LabLing corpus includes two segments: the longitudinal corpus and the narrative corpus. The longitudinal corpus contains transcribed data from 4 Bulgarian girls: ALE, BOG, ELI, and TEF. ALE was born 29-JAN-1989, BOG was born 23-JUN-2000, ELI was born 12-APR-2004, and TEF was born 29-NOV-2000.

The children were born and live in the town of Shumen, in the north-eastern part of Bulgaria. They were recorded in everyday situations (playing games, dressing, eating, going to sleep, looking through children’s picture books, free play with the mother, free play with the father, free play with other children, reading a book, and others) during their daily interaction with their relatives. All individuals listed in the database as participants in the dialogues are monolingual native speakers of Bulgarian. The adults in the children's surroundings have secondary or higher (university) education. The audio recordings of two of the children (ALE and TEF) were made by the LabLing research team, and those of BOG and ELI by their mothers. The material was digitized and transcribed by members of the research team.

Narrative Corpus

The collection contains 91 transcripts of children's narratives elicited from 50 monolingual children (native speakers of Bulgarian). The narratives were audio-recorded in several kindergartens in Shumen and Varna (north-eastern Bulgaria) and, in only a few cases, at home or in the street. The children are grouped into 3 age groups. The corpus is based on 2 picture stories, each consisting of 6 black-and-white illustrations without text: Cat Story (Hickmann 2002) and Fox Story (developed by the ZAS-Berlin research team headed by D. Bittner and first published in Gülzow & Gagarina 2007). Future work will use the Baby Birds and Dogs story from the ZAS MAIN study in the CHILDES Biling folder.