Anna Chromá, Faculty of Arts, Charles University, Prague, anna.chroma@ff.cuni.cz
Klára Matiasovitsová, Faculty of Arts, Charles University, Prague, klara.matiasovitsova@ff.cuni.cz
Jakub Sláma, Czech Language Institute, Czech Academy of Sciences, slama@ujc.cas.cz
Jolana Treichelová, Faculty of Arts, Charles University, Prague, jolana.treichelova@ff.cuni.cz
Filip Smolík, Institute of Psychology, Czech Academy of Sciences & Faculty of Arts, Charles University, Prague, smolik@praha.psu.cas.cz
Participants: 7
Type of Study: longitudinal
Location: Czech Republic
Media type: audio, not available
DOI: doi:10.21415/3ZNE-HX03
Chromá, A., Sláma, J., Matiasovitsová, K., & Treichelová, J. (under review). A morphologically annotated longitudinal corpus of spoken Czech child-adult interactions. Language Resources and Evaluation.
This is a corpus of transcribed spontaneous child-adult interactions in Czech. It consists of 99,358 tokens in 41,585 utterances produced by seven children between approximately 1.5 and 3.5 years of age, and 238,073 tokens in 60,734 utterances produced by their close caregivers in everyday situations at home. The corpus covers the children's language production from a mean length of 1.01 words per utterance up to 5.33 words per utterance. The length of the recorded period ranges from 11 to 27 months for individual children. The transcripts of both child and adult utterances were lemmatized and tagged using MorphoDiTa, a tool for automatic morphological analysis of Czech. The annotation was then converted into the MOR format. Details on the procedure, participants, and morphological annotation can be found in Chromá et al. (under review) (see above), on the homepage of the CoCzeFLA group, and on this page.
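As a rough orientation, the token and utterance counts reported above can be turned into an overall tokens-per-utterance figure for each speaker group. The sketch below is only illustrative arithmetic on the published totals; the function name is hypothetical, and the result is not the same measure as the per-child mean length of utterance in words cited above, which depends on how the authors counted words.

```python
def mean_tokens_per_utterance(tokens: int, utterances: int) -> float:
    """Return the average number of tokens per utterance.

    Hypothetical helper for illustration; it is not part of the corpus tooling.
    """
    if utterances <= 0:
        raise ValueError("utterances must be a positive integer")
    return tokens / utterances


# Totals as reported in the corpus description above.
child_mean = mean_tokens_per_utterance(99_358, 41_585)
adult_mean = mean_tokens_per_utterance(238_073, 60_734)

print(f"children: {child_mean:.2f} tokens per utterance")  # children: 2.39 tokens per utterance
print(f"adults:   {adult_mean:.2f} tokens per utterance")  # adults:   3.92 tokens per utterance
```

These pooled figures average over all children and all recording sessions, so they sit between the lowest and highest per-utterance means reported for individual developmental stages.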
From the very beginning of CoCzeFLA in 2014–2015, scholarships from the Faculty of Arts, Charles University, supported the students who transcribed and proofread the recordings. In these years, the project was supported within the AKCES framework (initiated by Prof. Karel Šebesta). Subsequently, Anna Chromá’s project ‘Longitudinální korpus raného vývoje řeči’ (‘Longitudinal corpus of early speech development’; No. FF_VG_2016_16) received faculty support for the next two years. In 2016–2017, this project provided financial rewards for the coordinator, Anna Chromá, and for the transcribing and proofreading students. Thanks to this support, a substantial part of the first version of the Chroma corpus (2019.07) was created.
After this project ended, CoCzeFLA continued to draw scholarships from the Faculty of Arts for transcribing and proofreading students within the large infrastructure project LINDAT/CLARIAH-CZ (No. LM2023062, earlier LM2018101), funded by the Ministry of Education, Youth and Sports of the Czech Republic. Since 2021, Anna Chromá has been employed as a data curator at the Faculty of Arts within LINDAT.
In 2021–2023, Klára Matiasovitsová’s project ‘Nominal morphological categories and the mean length of utterance in a longitudinal corpus of early language development’ was funded from university sources invested in the program ‘Grant Schemes at Charles University’ (CZ.02.2.69/0.0/0.0/19_073/0016935). The morphologically annotated version of the Chroma corpus (2023.07) was created thanks to this support.
We are grateful to the following students who participated in the transcription of recordings, the revision of transcripts, and the manual checking of the automatic morphological annotation (in alphabetical order): Markéta Baslová, Kateřina Bělehrádková, Tereza Binderová, Barbora Blahnová, Iurii Bochkov, Jan Henyš, Alžběta Macháčková, Anna Marklová, Martin Pavlíček, Jan Pinc, Tereza Šátavová, Denisa Šebestová, Jana Segi Lukavská, Kateřina Šimková, Leona Straková, Tomáš Treichel, Štěpánka Tvrdíková, and Martina Vokáčová. We also thank our collaborator Petra Čechová, who participated in the morphological annotation, and our mentor, Filip Smolík, whose contribution to the entire project has been invaluable.