Valeriia Lelik, Center for Language and Brain, HSE University, lelikvaleriya@gmail.com
Anastasiya Lopukhina, Rastle lab, Royal Holloway, University of London, nastya.lopukhina@gmail.com
Mariia Diachkova, Center for Language and Brain, HSE University, masha.dyachcova@yandex.ru
Olga Dragoy, Institute of Linguistics, Russian Academy of Sciences, odragoy@hse.ru
Svetlana V. Dorofeeva, Center for Language and Brain, HSE University, sdorofeeva@gmail.com
Irina A. Sekerina, Department of Psychology and Ph.D. Program in Linguistics, College of Staten Island and The Graduate Center, City University of New York, irina.sekerina@csi.cuny.edu
Participants: 2
Type of Study: longitudinal
Location: Russian Federation
Media type: video, audio
DOI: doi:10.21415/1XMG-D508
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The RusLan-M corpus comprises longitudinal samples of spontaneous speech, video-recorded from two monolingual Russian-speaking children: a boy, Yasha, and a girl, Tosya. To protect the participants’ privacy, the multimedia sources were anonymized by blurring all faces in the videos, muting or removing personal details (e.g., addresses and last names) in the transcripts, and replacing some videos with audio recordings.
Tosya was recorded by her mother and a caregiver from the age of 10 months, with the final recording made at 3;10. She lived in Russia and had an older sister. The total duration of Tosya’s recordings was approximately 29 h (21,421 child and 40,811 adult utterances), with approximately 36,275 and 126,944 tokens, respectively. Here, tokens count every occurrence of a word form, including repeated ones, rather than unique word forms.
Yasha was recorded by his father from the age of 1;5, with the final recording made at 3;0. He also lived in Russia and had an older brother. The total duration of Yasha’s recordings was approximately 12 h (13,965 child and 12,278 adult utterances), with approximately 27,592 and 41,141 tokens, respectively.
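The reported counts can be combined into a rough tokens-per-utterance ratio for each speaker group. The sketch below is only arithmetic on the figures given above; the ratio is a derived illustration, not a measure reported for the corpus, and it is not a mean length of utterance in morphemes. The names `COUNTS` and `tokens_per_utterance` are hypothetical.

```python
# (utterances, tokens) as reported in the corpus description above.
COUNTS = {
    ("Tosya", "child"): (21_421, 36_275),
    ("Tosya", "adult"): (40_811, 126_944),
    ("Yasha", "child"): (13_965, 27_592),
    ("Yasha", "adult"): (12_278, 41_141),
}


def tokens_per_utterance(utterances: int, tokens: int) -> float:
    """Average number of word tokens per utterance (a derived, approximate ratio)."""
    return tokens / utterances


# Round to two decimals, since the source gives the token counts as approximate.
ratios = {
    key: round(tokens_per_utterance(*value), 2) for key, value in COUNTS.items()
}

for (child, speaker), ratio in ratios.items():
    print(f"{child:5} {speaker:5} {ratio:.2f}")
```

Because the token counts are described as approximate, these ratios should be read as ballpark values only.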
The transcripts of both child speech and child-directed speech were lemmatized and morphologically annotated using Mystem, an automatic tool for Russian morphological analysis (https://yandex.ru/dev/mystem). Trained research assistants then manually verified the annotations and resolved cases of homonymy. Annotated tables containing nouns and verbs for both children are available on the OSF page (https://osf.io/6zdkc/).
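The source does not describe how Mystem was invoked, so the following is only a minimal sketch of one way to pull nouns and verbs out of Mystem-style output. It assumes the third-party `pymystem3` wrapper, which is not mentioned in the corpus description; the helper names and the sample data are hypothetical and hand-written to show the shape of the output, not taken from the corpus. Taking the first reading is a simplification: in the corpus, homonymy was resolved manually.

```python
def pos_of(gr: str) -> str:
    """Return the part-of-speech tag, i.e. the leading tag of a Mystem 'gr' string."""
    return gr.split("=")[0].split(",")[0]


def nouns_and_verbs(analysis):
    """Collect (word form, lemma, POS) for tokens whose first reading is S or V."""
    rows = []
    for item in analysis:
        readings = item.get("analysis")
        if not readings:  # whitespace, punctuation, or a token with no reading
            continue
        first = readings[0]  # first reading only; no homonymy resolution here
        pos = pos_of(first["gr"])
        if pos in {"S", "V"}:  # S = noun, V = verb in Mystem's tagset
            rows.append((item["text"], first["lex"], pos))
    return rows


def analyze_with_mystem(text: str):
    """Run Mystem through pymystem3 (assumed installed; fetches the binary on first use)."""
    from pymystem3 import Mystem

    return Mystem().analyze(text)


# Hand-written sample in the list-of-dicts shape that pymystem3 returns.
sample = [
    {"text": "мама", "analysis": [{"lex": "мама", "gr": "S,жен,од=им,ед"}]},
    {"text": " "},
    {"text": "спит", "analysis": [{"lex": "спать", "gr": "V,несов,нп=непрош,ед,изъяв,3-л"}]},
    {"text": "\n"},
]

rows = nouns_and_verbs(sample)
```

With `pymystem3` installed, `nouns_and_verbs(analyze_with_mystem("мама спит"))` would apply the same filter to live Mystem output.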
For each participating child, one of the parents gave informed consent for the use of the data. The target children and other participants are referred to by their original first names. Surnames and addresses are replaced with the code xxx, and each replacement is noted on the %com tier.
We are grateful to the parents and children who participated in the study. We thank Irina Korkina and Tatyana Masumi for their major contribution to transcribing the recordings, and Anastasiya Sycheva for taking part in the final preparation of the materials and checking the transcripts. We thank Pavel Pashentsev and Konstantin Lopukhin for their help with writing the code. We are grateful to the following students and researchers who participated in the transcription of recordings, the revision of transcripts, and the manual checking of the automatic morphological annotation (in alphabetical order): Anastasia Andreeva, Angelica Dzhioeva, Anna Elagina, Tatiana Eremicheva, Sofia Geren, Elizaveta Klykova, Maria Kozlova, Polina Kozlova, Anastasia Kudrina, Daria Morozova, Veronika Prigorkina, Nadezhda Psaryova, Ksenia Revak, Ivan Shirokov, Vladislava Staroverova, Alexandra Trepalenko, Julia Vorobyova, Nina Zdorova.