Elena Lieven MPI-Leipzig Manchester University lieven@eva.mpg.de |
Jeannine Goh Psychology Manchester University jeannine.goh@manchester.ac.uk website |
Participants: | 4 |
Type of Study: | dense sampling |
Location: | England |
Media type: | audio |
DOI: | doi:10.21415/T5DW48 |
Lieven, E., Salomo, D. & Tomasello, M. (2009). Two-year-old children’s production of multiword utterances: A usage-based analysis. Cognitive Linguistics, 20(3), 481-508.
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The following articles are relevant to the use of the Max Planck dense databases, though the data from Gina and Helen have not yet been analysed.
Ibbotson, P., Theakston, A., Freudenthal, D., Lieven, E. & Tomasello, M. (in press). Productivity of noun slots in verb frames. Cognitive Science.
Ibbotson, P., Lieven, E. & Tomasello, M. (2014). The communicative contexts of grammatical aspect use in English. Journal of Child Language, 41(3), 705-723. DOI: 10.1017/S0305000913000135
Theakston, A. L., Maslen, R., Lieven, E. V. M. & Tomasello, M. (2012). The acquisition of the active transitive construction in English: a detailed case study. Cognitive Linguistics, 23(1), 91-128. eScholarID:130044 | DOI:10.1515/COGL.2012.004
Lieven, E. & Behrens, H. (2012). Dense sampling. In E. Hoff (Ed.), Guide to research methods in child language. Wiley-Blackwell, pp. 226-239.
Lieven, E., Salomo, D. & Tomasello, M. (2009). Two-year-old children’s production of multiword utterances: A usage-based analysis. Cognitive Linguistics, 20(3), 481-508.
Bannard, C., Lieven, E. & Tomasello, M. (2009). Modeling children's early grammatical knowledge, Proceedings of the National Academy of Sciences, 106 (41), 17284-17289
Kirjavainen, M., Theakston, A. & Lieven, E. (2009). Can input explain children's me-for-I errors? Journal of Child Language, 36, 5, 1091-1114
Bannard, C. & Lieven, E. (2009). Repetition and Reuse in Child Language Learning. In Roberta Corrigan, Edith Moravcsik, Hamid Ouali, Kathleen Wheatley (eds.), Formulaic Language: Volume II: Acquisition, Loss, Psychological reality, Functional Explanations. Amsterdam: John Benjamins (pp. 297-321).
Kirjavainen, M., Theakston, A., Lieven, E. & Tomasello, M. (2009). 'I want hold Postman Pat': An investigation into the acquisition of infinitival marker 'to'. First Language, 29, 313-339.
Pine, J., Conti-Ramsden, G., Joseph, K., Lieven, E. & Serratrice, L. (2008). Tense Over Time: Testing the Agreement/Tense Omission Model as an account of the pattern of tense-marking provision in early child English. Journal of Child Language, 35, 55-75.
Theakston, A. & Lieven, E. (2008). The influence of discourse context on children’s provision of auxiliary BE. Journal of Child Language, 35, 129-158.
Bannard, C. & Matthews, D. (2008). Stored word sequences in language learning: The effect of familiarity on children's repetition of four-word sequences. Psychological Science, 19(3), 241-248.
Dittmar, M., Abbot-Smith, K., Lieven, E. & Tomasello, M. (2008). German children’s comprehension of word order and case marking in causative sentences. Child Development, 79, 1152-1167.
Chang, F., Lieven, E., & Tomasello, M. (2008). Automatic evaluation of syntactic learners in typologically-different languages. Cognitive Systems Research, 9(3), 198-213
Cameron-Faulkner, T., Lieven, E. & Theakston, A. (2007). What part of no do children not understand? A usage-based account of multiword negation, Journal of Child Language, 34, 251-282.
Lieven, E. (2006). Producing multiword utterances. In B. Kelly & E. Clark (eds.) Constructions in Acquisition. Stanford, CA: CSLI Publications, pps. 83-110.
Dąbrowska, E. & Lieven, E. (2005). Towards a lexically specific grammar of children’s question constructions. Cognitive Linguistics, 16, 3, 437-474.
Maslen, R., Theakston, A., Lieven, E. & Tomasello, M. (2004). A Dense Corpus Study of Past Tense and Plural Overregularization in English. Journal of Speech, Language and Hearing Research, 47, 1319-1333
A complete list of all files for all four children is in the Appendix.
Eleanor was recorded over a period of two years (2;0.2 to 3;1.17). During the early transcription of the data, the pseudonym Eleanor was used for anonymity, so all the files in the corpus have the extension .EL. More recently, however, parental permission has been obtained to submit the sound and video files to CHILDES and for the child’s true name (Aliah) to remain in the files. We have therefore not undertaken the laborious task of replacing the name Aliah with Eleanor in the video and sound files. To avoid confusion, please be aware that you will hear the true name, Aliah, in the video and sound files, but the name may have been changed to Eleanor in the transcript.
In terms of content, the dataset is split into A recordings and B recordings. The A recordings consist of Aliah and her mother engaging in naturalistic play, with static radio microphones picking up the speech. The B recordings were captured using a wireless microphone attached to a custom-made waistcoat. During the B recordings Aliah could wander around the house and engage with other family members. The mother was asked not to engage in the more intense naturalistic play during the B recordings, but rather to let any conversation or speech be led by the child. The B recordings are audio only; there are no videos.
In terms of frequency, this large dataset is best considered in three sections (Sections i, ii and iii). Sections i and iii are each a six-week intensive period. During these intensive periods, Aliah was recorded for one hour a day, five times a week. For the A recordings, one of the recordings each week is a video. During Section ii, Aliah was recorded five times a week for one week every month. Section ii is therefore a less intensive period. Again, for the A recordings, one of the recordings each week is a video.
This data was originally transcribed using an early version of CLAN; however, it has been completely updated to adhere to the 2013 transcription guidelines and to pass CHECK in CLAN 2014. The CHAT files have been linked to the audio files (WAV) and annotated to match the 2014 MOR library. The files have also been processed through the 2014 CHATTER, MOR and POST programs.
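As a rough illustration of how CHAT files of this kind can be inspected outside CLAN, the sketch below counts main-tier utterances per speaker. It relies only on the general CHAT conventions that header lines begin with `@`, main tiers with `*` plus a speaker code, and dependent tiers such as `%mor` with `%`. The excerpt in `SAMPLE` and the function name `count_utterances` are invented for illustration and are not taken from the corpus.

```python
from collections import Counter

# Hypothetical CHAT excerpt, invented for illustration; not corpus data.
SAMPLE = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, MOT Mother\n"
    "*MOT:\twhat do you want ?\n"
    "%mor:\tpro:int|what mod|do pro:per|you v|want ?\n"
    "*CHI:\twant juice .\n"
    "%mor:\tv|want n|juice .\n"
    "*CHI:\tmore juice .\n"
    "@End\n"
)


def count_utterances(chat_text: str) -> Counter:
    """Count main-tier lines (those starting with '*') per speaker code."""
    counts = Counter()
    for line in chat_text.splitlines():
        # Main tiers look like '*CHI:<tab>utterance'; '@' headers and
        # '%' dependent tiers are skipped.
        if line.startswith("*") and ":" in line:
            speaker = line[1:line.index(":")]
            counts[speaker] += 1
    return counts


counts = count_utterances(SAMPLE)
print(counts["CHI"], counts["MOT"])
```

For real analyses the CLAN programs remain the reference tools; this sketch ignores details such as utterances that continue over several lines.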
Section i (Eleanor aged 2-00-01 to 2-01-10) The six-week intensive period
• Eleanor is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
Section ii (Eleanor aged 2-02-01 to 2-11-06) The one week a month period
• Eleanor is recorded for one hour, one week in every month. During this week there are five recordings, one of which is a video.
Section iii (Eleanor aged 4-00-02 to 4-11-20) The six-week intensive period
• Eleanor is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
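Ages in this documentation appear in two notations: the CHAT form `years;months.days` (for example 2;0.2) and a hyphenated form in the section headings (for example 2-00-01). A minimal sketch for reading both into one structure follows. It assumes the hyphenated form is also years-months-days, and the names `Age` and `parse_age` are hypothetical helpers, not part of CLAN.

```python
import re
from typing import NamedTuple


class Age(NamedTuple):
    years: int
    months: int
    days: int

    def in_months(self) -> float:
        # Approximation for illustration only: a month is treated as 30 days.
        return self.years * 12 + self.months + self.days / 30


_AGE_PATTERNS = (
    re.compile(r"^(\d+);(\d+)\.(\d+)$"),  # CHAT style: years;months.days
    re.compile(r"^(\d+)-(\d+)-(\d+)$"),   # hyphenated: assumed years-months-days
)


def parse_age(text: str) -> Age:
    """Parse an age string in either notation into an Age tuple."""
    for pattern in _AGE_PATTERNS:
        match = pattern.match(text.strip())
        if match:
            return Age(*(int(part) for part in match.groups()))
    raise ValueError(f"unrecognised age: {text!r}")


first = parse_age("2;0.2")
later = parse_age("2-01-10")
print(first, later, first < later)
```

Because `Age` is a tuple, ages compare in chronological order, which makes it straightforward to sort sessions by the child's age.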
The number of recordings in each section
Section | Scripts | Videos |
---|---|---|
Section i | 46 | 6 |
Section ii | 87 | 10 |
Section iii | 61 | 5 |
Total | 194 | 21 |
Over the two-year period, the audio of a total of 194 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. For the A recordings, the microphones were positioned around the downstairs of the house, allowing Eleanor to move freely during her play whilst still capturing her speech. For the B recordings, the radio microphone was stitched into a waistcoat, unbeknown to Eleanor. This waistcoat recording allowed Eleanor to roam more widely and engage with other family members. For 21 of these recordings, a video recording was also made using a standard video camera. These videos are now in MPEG4 format and permission has been granted for them to be submitted to CHILDES. They are, however, password protected, and permission to access them must be granted via Brian MacWhinney.
All of the audio recordings took place in Eleanor’s home, where she was engaged in regular play activities with her mother. In most of the video recordings the investigator is also present and is engaged in play with Eleanor. The videos were also mainly recorded in Eleanor’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. The recordings are 60 minutes long, unless there was a problem with recording on that day.
The data was transcribed over a number of years and by a number of different transcribers. However, the whole process was overseen by one research investigator, who trained each of the transcribers and ran regular reliability checks across the data and across transcribers. Although the data was originally transcribed using older versions of CHAT, CLAN and MOR, it has since been updated to conform to CLAN 2014 and the 2014 MOR library.
This corpus contains the data from a longitudinal naturalistic study of one child over a period of two years (2;0.0 to 3;1.11). During the early transcription of the data, the pseudonym Fraser was used for anonymity, so all the files in the corpus have the extension .FR. More recently, however, parental permission has been obtained to submit the sound and video files to CHILDES and for the child’s true name (Adam) to remain in the files. We have therefore not undertaken the laborious task of replacing the name Adam with Fraser in the video and sound files. To avoid confusion, please be aware that you will hear the true name, Adam, in the video and sound files, but the name may have been changed to Fraser in the transcript.
In terms of content, the dataset is split into A recordings and B recordings. The A recordings consist of Adam and his mother engaging in naturalistic play, with static radio microphones picking up the speech. The B recordings were captured using a wireless microphone attached to a custom-made waistcoat. During the B recordings Adam could wander around the house and engage with other family members. The mother was asked not to engage in the more intense naturalistic play during the B recordings, but rather to let any conversation or speech be led by the child. The B recordings are audio only; there are no videos.
In terms of frequency, this large dataset is best considered in three sections (Sections i, ii and iii). Sections i and iii are each a six-week intensive period. During these intensive periods, Adam was recorded for one hour a day, five times a week. For the A recordings, one of the recordings each week is a video. During Section ii, Adam was recorded five times a week for one week every month. Section ii is therefore a less intensive period. Again, for the A recordings, one of the recordings each week is a video.
This data was originally transcribed using an early version of CLAN; however, it has been completely updated to adhere to the 2013 transcription guidelines and to pass CHECK in CLAN 2014. The CHAT files have been linked to the audio files (WAV) and annotated to match the 2014 MOR library. The files have also been processed through the 2014 CHATTER program.
Section i (Fraser aged 2-00-01 to 2-01-10) The six-week intensive period
• Fraser is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
Section ii (Fraser aged 2-02-01 to 2-11-06) The one week a month period
• Fraser is recorded for one hour, one week in every month. During this week there are five recordings, one of which is a video.
Section iii (Fraser aged 4-00-02 to 4-11-20) The six-week intensive period
• Fraser is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
The number of recordings in each section
Section | Scripts | Videos |
---|---|---|
Section i | 58 | 6 |
Section ii | 99 | 10 |
Section iii | 59 | 5 |
Total | 216 | 21 |
Over the two-year period, the audio of a total of 216 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. For the A recordings, the microphones were positioned around the downstairs of the house, allowing Fraser to move freely during his play whilst still capturing his speech. For the B recordings, the radio microphone was stitched into a waistcoat, unbeknown to Fraser. This waistcoat recording allowed Fraser to roam more widely and engage with other family members. For 21 of these recordings, a video recording was also made using a standard video camera. These videos are now in MPEG4 format and permission has been granted for them to be submitted to CHILDES. They are, however, password protected, and permission to access them must be granted via Brian MacWhinney.
All of the audio recordings took place in Fraser’s home where he was engaged in regular play activities with his mother. In most of the video recordings the investigator is also present and is engaged in play with Fraser. The videos were also mainly recorded in Fraser’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. The recordings are 60 minutes long, unless there was a problem with recording on that day.
The data was transcribed over a number of years and by a number of different transcribers. However, the whole process was overseen by one research investigator, who trained each of the transcribers and ran regular reliability checks across the data and across transcribers. Although the data was originally transcribed using older versions of CHAT, CLAN and MOR, it has since been updated to conform to CLAN 2014 and the 2014 MOR library.
This corpus contains the data from a longitudinal naturalistic study of one child over a period of just over two years (3;0.2 to 5;1.19). During the early transcription of the data, the pseudonym Helen was used for anonymity, so all the files in the corpus have the extension .HE. More recently, however, parental permission has been obtained to submit the sound and video files to CHILDES and for the child’s true name (Hannah) to remain in the files. We have therefore not undertaken the laborious task of replacing the name Hannah with Helen in the video and sound files. To avoid confusion, please be aware that you will hear the true name, Hannah, in the video and sound files, but the name may have been changed to Helen in the transcript.
In terms of frequency, this dataset is best considered in five sections (Sections i, ii, iii, iv and v). Sections i, iii and v are each a six-week intensive period, at ages 3;0.0, 4;0.0 and 5;0.0 respectively. During these intensive periods, Hannah was recorded for one hour a day, five times a week. Each week, one of the recordings is a video. During Sections ii and iv, Hannah was recorded five times a week for one week every month. These periods are therefore less intensive; again, where possible, one of the recordings each week was a video.
This data was originally transcribed using an early version of CLAN; however, it has been completely updated to adhere to the 2013 transcription guidelines and to pass CHECK in CLAN 2014. The CHAT files have been linked to the audio files (WAV) and annotated to match the 2014 MOR library. The files have also been processed through the 2014 CHATTER, MOR and POST programs.
Section i (Helen aged 3-00-01 to 3-01-11) The six-week intensive period
• Helen is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
Section ii (Helen aged 3-02-00 to 3-11-10) The one week a month period
• Helen is recorded for one hour, one week in every month. During this week there are five recordings, one of which is a video.
Section iii (Helen aged 4-00-02 to 4-01-13) The six-week intensive period
• Helen is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
Section iv (Helen aged 4-02-01 to 4-11-07) The one week a month period
• Helen is recorded for one hour, one week in every month. During this week there are five recordings, one of which is a video.
Section v (Helen aged 5-00-00 to 5-01-19) The six-week intensive period
• Helen is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
The number of recordings in each section
Section | Scripts | Videos |
---|---|---|
Section i | 28 | 5 |
Section ii | 52 | 9 |
Section iii | 29 | 6 |
Section iv | 46 | 9 |
Section v | 28 | 5 |
Total | 184 | 34 |
Over the two years and six weeks, the audio of a total of 184 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. For the recordings, the microphones were positioned around the downstairs of the house, allowing Helen to move freely during her play whilst still capturing her speech. For 34 of these recordings, a video recording was also made using a standard video camera. These videos are now in MPEG4 format and permission has been granted for them to be submitted to CHILDES. They are, however, password protected, and permission to access them must be granted via Brian MacWhinney.
All of the audio recordings took place in Helen’s home, where she was engaged in regular play activities with her mother. In most of the video recordings the investigator is also present and is engaged in play with Helen. The videos were also mainly recorded in Helen’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. The recordings are 60 minutes long, unless there was a problem with recording on that day.
The data was transcribed over a number of years and by a number of different transcribers. However, the whole process was overseen by one research investigator, who trained each of the transcribers and ran regular reliability checks across the data and across transcribers. Although the data was originally transcribed using older versions of CHAT, CLAN and MOR, it has since been updated to conform to CLAN 2014 and the 2014 MOR library.
Note: Compound nouns
We have used the Oxford English Dictionary (OED) to guide decisions about compound nouns. Generally, if a word is a single word in the OED, it is a single word in this corpus. For example, pushchair is a single word in the OED and therefore appears in the corpus as pushchair. If two words are hyphenated in the OED, they are joined with a + in the corpus. For example, candy-floss appears in the OED with a hyphen, so it is coded as candy+floss in the corpus.
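The convention above can be sketched as a small conversion, shown here only to make the rule concrete. The function names are hypothetical, and the sketch covers only the two cases the note describes (single words and hyphenated compounds); it does not look anything up in the OED.

```python
from typing import List


def oed_to_corpus_form(oed_headword: str) -> str:
    """Return the corpus spelling for an OED headword.

    Single words are kept as they are; hyphenated compounds have
    each hyphen replaced by '+'.
    """
    return oed_headword.replace("-", "+")


def split_compound(corpus_form: str) -> List[str]:
    """Split a corpus compound such as 'candy+floss' into its parts."""
    return corpus_form.split("+")


for word in ("pushchair", "candy-floss"):
    print(word, "->", oed_to_corpus_form(word))
```

When searching the transcripts, the same rule applies in reverse: a hyphenated dictionary form should be looked up with a + in place of the hyphen.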
This corpus contains the data from a longitudinal naturalistic study of one child over a period of just over two years (3;0.1 to 4;7.29). During the early transcription of the data, the pseudonym Gina was used for anonymity, so all the files in the corpus have the extension .G. More recently, however, parental permission has been obtained to submit the sound and video files to CHILDES and for the child’s true name (Rubi) to remain in the files. We have therefore not undertaken the laborious task of replacing the name Rubi with Gina in the video and sound files. To avoid confusion, please be aware that you will hear the true name, Rubi, in the video and sound files, but the name may have been changed to Gina in the transcript.
In terms of frequency, this dataset is best considered in four sections (Sections i, ii, iii and iv). Sections i and iii are each a six-week intensive period, at ages 3;0.0 and 4;0.0 respectively. During these intensive periods, Rubi was recorded for one hour a day, five times a week. Each week, one of the recordings is a video. During Sections ii and iv, Rubi was recorded five times a week for one week every month. These periods are therefore less intensive and, where possible, one of the recordings each week was a video. Unfortunately, due to family circumstances, the data for Section iv are fairly sparse.
This data was originally transcribed using an early version of CLAN; however, it has been completely updated to adhere to the 2013 transcription guidelines and to pass CHECK in CLAN 2014. The CHAT files have been linked to the audio files (WAV) and annotated to match the 2014 MOR library. The files have also been processed through the 2014 CHATTER, MOR and POST programs.
Section i (Gina aged 3-00-01 to 3-01-11) The six-week intensive period
• Gina is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
Section ii (Gina aged 3-02-00 to 3-11-06) The one week a month period
• Gina is recorded for one hour, one week in every month. During this week there are five recordings, one of which is a video.
Section iii (Gina aged 4-00-02 to 4-01-11) The six-week intensive period
• Gina is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.
Section iv (Gina aged 4-02-29 to 4-07-29) The one week a month period
• Gina is recorded for one hour, one week in every month. During this week there should be five recordings and one video, but due to family circumstances this was difficult to achieve.
The number of recordings in each section
Section | Scripts | Videos |
---|---|---|
Section i | 30 | 6 |
Section ii | 40 | 8 |
Section iii | 29 | 5 |
Section iv | 19 | 4 |
Total | 118 | 23 |
A detailed inventory of the data and dates can be found in Appendix 1. Over the two-year period, the audio of a total of 118 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. For the recordings, the microphones were positioned around the downstairs of the house, allowing Gina to move freely during her play whilst still capturing her speech. For 23 of these recordings, a video recording was also made using a standard video camera. These videos are now in MPEG4 format and permission has been granted for them to be submitted to CHILDES. They are, however, password protected, and permission to access them must be granted via Brian MacWhinney.
All of the audio recordings took place in Gina’s home, where she was engaged in regular play activities with her mother. In most of the video recordings the investigator is also present and is engaged in play with Gina. The videos were also mainly recorded in Gina’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. The recordings are 60 minutes long, unless there was a problem with recording on that day.
The data was transcribed over a number of years and by a number of different transcribers. However, the whole process was overseen by one research investigator, who trained each of the transcribers and ran regular reliability checks across the data and across transcribers. Although the data was originally transcribed using older versions of CHAT, CLAN and MOR, it has since been updated to conform to CLAN 2014 and the 2014 MOR library.
Funding was provided by the Department of Comparative and Developmental Psychology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.