CHILDES English MPI-EVA-Manchester Corpus


Elena Lieven
MPI-Leipzig
Manchester University

Jeannine Goh
Psychology
Manchester University

website

Participants: 4
Type of Study: dense sampling
Location: England
Media type: audio
DOI: doi:10.21415/T5DW48

Browsable transcripts

Download transcripts

Link to media folder

Citation information

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

The following are articles relevant to the use of the Max Planck dense databases, though the data from Gina and Helen have not yet been analysed.

Project Description

A complete list of all files for all four children is in the Appendix.

ELEANOR

Eleanor was recorded over a period of two years (2;0.2 to 3;1.17). During the early transcription of the data the pseudonym Eleanor was used for anonymity, so all the files in the corpus have the extension .EL. More recently, however, parental permission has been gained to submit the sound and video files to CHILDES and for the child’s true name (Aliah) to remain in the files. We have therefore not engaged in the laborious task of removing the name Aliah and changing it to Eleanor in the videos and the sound files. To avoid confusion, please be aware that in the video and sound files you will hear the true name, Aliah, but the name may have been changed to Eleanor within the transcript.

In terms of content, the dataset is split into A recordings and B recordings. The A recordings consist of Aliah and her mother engaging in naturalistic play, with static radio microphones picking up the speech. The B recordings were captured using a wireless microphone attached to a custom-made waistcoat. During the B recordings Aliah could wander around the house and engage with other family members. The mother was asked not to engage in the more intense naturalistic play during the B recordings, but rather to let any conversation or speech be led by the child. The B recordings are audio only; there are no videos.

In terms of frequency, this large dataset is best considered in three sections (Sections i, ii, iii). Sections i and iii are each a six-week intensive period. During these intensive periods, Aliah was recorded for one hour a day, five times a week. For the A recordings, one of these recordings each week is a video. During Section ii, Aliah was recorded five times a week for one week every month. Section ii is therefore a less intense period. Again, for the A recordings, one of these recordings each week is a video.

This data was originally transcribed using an early version of CLAN; however the data has been completely updated to adhere to the 2013 transcription guidelines and to be compatible and pass CHECK in CLAN 2014. The CHAT files have been linked to the audio files (WAV) and the files annotated to match the 2014 MOR library. The files have also been processed through the 2014 CHATTER, MOR and POST programs.

A summary of the frequency of data

Section i (Eleanor aged 2-00-01 to 2-01-10) The six-week intensive period
• Eleanor is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

Section ii (Eleanor aged 2-02-01 to 2-11-06) The one week a month period
• Eleanor is recorded for one hour, one week in every month. During this week there are five recordings one of which is a video.

Section iii (Eleanor aged 4-00-02 to 4-11-20) The six-week intensive period
• Eleanor is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

The number of recordings in each section
              Scripts   Videos
Section i          46        6
Section ii         87       10
Section iii        61        5
Total             194       21

Over the two-year period, audio for a total of 194 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. For the A recordings the microphones were positioned around the downstairs of the house, allowing Eleanor to move freely during her play whilst still capturing her speech. For the B recordings the radio microphone was stitched into a waistcoat, unbeknown to Eleanor. This waistcoat recording allowed Eleanor to roam more widely and engage with other family members. For 21 of these sessions a video recording was also made using a standard video camera. These videos are now in MPEG4 format and permission has been granted for them to be submitted to CHILDES. They are, however, password protected, and permission must be obtained via Brian MacWhinney.

All of the audio recordings took place in Eleanor’s home, where she was engaged in regular play activities with her mother. In most of the video recordings the investigator is also present and is engaged in play with Eleanor. The videos were also mainly recorded in Eleanor’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. The recordings are 60 minutes long, unless there was a problem with recording on that day.

The data was transcribed over a number of years and by a number of different transcribers. However, the whole process was overseen by one research investigator who ran regular reliability checks across the data and across transcribers. This same investigator also trained each of the transcribers. Although the data was originally transcribed using an older version of CHAT, CLAN and MOR it has since been updated to adhere to CLAN 2014 and the MOR LIBRARY 2014.

FRASER

This corpus contains the data from a longitudinal naturalistic study of one child over a period of two years (2;0.0 to 3;1.11). During the early transcription of the data the pseudonym Fraser was used for anonymity, so all the files in the corpus have the extension .FR. More recently, however, parental permission has been gained to submit the sound and video files to CHILDES and for the child’s true name (Adam) to remain in the files. We have therefore not engaged in the laborious task of removing the name Adam and changing it to Fraser in the videos and the sound files. To avoid confusion, please be aware that in the video and sound files you will hear the true name, Adam, but the name may have been changed to Fraser within the transcript.

In terms of content, the dataset is split into A recordings and B recordings. The A recordings consist of Adam and his mother engaging in naturalistic play, with static radio microphones picking up the speech. The B recordings were captured using a wireless microphone attached to a custom-made waistcoat. During the B recordings Adam could wander around the house and engage with other family members. The mother was asked not to engage in the more intense naturalistic play during the B recordings, but rather to let any conversation or speech be led by the child. The B recordings are audio only; there are no videos.

In terms of frequency, this large dataset is best considered in three sections (Sections i, ii, iii). Sections i and iii are each a six-week intensive period. During these intensive periods, Adam was recorded for one hour a day, five times a week. For the A recordings, one of these recordings each week is a video. During Section ii, Adam was recorded five times a week for one week every month. Section ii is therefore a less intense period. Again, for the A recordings, one of these recordings each week is a video.

This data was originally transcribed using an early version of CLAN; however the data has been completely updated to adhere to the 2013 transcription guidelines and to be compatible and pass CHECK in CLAN 2014. The CHAT files have been linked to the audio files (WAV) and the files annotated to match the 2014 MOR library. The files have also been processed through the 2014 CHATTER program.

A summary of the frequency of data

Section i (Fraser aged 2-00-01 to 2-01-10) The six-week intensive period
• Fraser is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

Section ii (Fraser aged 2-02-01 to 2-11-06) The one week a month period
• Fraser is recorded for one hour, one week in every month. During this week there are five recordings one of which is a video.

Section iii (Fraser aged 4-00-02 to 4-11-20) The six-week intensive period
• Fraser is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

The number of recordings in each section
              Scripts   Videos
Section i          58        6
Section ii         99       10
Section iii        59        5
Total             216       21

Over the two-year period, audio for a total of 216 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. For the A recordings the microphones were positioned around the downstairs of the house, allowing Fraser to move freely during his play whilst still capturing his speech. For the B recordings the radio microphone was stitched into a waistcoat, unbeknown to Fraser. This waistcoat recording allowed Fraser to roam more widely and engage with other family members. For 21 of these sessions a video recording was also made using a standard video camera. These videos are now in MPEG4 format and permission has been granted for them to be submitted to CHILDES. They are, however, password protected, and permission must be obtained via Brian MacWhinney.

All of the audio recordings took place in Fraser’s home where he was engaged in regular play activities with his mother. In most of the video recordings the investigator is also present and is engaged in play with Fraser. The videos were also mainly recorded in Fraser’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. The recordings are 60 minutes long, unless there was a problem with recording on that day.

The data was transcribed over a number of years and by a number of different transcribers. However, the whole process was overseen by one research investigator who ran regular reliability checks across the data and across transcribers. This same investigator also trained each of the transcribers. Although the data was originally transcribed using an older version of CHAT, CLAN and MOR it has since been updated to adhere to CLAN 2014 and the MOR LIBRARY 2014.

HELEN

This corpus contains the data from a longitudinal naturalistic study of one child over a period of just over two years (3;0.2 to 5;1.19). During the early transcription of the data the pseudonym Helen was used for anonymity, so all the files in the corpus have the extension .HE. More recently, however, parental permission has been gained to submit the sound and video files to CHILDES and for the child’s true name (Hannah) to remain in the files. We have therefore not engaged in the laborious task of removing the name Hannah and changing it to Helen in the videos and the sound files. To avoid confusion, please be aware that in the video and sound files you will hear the true name, Hannah, but the name may have been changed to Helen within the transcript.
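
The age range quoted above is written in the years;months.days form. As a rough illustration only, the sketch below parses such an age and computes the span between the first and last recording in whole months. The function names parse_age and age_in_months are hypothetical helpers written for this example and are not part of CLAN.

```python
import re


def parse_age(age: str) -> tuple[int, int, int]:
    """Parse an age written as years;months.days, e.g. '3;0.2'."""
    match = re.fullmatch(r"(\d+);(\d+)\.(\d+)", age)
    if match is None:
        raise ValueError(f"not a years;months.days age: {age!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def age_in_months(age: str) -> int:
    """Age in whole months, ignoring the leftover days."""
    years, months, _days = parse_age(age)
    return years * 12 + months


# First and last ages quoted for Helen in the paragraph above.
span = age_in_months("5;1.19") - age_in_months("3;0.2")
print(span)  # 25 months, i.e. just over two years
```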

In terms of frequency, this dataset is best considered in five sections (Sections i, ii, iii, iv, v). Sections i, iii and v are each a six-week intensive period, at ages 3;0.0, 4;0.0 and 5;0.0 respectively. During these intensive periods, Hannah was recorded for one hour a day, five times a week. During each week one of the recordings is a video. During Sections ii and iv, Hannah was recorded five times a week for one week every month. These periods are therefore less intense; again, where possible, one of the recordings each week was a video.

This data was originally transcribed using an early version of CLAN; however the data has been completely updated to adhere to the 2013 transcription guidelines and to be compatible and pass CHECK in CLAN 2014. The CHAT files have been linked to the audio files (WAV) and the files annotated to match the 2014 MOR library. The files have also been processed through the 2014 CHATTER program, MOR and POST.

A summary of the frequency of data

Section i (Helen aged 3-00-01 to 3-01-11) The six-week intensive period
• Helen is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

Section ii (Helen aged 3-02-00 to 3-11-10) The one week a month period
• Helen is recorded for one hour, one week in every month. During this week there are five recordings, one of which is a video.

Section iii (Helen aged 4-00-02 to 4-01-13) The six-week intensive period
• Helen is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

Section iv (Helen aged 4-02-01 to 4-11-07) The one week a month period
• Helen is recorded for one hour, one week in every month. During this week there are five recordings, one of which is a video.

Section v (Helen aged 5-00-00 to 5-01-19) The six-week intensive period
• Helen is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

The number of recordings in each section
              Scripts   Videos
Section i          28        5
Section ii         52        9
Section iii        29        6
Section iv         46        9
Section v          28        5
Total             184       34

Over the two years and six weeks, audio for a total of 184 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. For the recordings the microphones were positioned around the downstairs of the house, allowing Helen to move freely during her play whilst still capturing her speech. For 34 of these sessions a video recording was also made using a standard video camera. These videos are now in MPEG4 format and permission has been granted for them to be submitted to CHILDES. They are, however, password protected, and permission must be obtained via Brian MacWhinney.

All of the audio recordings took place in Helen’s home, where she was engaged in regular play activities with her mother. In most of the video recordings the investigator is also present and is engaged in play with Helen. The videos were also mainly recorded in Helen’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. The recordings are 60 minutes long, unless there was a problem with recording on that day.

The data was transcribed over a number of years and by a number of different transcribers. However, the whole process was overseen by one research investigator who ran regular reliability checks across the data and across transcribers. This same investigator also trained each of the transcribers. Although the data was originally transcribed using an older version of CHAT, CLAN and MOR it has since been updated to adhere to CLAN 2014 and the MOR LIBRARY 2014.

Note: Compound nouns

We have used the Oxford English Dictionary (OED) to help us make decisions around compound nouns. Generally, if a word is considered to be a single word in the OED, then the word in this corpus will be a single word. For example, the word pushchair is a single word in the OED and will therefore appear in the corpus as pushchair. In the OED if two words are hyphenated then they will be joined with a + in the corpus. For example, the word candy-floss appears in the OED with a hyphen so will be coded as candy+floss in the corpus.
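
The hyphen rule above can be sketched in a few lines. This is only an illustration of the stated convention, not the procedure the transcribers used: the function name oed_to_corpus is hypothetical, and the sketch assumes the OED spelling is already known. Compounds that the OED writes as two separate words are not covered by the note, so they are not handled here.

```python
def oed_to_corpus(oed_form: str) -> str:
    """Map an OED spelling to its corpus spelling under the rule above.

    A single word is kept as it is; a hyphenated compound has each
    hyphen replaced by '+'.
    """
    return oed_form.replace("-", "+")


# The two examples given in the note.
for word in ["pushchair", "candy-floss"]:
    print(word, "->", oed_to_corpus(word))
```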

GINA

This corpus contains the data from a longitudinal naturalistic study of one child over a period of about one year and eight months (3;0.1 to 4;7.29). During the early transcription of the data the pseudonym Gina was used for anonymity, so all the files in the corpus have the extension .G. More recently, however, parental permission has been gained to submit the sound and video files to CHILDES and for the child’s true name (Rubi) to remain in the files. We have therefore not engaged in the laborious task of removing the name Rubi and changing it to Gina in the videos and the sound files. To avoid confusion, please be aware that in the video and sound files you will hear the true name, Rubi, but the name may have been changed to Gina within the transcript.

In terms of frequency, this dataset is best considered in four sections (Sections i, ii, iii, iv). Sections i and iii are each a six-week intensive period, at ages 3;0.0 and 4;0.0 respectively. During these intensive periods, Rubi was recorded for one hour a day, five times a week. During each week one of the recordings is a video. During Sections ii and iv, Rubi was recorded five times a week for one week every month. These periods are therefore less intense and, where possible, one of the recordings each week was a video. Unfortunately, due to family circumstances, the data for Section iv are fairly scant.

This data was originally transcribed using an early version of CLAN; however the data has been completely updated to adhere to the 2013 transcription guidelines and to be compatible and pass CHECK in CLAN 2014. The CHAT files have been linked to the audio files (WAV) and the files annotated to match the 2014 MOR library. The files have also been processed through the 2014 CHATTER program, MOR and POST.

A summary of the frequency of data

Section i (Gina aged 3-00-01 to 3-01-11) The six-week intensive period
• Gina is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

Section ii (Gina aged 3-02-00 to 3-11-06) The one week a month period
• Gina is recorded for one hour, one week in every month. During this week there are five recordings, one of which is a video.

Section iii (Gina aged 4-00-02 to 4-01-11) The six-week intensive period
• Gina is recorded for one hour, five times a week, every week for the entire period. One of each of the five recordings is a video.

Section iv (Gina aged 4-02-29 to 4-07-29) The one week a month period
• Gina is recorded for one hour, one week in every month. During this week there should be five recordings, one of which is a video, but due to family circumstances this was difficult to achieve.

The number of recordings in each section
              Scripts   Videos
Section i          30        6
Section ii         40        8
Section iii        29        5
Section iv         19        4
Total             118       23

A detailed inventory of the data and dates can be found in Appendix 1. Over the recording period, audio for a total of 118 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. For the recordings the microphones were positioned around the downstairs of the house, allowing Gina to move freely during her play whilst still capturing her speech. For 23 of these sessions a video recording was also made using a standard video camera. These videos are now in MPEG4 format and permission has been granted for them to be submitted to CHILDES. They are, however, password protected, and permission must be obtained via Brian MacWhinney.

All of the audio recordings took place in Gina’s home, where she was engaged in regular play activities with her mother. In most of the video recordings the investigator is also present and is engaged in play with Gina. The videos were also mainly recorded in Gina’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. The recordings are 60 minutes long, unless there was a problem with recording on that day.

The data was transcribed over a number of years and by a number of different transcribers. However, the whole process was overseen by one research investigator who ran regular reliability checks across the data and across transcribers. This same investigator also trained each of the transcribers. Although the data was originally transcribed using an older version of CHAT, CLAN and MOR it has since been updated to adhere to CLAN 2014 and the MOR LIBRARY 2014.

Acknowledgements

Funding was supplied by these sources: The Department of Comparative and Developmental Psychology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.