CHILDES English OCSC Corpus


Laura Wagner
Psychology
Ohio State University

Sharifa Alghowinhem
Media Lab
M.I.T.

Abeer Alwan
Electrical and Computer Engineering
UCLA

Kristina Bowdrie
Speech and Hearing Science
Ohio State University

Cynthia Breazeal
Media Lab
M.I.T.

Cynthia G. Clopper
Linguistics
Ohio State University

Eric Fosler-Lussier
Computer Science and Engineering
Ohio State University

Izabella A. Jamsek
Speech and Hearing Science
Ohio State University

Devan Lander
Speech and Hearing Science
Ohio State University

Rajiv Ramnath
Computer Science and Engineering
Ohio State University

Jory Ross
Linguistics
Ohio State University

Participants: 303
Type of Study: naturalistic
Location: USA
Media type: audio
DOI: doi:10.21415/60TV-D669

Browsable transcripts

Download transcripts

Link to media folder

Citation information

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This corpus was collected in the Language Sciences Research Lab, a working lab embedded in a science museum: the Center of Science and Industry in Columbus, Ohio, USA. Participants were recruited from the floor of the museum and tested in a semi-public space. The corpus has three distinctive features: (1) an interactive social robot (specifically, a Jibo robot) was present and participated in the sessions for roughly half the children; (2) all children were recorded with both a lapel mic generating high-quality audio (available through CHILDES) and a distal table mic generating low-quality audio (available on request), to facilitate strong tests of automated speech processing on the data; and (3) the data were collected in the peri-pandemic period, beginning in the summer of 2021, just as COVID-19 restrictions were being eased, and ending in the summer of 2022, thus providing a snapshot of language development at a distinctive time in the world. A YouTube video on the Jibo robot is available here.

Demographics

A total of 303 children contributed data, ranging in age from 4 to 9 years old. The OCSC Demographics spreadsheet provides information on their age, gender, race/ethnicity, language background, parents’ educational level, familiarity with robots, reading environment, the noise level during testing, length of the testing session, and the specific role played by the Jibo robot. It also notes the month and year in which they were tested. To summarize briefly, these children are mostly monolingual English speakers (91%), the majority are White (82%), they come from a highly educated background (79% had at least one parent who had earned a Bachelor’s degree or higher and 50% had two parents who had done so), and approximately half are female (54%). A brief overview of the number of children in each age folder along with their mean age (and range) is shown in Table 1.

Tasks

All children participated in the same set of tasks administered in the same order, with the following exceptions: (1) the Intro to Robot task was only administered when the Jibo robot was in fact present; and (2) children who could not read were not shown reading passages. In addition, children were never pressured to finish any tasks. As a result, some tasks were cut short (most notably, children who had difficulties reading often did not complete the Reading Passage task) and some children quit the protocol before getting to the final tasks, particularly the Descriptive Pictures task. At the beginning of each task (and at the end of the session), children pressed one of four noise-making buzzers.

Within each transcript, tasks are marked with Gem codes (@G) as specified below. When social chit-chat at the end of the session was included, it is marked with the code @G: EndTasks. We have also included on this site three cut documents for special words found in our transcripts: (1) OCSC_lofreq includes low-frequency words that were used by children in this task; (2) OCSC_wugs includes the nonsense words used in the Wug task; and (3) OCSC_comm includes our specific “communicator” conventions, used for transcribing children sounding out letters in the Alphabet task.
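For readers who want to pull out a single task programmatically, the following is a minimal sketch of splitting a transcript into task segments at its @G markers. It is not part of the corpus tooling. The sample lines, the speaker codes ROB and INV, and the "NoGem" label for material before the first marker (such as the Intro to Robot task) are all hypothetical and chosen only for illustration; the sketch also assumes that each Gem header starts a line with "@G:" followed by whitespace and a single-word label.

```python
import re

# Matches a Gem header line such as "@G:<tab>Alphabet" and captures the label.
GEM_RE = re.compile(r"^@G:\s*(\S+)")


def split_by_gem(lines, pre_gem_label="NoGem"):
    """Group main-tier utterance lines (those starting with '*') by Gem label.

    Lines before the first @G header are stored under `pre_gem_label`,
    a hypothetical name used here for unmarked material.
    """
    segments = {}
    current = pre_gem_label
    for raw in lines:
        line = raw.rstrip("\n")
        match = GEM_RE.match(line)
        if match:
            # A new task begins; later utterances belong to this label.
            current = match.group(1)
            segments.setdefault(current, [])
            continue
        if line.startswith("*"):
            # Keep speaker tiers only; headers and dependent tiers are skipped.
            segments.setdefault(current, []).append(line)
    return segments


# Made-up transcript fragment (not taken from the corpus).
sample = """@Begin
*ROB:\thello what is your favorite color ?
*CHI:\tblue .
@G:\tAlphabet
*CHI:\tA apple .
%com:\thypothetical dependent tier
*INV:\tgood job .
@G:\tNumbers
*CHI:\tone two three .
@G:\tEndTasks
*CHI:\tbye .
@End
""".splitlines()

for label, utterances in split_by_gem(sample).items():
    print(label, len(utterances))
```

Grouping by label in this way makes it straightforward to analyze one task at a time, for example by selecting only the utterances stored under Wug or Reading.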

Intro to Robot (no gem marking)

Children were introduced to the Jibo robot. Jibo greeted them and typically asked them what their favorite color was.

Alphabet (@G: Alphabet)

Children were shown 26 cards (in alphabetical order), each containing a capital letter, a word starting with that letter and a picture of the word. Children were asked to name the letter and the picture, and to think of another word that started with the same letter. Children who were having difficulties with letter naming were not pressured to name the picture or provide another word that started with the same letter.

Numbers (@G: Numbers)

Children were shown 28 cards showing the numbers from 1 to 15 in ascending order, followed by 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 200, 500, and 1000, and were asked to identify each number. Next came a series of four arithmetic problems to solve, featuring addition, subtraction, multiplication, and division. Younger children who could not do the math were encouraged to label the numbers and/or the mathematical symbols in the problems.
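As a convenience for scoring number-identification responses, the card sequence described above can be written out as a list. This is only a sketch built from the description in this section; the variable name number_cards is hypothetical and does not correspond to any file distributed with the corpus.

```python
# Card values as described in the text: 1 through 15 in ascending order,
# followed by the larger numbers in the order listed.
number_cards = list(range(1, 16)) + [
    20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 200, 500, 1000,
]

# Sanity check against the stated total of 28 cards.
assert len(number_cards) == 28
```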

Wug Task (@G: Wug)

Children were run through a version of the classic Wug task (Berko, 1958) focusing exclusively on plural morphology. Children were asked to produce the plural forms for 10 common words (e.g. bag, light) and 10 rhyming nonsense words (e.g. yag, pite). Each word was supported with amusing pictures. Children heard the items in one of 8 pseudorandomized lists.

Experimental Pictures (@G: ExpPictures)

Children were shown four complex pictures, each of which contained many different elements that could be described and connected in interesting ways. For example, one picture showed a girl wearing a lab coat and goggles holding a lightbulb aloft in a messy lab that included images of a DNA helix and chemical structures, as well as a giant octopus holding a magnifying glass.

This task was labeled as “Experimental” because it was administered in four different ways to allow us to compare the potential effects of the Jibo robot on children’s speech. Children run without the Jibo robot present were in the Robot Absent condition. When Jibo was present, he could engage in different ways. In the Encouragement condition, Jibo praised the child during this task (“good job”) just as he did in the rest of the tasks; in the Instruction condition, Jibo not only provided encouragement but also provided the initial instructions for this task (“Tell me what’s on the picture”); in the Presents Images condition, we took advantage of a special feature of Jibo and actually presented the pictures on his face-screen. When presenting pictures in this way, Jibo also provided instructions and encouragement. We note that not all children responded equally enthusiastically to Jibo and thus the experimenter also provided instructions to the child as needed in order to get speech content for this task. Further, due to a variety of ongoing minor technical issues, we do not have a systematically counterbalanced set of children in each experimental condition.

Reading Passage (@G: Reading)

Children were presented with an 8.5” X 11” card with a short written passage drawn from a set aimed at first and second grade reading levels. There were 60 total passages which were rotated among the children. Children who showed good reading skills and enthusiasm for the task read up to four passages.

How To Task (@G: Howto)

Children were presented with 16 small cards, each depicting a child engaged in a familiar task, such as washing their hands, getting dressed, frosting cupcakes, etc. They were asked to identify the action and to explain how to do it.

Descriptive Pictures (@G: DescriptivePictures)

Children were presented with 10 whimsical pictures, such as a frog in a teacup or a giraffe eating a sandwich. They were asked to describe the items in each picture.

Funding Acknowledgements

The creation of this corpus was funded by grants from the Ohio Department of Higher Education and the National Science Foundation (#IIS-2008043 and #SMA-2146474).