Fernández Fuertes / Liceras Bilingual Corpus

Raquel Fernández Fuertes
Department of English Philology
University of Valladolid


Juana Liceras
Department of Linguistics
University of Ottawa


Participants: 2
Type of Study: naturalistic, longitudinal
Location: Spain
Media type: audio
DOI: doi:10.21415/T52S3R

Citation information

Publications using these data should cite:

Project Description

This corpus contains spontaneous productions from a longitudinal study of two English/Spanish bilingual identical twins with the pseudonyms Simon and Leo. They were born 28-DEC-1998 into a middle-class family in Spain. The father is a native speaker of Peninsular Spanish and the mother is a native speaker of American English. The father always speaks to the children in Spanish and the mother always addresses them in English. The parents generally communicate in Spanish with each other, except during the summers, when they travel to the United States for approximately two months, or when a monolingual English speaker is present. Therefore, we are dealing with bilingual English/Spanish first language acquisition in a monolingual-Spanish social context, a type of bilingualism that is referred to in the literature as individual bilingualism (Bhatia and Ritchie, 2004).

During the first year, the mother was the primary caretaker of the twins. The father was present all day on weekends and less on weekdays. At age 1;10, the twins started going to day care for 3 hours a day on weekdays, where the language of the staff and other children was Spanish. Apart from the mother, additional contact with English was provided by occasional visits by the maternal grandparents and during the two-month visits to the United States every summer.

The twins and participants other than the investigators have pseudonyms to protect their privacy. If names of children or participants other than the investigators appear in the recordings, only the first initial or first and second initials are transcribed. When it was not clear in the recording which of the twins was speaking (mainly because they were off-screen), we have used SOL (Simon or Leo) instead of S (Simon) or L (Leo).


The data we have collected cover the age range of 1;01 to 6;11. A total of 178 sessions were recorded on videotape and DVD, of which 117 are in an English context (i.e., with an English interlocutor such as the interviewer or their mother) and 61 in a Spanish context (i.e., with a Spanish interlocutor such as the interviewer or their father). The Spanish recordings were made at intervals of 2-3 weeks until age 3;00 (with some interruptions during the summer holidays), and then once a month after that. The English recordings were sometimes made more frequently, but the sessions are usually much shorter and recorded on consecutive days. The children were recorded in naturalistic settings, usually at home, and appear together in the majority of the sessions. They were mostly engaged in normal play activities with the interlocutor.

Videotaping was done by Raquel Fernández Fuertes, K. Todd Spradlin, and Esther Álvarez de la Fuente. Transcription in Valladolid was done by Esther Álvarez de la Fuente, Susana Muñiz Fernández, Isabel Parrado Román, K. Todd Spradlin, Elisa Rosado Villegas, Israel de la Fuente Velasco, and Alfonso Martínez Pérez. Transcription in Ottawa was done by Rocío Pérez-Tattam, Tamara Vardomskaya, Anahí Alba de la Fuente, K. Todd Spradlin, Marco Llamazares, Melissa Grimes, Shauna Flynn, and Deidre Butters.

Inventory of files

Data transcripts appear in three folders (English, Spanish and bilingual). The first two correspond to the recordings made in English and in Spanish respectively. The bilingual folder includes recordings in which we have used different experimental tests that involved a combination of English and Spanish: 3 files involve code switching and 5 involve natural translation/interpreting.

A full inventory of files in the FerFuLice corpus appears in this Excel file. The Excel file contains one sheet per folder (English, Spanish and bilingual). Each sheet gives, for every file, the file name, the age of the children and the duration of the recording. Additionally, several calculations have been performed for each file and for each child: number of utterances and number of words produced, MLU value measured in words, MLU rate per month and MLU rate per year. The adult input that appears in the recordings and transcripts has been quantified as well. An estimate of the children’s primary input in English and in Spanish, in terms of number of utterances and number of words, is also provided. Calculations involving the entire corpus appear in the last Excel sheet (totals).
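
As an illustration of the MLU value measured in words mentioned above, the following Python sketch divides the total number of words by the number of utterances. It is not the procedure used to produce the figures in the Excel file. The function name, the sample utterances and the whitespace-based word splitting are all assumptions made for the example, and real transcripts would first need their transcription codes removed.

```python
def mlu_in_words(utterances):
    """Mean length of utterance in words: total words / number of utterances.

    Assumption: each utterance is a plain string and words are separated
    by whitespace. Empty utterances are ignored.
    """
    tokenized = [u.split() for u in utterances if u.strip()]
    if not tokenized:
        return 0.0
    total_words = sum(len(words) for words in tokenized)
    return total_words / len(tokenized)


# Hypothetical utterances, not taken from the corpus.
sample = ["more juice", "I want the ball", "no"]
print(round(mlu_in_words(sample), 2))
```

Here the three sample utterances contain 2, 4 and 1 words, so the function divides 7 words by 3 utterances.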


Funding was supplied by these sources: