CHILDES Swedish-Portuguese MCF Corpus

Madalena Cruz-Ferreira
Independent Scholar
madalena@beingmultilingual.com

Participants:	3
Type of Study:	naturalistic
Location:	Sweden, Portugal
Media type:	audio
DOI:	doi:10.21415/T52W2D

Citation information

Cruz-Ferreira, M. (2003). Two prosodies, two languages: Infant bilingual strategies in Portuguese and Swedish. Journal of Portuguese Linguistics, 2(1), 45–60. doi: 10.5334/jpl.35

Cruz-Ferreira, M. (2006). Three is a Crowd? Acquiring Portuguese in a Trilingual Environment. Clevedon: Multilingual Matters.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Other references include:

Cruz-Ferreira, M. (1990). Karin and Sofia in Bilingual-Land. In J. Leather & A. James (Eds.), New Sounds 90. Proceedings of the 1990 Amsterdam Symposium on the Acquisition of Second Language Speech (pp. 248–254). Amsterdam: University of Amsterdam Press.

Cruz-Ferreira, M. (1999). Prosodic mixes: Strategies in multilingual language acquisition. International Journal of Bilingualism, 3(1), 1–21.

Cruz-Ferreira, M. (2008). Child multilingualism at home and in school. A comment on David Deterding’s review of Three is a Crowd? (Multilingual Matters, 2006). International Journal of Applied Linguistics, 18(1), 110–113.

Cruz-Ferreira, M. (2014). O corpus CHILDES MCF: Primeiras produções de três crianças multilíngues. In L. Scliar-Cabral (Ed.), O Português na Plataforma CHILDES (pp. 111–134). Florianópolis: Insular.

Project Description

This corpus contains longitudinal and cross-sectional data from three children, two girls and one boy, primary bilinguals in Portuguese and Swedish, who acquired English as the language of schooling.

The children

Karin, Sofia and Mikael are siblings, from an upper middle-class family background. The father is a native speaker of (Central Standard) Swedish and the mother, who is also the researcher and a trained phonetician, is a native speaker of (European) Portuguese.

Karin and Sofia were born in Sweden, in September 1986 and July 1988, respectively; Mikael was born in Portugal in October 1990. From birth, the children have been exposed to Portuguese and Swedish according to the one-person, one-language principle, to which the parents have adhered ever since. The parents are otherwise fluent in one another’s language as well as in English. In all exchanges between the children and Portuguese or Swedish relatives and friends, the one-person, one-language principle is easily maintained. The children have been exposed to several accents of Swedish and Portuguese, the latter including Brazilian Portuguese.

Due to the father’s professional commitments, the family has moved between countries several times since the children’s birth. A schematic chronology follows, in order to highlight the extent of the children’s exposure to different languages.

July 1986 - two months before Karin’s birth, the parents moved from Denmark (Copenhagen) to the south of Sweden, where the family set up their permanent home. From October 1987 to June 1988, Karin (1;1 to 1;9) attended a local kindergarten, where she spent an average of 15 hours a week.
September 1988 - seven weeks after Sofia’s birth, the family moved to Portugal. From September 1989 to June 1990, Karin (3;0 to 3;9) attended daily kindergarten at the Swedish School, Lisbon area.
November 1990 - three weeks after Mikael’s birth, the family moved to Austria, Vienna area. From November 1990 to June 1992, and from September 1991 to June 1992, Karin (4;2 to 5;9) and Sofia (3;1 to 3;11), respectively, attended a local kindergarten. Two months after the start of school, the girls’ teachers reported that the girls were quite comfortable communicating in German. German is, however, not part of this corpus.
July 1992 - the mother and the children moved back to Portugal. From August 1992 to May 1993, the father was posted in the USA and traveled to Portugal for short weekend visits on an irregular monthly basis. From September 1992 to June 1993, Karin (6;0 to 6;9) attended grade 1, and Sofia (4;1 to 4;11) attended kindergarten at the Swedish School, Lisbon area.
August 1993 - the family moved to Hong Kong. From September 1993 to June 1994, Karin (7;0 to 7;9) attended grade 2 at a British school. During this period, on the advice of Sofia’s teachers and due to her progressive proficiency in English, Sofia was moved up in each term of the academic year: from a Montessori kindergarten to reception/grade 1, and then to grade 1 at the same British school. Apart from two months of tuition in English, two hours per week, for Karin and Sofia (Karin from 6;8, Sofia from 4;10) once the family had confirmed the coming move to Hong Kong, this move marks the beginning of the children’s regular contact with English. For Mikael, English was also the language of his first school: from November 1993 to June 1994 he (3;1 to 3;8) attended the same Montessori kindergarten as Sofia. At this English-language school, both Sofia and Mikael had regular exposure to Cantonese through songs, counting and nursery rhymes.
August 1994 - the family moved to Singapore, where they have lived for nearly 6 years at the time of writing. The children attend English school, Karin from 8;0 in grade 3, Sofia from 6;2 in grade 2 and Mikael from 3;11 in nursery.
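
The ages in the chronology above are written as years;months. As a rough cross-check, the sketch below derives such ages from the birth months given earlier. The function name and the BIRTHS table are hypothetical helpers, not part of the corpus. Because only birth months (not days) are stated here, a result can differ by one month from an age quoted in the text.

```python
def age_in_years_months(birth_year, birth_month, year, month):
    """Return an age as 'years;months', using month-level dates only."""
    total_months = (year - birth_year) * 12 + (month - birth_month)
    if total_months < 0:
        raise ValueError("date precedes birth")
    years, months = divmod(total_months, 12)
    return f"{years};{months}"


# Birth year and month of each child, as given in the description above.
BIRTHS = {"Karin": (1986, 9), "Sofia": (1988, 7), "Mikael": (1990, 10)}

# Karin at the start of kindergarten in October 1987.
print(age_in_years_months(*BIRTHS["Karin"], 1987, 10))   # 1;1
# Mikael at the start of kindergarten in November 1993.
print(age_in_years_months(*BIRTHS["Mikael"], 1993, 11))  # 3;1
```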

When in Europe, the family traveled to Sweden for the summer and to Portugal for Christmas or vice versa. In Asia, the family travels to both countries for either the summer or Christmas. Before 1993, the children also had irregular exposure to English, through exchanges between the parents and foreign guests to the home, or from social gatherings involving Swedish and Portuguese relatives or friends.

At the age of 6, all three children started attending a once-weekly Swedish Supply School in the countries where the family has lived, where they learn about the language and the country. The children never had any formal tuition of this type in Portuguese, although they are comfortably familiar with the culture of both Sweden and Portugal.

As far as exposure to languages other than those mentioned above is concerned, Karin (10;0) and Sofia (9;2) started curricular lessons in Mandarin at school from grade 5. Both girls have Latin at school, and Sofia has French. They are, of course, exposed to the local languages spoken in Singapore, the main ones being Mandarin, as well as other Chinese languages, Malay and Tamil. They are also familiar with different accents of English, including non-native accents.

Sofia, the latest speaker of all three, was diagnosed at age 4 with 40% deafness due to recurrent middle-ear infections for which she had been receiving regular medication since babyhood. She underwent grommet and adenoid surgery twice, first in Portugal at 4;9 and later in Singapore at 6;2, when the problem was solved. The noteworthy consequence of this problem was that up to the age of 10 her delivery was rather slurred in both Portuguese and Swedish, whereas her delivery in English, which she started learning with normal hearing, was faster and clearer from the very start. Mikael had a lisp, which he spontaneously corrected at age 5;9. The children are otherwise healthy and their development is normal.

The children have always lived with both parents, and always taken active part in the family’s life. The mother is the main caregiver, having stayed at home during the children’s first years. The children are therefore mostly exposed to Portuguese at home. In order to counterbalance this asymmetry, compounded by the regular absences of the father due to business travel, the parents chose to address one another mostly in Swedish in the presence of the children. While consistently using either Portuguese or Swedish in exchanges with each parent, the children started by using Portuguese among themselves, except when recalling or discussing events specifically related to Sweden, like skiing or the midsummer celebration, for which they used Swedish. From the start of their regular schooling in English, they gradually started using more English among themselves, English being now almost exclusively the language of their exchanges. None of the children has ever felt self-conscious about using Portuguese or Swedish with their parents in front of non-speakers of the languages, including other children.

Data collection

Data have been collected since the birth of each child, through audio recordings, video recordings and diary notes made by the mother.

Audio and video tapes are reviewed soon after recording, and supplemented by diary notes wherever clarification is needed. Otherwise, extensive diary notes are used to record each child’s progress, both linguistic and in other developmental areas. Recordings are typically made whenever a new linguistic trait appears in the children’s speech, in the same way that progress in other areas is noted down in the diaries, that is, on no regular chronological basis. The data in this corpus concern the children’s Portuguese, Swedish and, from 1993, their English. Most of the data reflect spontaneous speech, except in cases where the child was specifically asked to speak (or sing, or read) ‘for the record’, for example, to say the colour or animal names in a picture book.

Typical recording sessions took place, in the first months of the children’s life, with the child safely lying down and playing on its own or interacting with one parent or relative. Later, the tape recorder was turned on in an inconspicuous place where the children were busying themselves or being attended to. The children were obviously aware of the camera during video recordings, but its presence soon became an uninteresting detail of their routine. Recordings encompass a broad spectrum of situations. Aside from the recordings made to capture specific progress, which were usually made at home, recordings include daily routines, solitary play or play with other children, festive gatherings with family and friends, and outings. The data therefore give a broad view of each child’s full (socio)linguistic ability, including making acquaintance with adults and children, voice modulations and strategies to call the attention of distant hearers, or strategies to overcome background noise. For the recordings of spontaneous interaction with children outside of the family, parental permission to use the data was duly requested and obtained.

One possible shortcoming of the recorded data is that the mother was regularly present during collection, except in those cases when the tape recorder was left on with the children on their own. Other shortcomings of spontaneous child speech collection are well-known to researchers in this area, from the children’s unwillingness to cooperate, to disruptions from siblings or equipment during recording of one particular child. The detail included in the diaries therefore constitutes an invaluable complementary resource.

Transcription and coding

Data were transcribed and coded by the researcher, who is competent in all three languages. Transcription was made as soon as possible after recording, and rechecked when coding into CHAT format, from January 2000.

All files in the corpus include a %pho: tier and a %int: tier. Both tiers are also used to transcribe adult utterances that show characteristic features of child-directed speech or are otherwise non-standard.

The %pho: tier.

Font - IPAPhon. A narrow transcription is attempted, while compromising with readability. Babbled strings are transcribed in full, with problematic sounds discussed in the %com: tier. The %mod: tier gives colloquial forms, as spoken in the family.
Symbols - adult speech, and child speech that can safely be recognised as (renderings of a) target, is transcribed according to the conventions in the Handbook of the International Phonetic Association (Cambridge University Press, 1999) for each language. In transcriptions of babble or otherwise unintelligible speech, the symbols used represent standard International Phonetic Alphabet values. For example, the IPA [_] symbol represents a vowel with similar vowel quality to one mid central vowel found in both Portuguese and Swedish. In target-like child forms, this vowel is transcribed with [_] in Portuguese and with [_] in Swedish; in babble, only the symbol [_] is used.
Diphthongs - vowel sequences are taken as diphthongs if the second vowel follows the tone initiated in the first. The glide segment of the diphthong is transcribed with [j] or [w], which therefore represent vocoids. Hence, e.g., [aw] represents one syllable, [au] represents two.
Obstruents - voiced symbols that are marked devoiced, e.g., [__], indicate voiceless lenis articulations.
Syllables – for the purposes of stress assignment, intervocalic consonant sequences are syllabified as onsets - according to the phonotactics of the language involved in the case of adult and target-like child forms. This is one choice among many possible, and does not imply sanctioning one type of syllabification in child speech. Two adjacent identical vowel symbols indicate that the child pronounced the vowel as two syllables.
Stress - pitch obtrusion usually makes it clear which syllable is being stressed. Other cues to stress are duration and intensity at the syllabic peak. Stress is marked with [_] before the affected syllable.
Words - a space delimits what was interpreted as a word or a phrase within the same tone group, in child or child-directed speech, even when not corresponding to these constituents in target forms.

The %int: tier.
This tier transcribes uses of pitch, adapting the principles of nuclear notation described in the CHAT Manual, and includes indication of voice quality and paralinguistic features, e.g., creak, tempo.

Adult speech and target-like child speech are transcribed by means of abbreviated paired symbols. In simple falling, rising or level tones, the first symbol denotes the high, mid or low pitch at which the tone starts, and the second symbol denotes the type of pitch movement, falling, rising or level. The one exception is the Portuguese extra-low fall, see below. ‘High’, ‘mid’ and ‘low’ are relative terms: a ‘mid’ pitch level denotes the speaker’s average tone range, as it is impressionistically detected in regular contact with any speaker, ‘high’ and ‘low’ being accordingly defined in relation to ‘mid’ for each speaker. In complex tones, the successive symbols indicate the type of pitch movement. The conventions are as follows:

Simple falls:

LF - low-fall
MF - mid-fall
HF - high-fall
eLF – extra-low fall, from low to below the speaker’s usual low range.

Simple rises:

LR – low-rise
MR – mid-rise
HR – high-rise

Level tones:

LL – low-level
ML – mid-level
HL – high-level

Complex tones:

RF – rise-fall
FR – fall-rise
RFR – rise-fall-rise
FRF – fall-rise-fall.

Complementary indication of where the pitch ends is added where relevant, e.g., “HF to mid”.

Other conventions are:

preH - prehead: unstressed syllables before initial stressed syllable
H - head: from the first stressed syllable in the utterance up to the nuclear syllable.

These symbols always follow symbols indicating pitch start or type, so that confusion between the H denoting ‘high’ and the H denoting ‘head’ is avoided. Examples of their use are:

LpreH – low prehead
MpreH – mid prehead
HpreH – high prehead
LH – low head
MH – mid head
HH – high head
FH – falling head
RFH – rising-falling head
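
The labels listed above can be expanded mechanically. The following is a minimal sketch, assuming bare labels only (no modifiers such as “long” or “to mid”) and only the combinations listed in this section; the function name `describe_label` is hypothetical. It checks tones before heads, which mirrors the rule that the H for ‘head’ always comes last.

```python
HEIGHTS = {"L": "low", "M": "mid", "H": "high"}
MOVEMENTS = {"F": "fall", "R": "rise", "L": "level"}
COMPLEX_TONES = {
    "RF": "rise-fall",
    "FR": "fall-rise",
    "RFR": "rise-fall-rise",
    "FRF": "fall-rise-fall",
}
# Head shapes seen in the examples: a pitch height, or a pitch movement.
HEAD_SHAPES = {**HEIGHTS, "F": "falling", "R": "rising", "RF": "rising-falling"}


def describe_label(label):
    """Expand one %int: label, e.g. 'MH' -> 'mid head'."""
    if label == "eLF":
        return "extra-low fall"
    if label.endswith("preH") and label[:-4] in HEIGHTS:
        return f"{HEIGHTS[label[:-4]]} prehead"
    if label in COMPLEX_TONES:
        return COMPLEX_TONES[label]
    # Simple tone: pitch start followed by pitch movement.
    if len(label) == 2 and label[0] in HEIGHTS and label[1] in MOVEMENTS:
        return f"{HEIGHTS[label[0]]}-{MOVEMENTS[label[1]]}"
    # Head: the final H means 'head', preceded by its shape.
    if label.endswith("H") and label[:-1] in HEAD_SHAPES:
        return f"{HEAD_SHAPES[label[:-1]]} head"
    raise ValueError(f"unrecognised label: {label!r}")


for label in ("HL", "LH", "HH", "RFH", "HpreH"):
    print(label, "=", describe_label(label))
```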

Transcription of each tone group (tg) is given on successive lines of the %int: tier. Prehead, head and tone are separated by + signs in the transcription, e.g. (file ptgsw.K880500, lines 865 and 867-869):
*DAD: vad heter //det för nåt # vad är //det för nåt # vad //heter det.
%int: 1tg, MH+ML; 2tg, MH+LR; 3tg, LH+MF.

In babbled speech, no assumption is made concerning the existence of an intonational nucleus. Transcription of babble concerns pitch height and movement on each babbled syllable, according to similar conventions. The main difference is that + signs here indicate syllable boundaries, e.g. (file ptgsw.M901215, lines 77-81):
*MIK: yyy.
%int: 1tg, ML+long MF; 2tg, LL; 3tg, HL+short LF.
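
Both examples above share one layout: tone groups separated by semicolons, each numbered as “Ntg”, with + signs between the parts. A minimal sketch of reading such a line follows, assuming that files in the corpus follow this layout consistently; the function name `parse_int_tier` is hypothetical and is not a CLAN or TalkBank utility.

```python
import re


def parse_int_tier(line):
    """Split a %int: line into (tone-group number, [parts]) pairs."""
    body = line.split(":", 1)[1].strip().rstrip(".")
    groups = []
    for chunk in body.split(";"):
        index, _, labels = chunk.strip().partition(",")
        match = re.fullmatch(r"(\d+)tg", index.strip())
        if match is None:
            raise ValueError(f"unexpected tone-group index: {index!r}")
        # Parts are prehead/head/tone in target-like speech,
        # and syllables in babble.
        parts = [part.strip() for part in labels.split("+")]
        groups.append((int(match.group(1)), parts))
    return groups


for number, parts in parse_int_tier("%int: 1tg, ML+long MF; 2tg, LL; 3tg, HL+short LF."):
    print(number, parts)
```

Modifiers such as “long” or “short” are kept attached to their label rather than interpreted, since their full inventory is not listed here.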

Other conventions and symbols

Orthography - adult utterances, and children’s utterances recognised as (renderings of) target forms, are given in standard orthography. A form of ad-hoc ‘baby orthography’ is also used for child connected speech that, although replicating target utterances, distorts segments and prosody beyond any readable use of CHAT conventions for truncated child utterances. In these cases, standard orthography is given in the %gls: tier. It is hoped that ‘baby orthography’ will be easily understandable by native users of the database. One example is in file ptgsw.SM910100, lines 24-25:
*SOF: a/mã # k/lhi klh/kó?
%gls: mamã, a Karin está na escola?

Ptg, Sw, Eng - indicate quotation of data in Portuguese, Swedish and English, respectively, in %com, %exp or %lan tiers. Notations of the type PtgEng are used in the same way for multilingual mixes, with the first symbol indicating host language (the language accepting an intrusion) and the second guest language (the intruding language). In the %lan tier, the use of one language symbol on its own indicates a probable rendition of a target in the language.
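
The host/guest reading of these notations can be illustrated with a short sketch. It assumes that a notation is either a single symbol or exactly two symbols written together, as in PtgEng; the function name `decode_language_tag` is hypothetical.

```python
LANGUAGES = {"Ptg": "Portuguese", "Sw": "Swedish", "Eng": "English"}


def decode_language_tag(tag):
    """Return the host and guest language named by a notation like 'PtgEng'."""
    if tag in LANGUAGES:
        # A single symbol: no mix is indicated.
        return {"host": LANGUAGES[tag], "guest": None}
    for host in LANGUAGES:
        guest = tag[len(host):]
        if tag.startswith(host) and guest in LANGUAGES:
            # First symbol is the host language, second the guest language.
            return {"host": LANGUAGES[host], "guest": LANGUAGES[guest]}
    raise ValueError(f"unrecognised language notation: {tag!r}")


print(decode_language_tag("PtgEng"))
```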

tg(s) – tone group(s)
syll(s) – syllable(s)
dipt(s) – diphthong(s)
min(s) – minute(s)
sec(s) – second(s)
Other abbreviations, such as Det, VP, follow accepted standards.

The files contain monolingual and/or mixed production by one or more of the children. The filenames include a language prefix, the child(ren)’s initial(s) and the date of recording, given as yymmdd. An indication of 00 for the day means that the exact day of recording is unknown. Files containing all three languages are prefixed ptswen.
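
The filename convention can be read programmatically. The sketch below is based on the filenames cited in this document (ptgsw.K880500, ptgsw.M901215, ptgsw.SM910100) and rests on three assumptions: that a dot separates the prefix from the rest, that prefixes are lowercase letters, and that the initials K, S and M stand for Karin, Sofia and Mikael. The function name `parse_filename` is hypothetical. The year is returned as two digits, since the century is not encoded in the name.

```python
import re

# Assumed mapping from initials to the three children.
CHILDREN = {"K": "Karin", "S": "Sofia", "M": "Mikael"}
# prefix . initial(s) yy mm dd
PATTERN = re.compile(r"([a-z]+)\.([KSM]+)(\d{2})(\d{2})(\d{2})")


def parse_filename(name):
    """Split a corpus filename into prefix, children and recording date."""
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    prefix, initials, yy, mm, dd = match.groups()
    return {
        "prefix": prefix,
        "children": [CHILDREN[initial] for initial in initials],
        "year": int(yy),          # two-digit year, as written in the name
        "month": int(mm),
        "day": int(dd) or None,   # 00 means the exact day is unknown
    }


print(parse_filename("ptgsw.SM910100"))
```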