English Language and Linguistics
University of Hertfordshire, UK
Type of Study: longitudinal, naturalistic
Lonngren-Sampaio, C. (2015) The investigation of code-switching in a computerised corpus of child bilingual language. Unpublished doctoral dissertation, University of Hertfordshire. https://doi.org/10.18745/th.16360
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by the above reference.
The LOBILL (LOnngren BILingual Language) Corpus is longitudinal in nature and is composed of the spoken language of two bilingual children in their interactions with monolingual and bilingual interlocutors, in diverse family situations. The main subjects of the corpus are MEG and JAM, a sister and brother who were aged 5;10 and 3;5 (years;months) at the beginning of data collection. Both were born in Fortaleza, Brazil and attended a Brazilian school from the age of 1;6 until they moved to England in 2004, when they were 8;7 and 6;3 respectively. Their mother (identified as MOT), who is the researcher, is English and married to a Brazilian (identified as PAI), who speaks English fluently. MOT is a near-native speaker of Portuguese, having studied the language at university and lived in Brazil for twelve years before returning to England in 2004.
The bilingual siblings’ language experience can be divided into two major phases: before and after the move to England. Before the birth of their children, Portuguese was the basis of all daily interaction between MOT and PAI, although code-switching did occur. From the birth of MEG in 1995 the family language dynamics changed: MOT spoke exclusively English to her daughter while PAI used Portuguese when addressing his daughter. This daily use of English at home led to greater use of English between the parents, mostly in their code-switching practices. This pattern was further consolidated when JAM was born in 1998, with MOT continuing to speak English to both siblings while PAI interacted with them mostly in Portuguese. Other daily inputs of English were restricted to television programmes (Cartoon Network and Discovery Channel) and English story books (read by the mother).
Occasional visits from English relatives provided another important source of contact with English, and in most years (1996, 1998, 2000 and 2003) both children spent short periods on holiday in England with their mother, staying with their English grandmother.
Despite the mother’s use of English with both children, the interaction between the siblings was predominantly in Portuguese, following the model of interaction experienced with their peers at their Brazilian school. Whilst in England on holiday, the siblings used more English with each other, especially in the presence of English cousins.
When the family moved to England in June 2004, MEG was 8;7 and JAM was 6;3. MEG had been reading and writing in Portuguese for 2 years while JAM had only just learnt to read and write in Portuguese. Although MEG was able to read in English, her written English showed clear influence from Portuguese. JAM was able to read some English but there was no evidence that he was able (or unable) to write English words.
Immediately after the move, the mother, MEG and JAM stayed with the children’s grandmother (GRA) and their auntie (BEC). Their father, PAI, was due to arrive in August, two months later. The children began primary school three days after arriving, and thus both at home and at school they were immersed in English. For the next two months the children’s only sources of Portuguese were their interactions with each other and telephone calls to their father in Brazil. With the arrival of their father at the end of August and a move into a family home of their own, Portuguese again began to feature in their interactions at home on a daily basis. As described below, the transcriptions contained in the corpus cover the period up to the December after the family's arrival in England.
*MEG: mas@s [//] the water is very very cold ? [+ pe]
In such code-switched utterances, postcodes (here [+ pe]) were
used to code the direction and number of switches within each utterance.
By using 'p' to represent a Portuguese word or sequence of words in
Portuguese and 'e' for the English equivalent, the postcode can contain
any number of ps and es and therefore can cover any number of switches
which may occur within one utterance. The error code [*] was used to
mark any perceived errors and, where possible, further information was
included on the %err dependent line.
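The postcode scheme described above can be read mechanically. The following is a minimal sketch in Python, not part of the corpus tooling: the function names are hypothetical, and it assumes that every code-switching postcode has the form [+ ...] and contains only the letters 'p' and 'e'. It extracts the language sequence from a main line and derives the number and direction of switches from adjacent letters.

```python
import re

# Assumption: a code-switching postcode looks like "[+ pe]" and holds only 'p'/'e'.
POSTCODE_RE = re.compile(r"\[\+ ([pe]+)\]")


def parse_postcode(utterance):
    """Return the language sequence from the postcode (e.g. 'pe'), or None."""
    match = POSTCODE_RE.search(utterance)
    return match.group(1) if match else None


def count_switches(sequence):
    """Count the positions where the language changes between adjacent letters."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def switch_directions(sequence):
    """List each switch as 'from>to', e.g. 'p>e' for Portuguese to English."""
    return [a + ">" + b for a, b in zip(sequence, sequence[1:]) if a != b]


# The example utterance quoted in this description.
line = "*MEG: mas@s [//] the water is very very cold ? [+ pe]"
seq = parse_postcode(line)
print(seq, count_switches(seq), switch_directions(seq))
```

For the example utterance this yields the sequence pe, one switch, and the direction Portuguese to English. A longer postcode such as [+ pep] would be counted as two switches under the same assumptions.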
This work was supported in part by grants VARIAD (FF12012-35058)
and Contact (FFI2016-75082) from the Spanish Ministry of Education to
Dr. Aurora Bel Gaya.
A detailed description of how the corpus was analysed both
quantitatively and qualitatively can be found in the doctoral
dissertation cited above. However, it is important to note that the
language coding used to code the corpus in 2015 differs from that used
in the current version (2022). This means that the command lines, which
can be found in the footnotes of the dissertation, can no longer be
used as written with the current version.