Singapore Bilingual Corpus

Yow Wei Quin
Humanities, Arts, and Social Sciences
Singapore University of Technology and Design


Participants: 55
Type of Study: classroom
Location: Singapore
Media type: audio
DOI: doi:10.21415/T53C7F

Citation information

Publications using these data should cite:

We request that a copy of any publications that make use of this corpus be sent to us at the above address.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This work was conducted to investigate the relationship between code-switching frequency and language competency in bilingual preschoolers.

Classroom observations were conducted in two private childcare centres in Singapore for about three hours per day over five different days in June 2013 at Centre E and in August 2013 at Centre M. Using a video recorder and an audio recorder, we recorded children’s self-talk and conversations at various times, such as free play, meal time, and group project time (e.g., arts and crafts, group writing activities). Two different groups were recorded at the same time, each with its own set of equipment. The classrooms had an open-concept layout, so background voices from other groups of children or other classes can be heard in the recordings. The transcriptions were mainly based on the video recordings but were cross-checked against the audio recordings, especially when voices were unintelligible in the video. Each transcription was double-checked by a second transcriber. A total of 55 English-Mandarin bilingual children aged 5 to 6 years were observed (see more details in the Biographical data section). An additional four children (AMA, CHR, DAM, WAC) were excluded from our subsequent analysis of the data because their attendance was low or because they produced only a few utterances throughout the whole observation period (e.g., eight utterances). The total duration of recording was 51:26:31 (30:09:48 for Centre E and 21:16:43 for Centre M).
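The total recording time can be checked by summing the two per-centre durations. The sketch below is only an illustration; the helper names parse_hms and format_hms are our own and are not part of the corpus or of any TalkBank tool.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(duration: timedelta) -> str:
    """Format a timedelta as 'HH:MM:SS', allowing more than 24 hours."""
    total_seconds = int(duration.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Durations as reported in the project description.
centre_e = parse_hms("30:09:48")
centre_m = parse_hms("21:16:43")
total = format_hms(centre_e + centre_m)
print(total)  # 51:26:31
```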

Transcriptions were done according to CHAT conventions (MacWhinney, 2000). Routines (i.e., nursery rhymes, standardized greetings before meals or before lessons start, songs, and games such as “scissors-paper-stone”) were marked with “@si”, and a postcode [+ rou] was added at the end of the utterance. Ambiguous communicators or interjections that are commonly used in both English and Mandarin, such as uh/哦, ah/啊, oh/噢, Singlish [1] particles (e.g., “meh”, “la”, “na”; see Rubdy, 2007), and onomatopoeia (imitation of sounds, e.g., “woof woof”) were marked as non-words in both English and Mandarin contexts in our transcriptions, for example, &aiyo@i.
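As a rough illustration of how these word-level markers might be read back out of a transcript line, the sketch below labels whitespace-separated tokens by their suffix. It handles only the two markers mentioned in this paragraph (“@si” and “@i”); the function name marker_of and the label strings are hypothetical, and a real analysis would normally go through the CLAN tools instead.

```python
def marker_of(token: str) -> str:
    """Label a transcript token as 'routine', 'interjection', or 'plain'.

    Assumption: markers appear as suffixes, optionally followed by
    ASCII punctuation that is attached to the token.
    """
    word = token.rstrip(".,?!")
    if word.endswith("@si"):
        return "routine"
    if word.endswith("@i"):
        return "interjection"
    return "plain"


# Tokens taken from examples in this documentation.
tokens = "老师@si 请@si 吃@si, &aiyo@i test".split()
labels = [marker_of(token) for token in tokens]
print(labels)
```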

[1] Singlish, also known as Singapore Colloquial English, is a creolized form of English spoken in Singapore (Platt, 1975).

We created a list of additional proper nouns because the Mandarin lexicon database does not recognize proper nouns (see file preschoolsg-prop.cut). We also created additional lists of words that are not recognized by the current lexicon database, for example, “durian” (the name of a local fruit) and 名卡 (ming2ka3, which means business card; in the People’s Republic of China it is commonly known as 名片, ming2pian4). See preschoolsg-add-eng.cut and preschoolsg-add-zho.cut.

The first letter of each transcript file name indicates the corresponding childcare centre (e.g., e1d1a is a transcript from Centre E, and ma1d1a is a transcript from Centre M). There are a total of 79 transcripts from Centre M and 102 transcripts from Centre E. At least two coders, trained on our criteria listed below, reviewed each section of the transcripts to ensure reliability and consistency of the coding.
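The naming rule above can be sketched as a small lookup on the first letter of the file name. This is a minimal sketch under the assumption that only the prefixes “e” and “m” occur; the names CENTRE_BY_PREFIX and centre_of are illustrative and do not come from the corpus.

```python
from pathlib import Path

# First letter of the transcript file name mapped to its centre,
# following the naming rule described in the text.
CENTRE_BY_PREFIX = {"e": "Centre E", "m": "Centre M"}


def centre_of(filename: str) -> str:
    """Return the childcare centre implied by a transcript file name."""
    stem = Path(filename).stem.lower()
    if not stem:
        raise ValueError("empty file name")
    try:
        return CENTRE_BY_PREFIX[stem[0]]
    except KeyError:
        raise ValueError(f"unrecognised centre prefix in {filename!r}") from None


print(centre_of("e1d1a.cha"))
print(centre_of("ma1d1a.cha"))
```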

At Centre E, the boys were Alan, Augustine, Eddie, Edward, Ivan, James, Jared, Lewis, Lucas, Martin, Melvin, Tai_Lin, and Titus. The girls were Calista, Claudia, Kendra, Kiera, Krista, Lea, Sandra, Stacey, and Tiffany. At Centre M, the boys were Elmer, Eugene, Irvin, Mia_Jun, Johnny, Kevin, Larry, Bo_Min, Nicholas, Patrick, Darrell, Tommy, Tony, David, Davin, Yu_liang, and Shu_Wei. The girls were Brenda, Daisy, Guan_Zhi, Jaslyn, Molly, Naomi, Sally, Selena, Shirley, Cassie, Shu_Ling, Jun_Xin, Xiu_Juan, Shi_Ting, and Yu_Shan.

Biographical data

Singapore is a multilingual nation with English as the official language and three other languages as the official mother tongues (Mandarin, Malay, Tamil). The majority of its population is ethnic Chinese (74.26%), while the rest are Malay (13.35%), Indian (9.12%), and Others (3.17%) (Singapore Department of Statistics, 2014). Among children aged between 5;0 and 9;0, English is the most frequently spoken language at home (50.5%), followed by Mandarin (28.3%), Malay (13.1%), Indian languages (5.8%), and Others (2.2%) (Singapore Department of Statistics, 2010). In the Singapore education system, all children have to learn English and a mother tongue according to their ethnicity, but all other subjects are taught in English (Gopinathan, 1999). Therefore, English is the common language amongst all Singaporean children, while children also need to be able to speak, read, and write fluently in one of the mother tongue languages by the time they reach formal school age. In both of the childcare centres that we observed, one English teacher and one Mandarin teacher were present in each class. The two private childcare centres are located in two different middle-class neighbourhoods, one in the west and the other in the northeast of Singapore.

Of the 55 English-Mandarin 5-to-6-year-old children observed, 30 were male and 25 were female (see the participant list in preschoolsg-child.txt). The parents’ average highest level of education attained was a college degree (3.98 out of 5, where 0 = no formal education and 5 = postgraduate degree). The children were reported to have an average exposure of 55.30% English and 41.80% Mandarin at home, with the remaining exposure in various Chinese dialects (e.g., Cantonese, Hokkien) or other Asian languages (e.g., one child was reported to have 2% exposure to Japanese).

Notes & warnings

These transcriptions are not intended for analysis of teachers’ input or teaching methods. Most of the teachers’ utterances were not transcribed. Overlaps in children’s conversations were also not marked.

Because it would have been impossible to carry out an observational study in the classroom if any parent refused to give consent, this study was approved by the IRB with a waiver of consent forms. Video files are not available to the public, but audio files, in which all names mentioned have been muted, are available in TalkBank. Our transcripts use fuller pseudonymization, so both confidentiality and anonymity are preserved.


Postcodes used

Code-switching measures

[+ intra]
Definition: The use of two language elements in a sentence (Genesee, Nicoladis, & Paradis, 1995).
Example:
*EXA: [- zho] 我要 test@s:eng 这个. [+ intra]

[+ inter]
Definition: A sentence in one language is followed by a sentence in another language (Genesee et al., 1995), either immediately or after a gap.
Note: Turn-taking between interlocutors was not counted as an inter-sentential switch (e.g., child A speaks Mandarin and child B replies in English).
Example:
*EXA: no, yes, my mommy said no table manners cannot sit with us.
*EXA: [- zho] 你要改去坐在我们桌子上吗? [+ inter]

[+ inter-utter-switch]
Definition: A sentence in one language is followed by an intra-sentential switch, or vice versa, either immediately or after a gap.
Example:
*EXA: 我@s:zho 要@s:zho the short piece.
*EXA: I need a short piece, any color. [+ inter-utter-switch]

[+ intraoth]
Definition: The use of two language elements in a sentence, involving a language other than English and Mandarin.
Note: This occurred only once, in the file e6d5b.cha.
Example:
*EXA: you_know Ah_boys_to_men so funny, lobang@s:ind xxx the sergeant then [/] then xxx say, welldone then xxx go xxx. [+ intraoth]

Exclusion from code-switching measures

[+ rou]
Definition: Routinized forms such as standardized greetings before meals, songs, rhymes, or readings.
Example:
*EXA: [- zho] 老师@si 请@si 吃@si, 小朋友们@si 请@si 吃@si, 大家@si 一起@si 吃饭@si 了@si, 谢谢@si Aunty_Carmen@si . [+ rou]

[+ prop]
Definition: Utterances that contain only a proper noun.
Example:
*EXA: Claudia. [+ prop]

[+ prop-intra]
Definition: The use of a proper noun from the other language in a sentence. This is not considered an intra-sentential switch.
Example:
*EXA: 我 叫 Iron_Man 帮 Ben10.

[+ imit]
Definition: Imitation of others’ utterances.
Example:
*TIF: [- zho] Aunty 是 木瓜吗?
*AUN: [- zho] 蜜瓜.
*TIF: [- zho] 蜜瓜. [+ imit]

[+ trans]
Definition: Translation from one language to another language.
Example:
*EXA: 多@s:zho means a_lot. [+ trans]
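
For readers who want to tally these postcodes programmatically, the following is a minimal sketch that pulls postcodes out of an utterance line and groups them as the listing above does. The regular expression, the function names, and the rule that an exclusion postcode takes precedence over a switch postcode are all assumptions made for illustration; they are not the coding procedure used by the authors.

```python
import re

# Postcodes from the listing above, grouped the same way.
SWITCH_CODES = {"intra", "inter", "inter-utter-switch", "intraoth"}
EXCLUDED_CODES = {"rou", "prop", "prop-intra", "imit", "trans"}

# Matches "[+ code]" but not language precodes such as "[- zho]".
POSTCODE_RE = re.compile(r"\[\+ ([^\]]+)\]")


def postcodes(line: str) -> list[str]:
    """Return every postcode found on an utterance line, in order."""
    return [code.strip() for code in POSTCODE_RE.findall(line)]


def classify(line: str) -> str:
    """Classify a line as 'excluded', 'switch', or 'unmarked'.

    Assumption: an exclusion postcode wins if both kinds are present.
    """
    codes = set(postcodes(line))
    if codes & EXCLUDED_CODES:
        return "excluded"
    if codes & SWITCH_CODES:
        return "switch"
    return "unmarked"


example = "*EXA: [- zho] 我要 test@s:eng 这个. [+ intra]"
print(postcodes(example), classify(example))
```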



This work was partially supported by the SUTD SRG under Grant SRG HASS 2011 011 and the Singapore-MIT International Design Centre (IDC) under Grants IDG31100106 and IDD41100104 to Dr Yow Wei Quin.