Krupa-Kwiatkowska Bilingual Corpus

Magda Krupa-Kwiatkowska
Neuropsychology Laboratory
San Diego State University


Participants: 1
Type of Study: naturalistic
Location: USA, Poland
Media type: no longer available
DOI: doi:10.21415/T5NC7D

Project Description

This directory contains data from a longitudinal, ethnographic case study of a Polish boy learning English as a second language. The study examined selected aspects of language acquisition within the context of the child’s socialization in a new culture; microscopic observation was therefore considered crucial as the technique of data collection. The data were collected from her son, Martin, and transcribed by M. Krupa-Kwiatkowska.

They are selected from a 2-year observational record, which started in September 1992, 1 month after the boy first arrived in the United States, and ended in August 1994, during his first summer vacation in Poland. Twenty-two of the 31 sessions videotaped in the United States were transcribed and included in this corpus. Sessions recorded in Poland are still to be transcribed.

Target child & other children

When the observations began, Martin was 6 years and 2 months old and had just arrived in the United States, following his mother, who was there studying towards her doctoral degree in second language education. Two weeks after his arrival in the United States, Martin began to attend first grade in a Buffalo, New York, elementary school. His prior education included 4 years of kindergarten and preschool in Poland, where he was born and had been brought up until then. Except for sporadic instruction in English at the age of 5, when he was taught approximately 50 to 100 words and short phrases, this was his first encounter with the English language and American culture. Martin was an only child and was raised by his mother. He was born on July 7, 1986.

The other children invited to participate in the play sessions with Martin were selected to reflect a variety of combinations of personal and cultural features that could potentially account for different patterns of interactional behavior. The original criterion for contrasting the boy’s behavior in peer interactions was the availability of language as a medium of interaction. Therefore, three kinds of situations were recorded: (a) the boy’s interactions with American children who spoke only English, (b) the boy’s interactions with Polish-American children who spoke both languages, and (c) the boy’s interactions with a Latin-American girl who spoke Spanish and little English, so that the two children had very few shared linguistic resources. The data in this corpus include sessions with six children, who were Martin’s most frequent playmates at that time: Basia, Sarah, Scott, Justin, Robert, and Gabi.


The observational sessions were typically held two or three times a month and lasted 1 hour each. Most of them took place at Martin’s house and a few at the house of the other child. The children were free to play wherever and whatever they wanted. They could move from one room to another and change the play topic according to their wishes. Although an attempt was made to conduct two to three hour-long sessions each month, the session schedule was not forced, but was dependent upon the boy’s social calendar and followed the events as they occurred. These restrictions often limited the possibility of recording interactional events. Adult intervention was avoided unless asked for or necessary for safety reasons. A tripod was used to avoid an adult’s presence when it was not necessary. However, because of the high mobility of the children, this was not often possible. All the sessions were videotaped by a person considered most neutral to the situation. Usually it was the parent of the hosting child, and because most sessions took place in Martin’s home, it was typically the researcher herself.


Out of these sessions, segments of about 20 minutes considered to be most “interactive” were selected for transcription. Because there were large patches of non-interactive recorded data, such a selection criterion seemed most natural and justified. This material was then transcribed in the CHAT format. The transcripts include a verbatim record of the children’s speech, paying close attention to such conversational features as repetitions, retracings, interruptions, noncompletions, and omissions, and to special forms of linguistic and quasi-linguistic activity, such as word play, syllabification, invented words, and onomatopoeic expressions. Whenever Polish was used, an English translation was provided for the utterance. Utterances were transcribed using the standard alphabets of English and Polish, with the omission of diacritics in Polish, as these were not then available in CHAT. Nonverbal expressions, such as nonsensical or incomprehensible words, whenever pronounced as in English, were transcribed using the UNIBET system. Unless otherwise marked, children’s utterances should be interpreted as being addressed to the other child. Apart from the speech record, the transcripts incorporate nonlinguistic data, comprising a record of general activity, paralinguistic behavior, gestures and facial expressions, and other comments. These were transcribed whenever they were judged to constitute a communicative act on the part of the child. The record of paralinguistic behavior includes various kinds of vocalizations, such as screams, sighs, groans, laughter, and singsong, as well as the message conveyed by the tone of voice, when it was apparent.

The following table contains basic information about each file in the corpus:
