Department of Linguistics
University of Basel
Type of Study: case study
Media type: password to audio
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The boy’s language development was recorded from age 1;11.13, the onset of multiword speech, up to age 4;11. Between the ages of 1;11.13 and 3;0, daily parental diaries were kept to note the 10-30 most innovative and complex utterances of the child. Diary notes were spoken into a small dictaphone at the time and place of the action, so that they would not be distorted by having to be recalled from memory. In the evening, the caretakers typed up the utterances plus contextual information in CHAT format. In the transcripts, all diary notes have the code [- diary].
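As a rough illustration of how the [- diary] code could be used to separate diary notes from recorded speech, the following Python sketch checks whether a main-tier line starts with that code. The line layout (speaker code, colon, tab, then the utterance) and the example lines are assumptions made for this sketch, not excerpts from the corpus.

```python
def is_diary_utterance(line):
    """Return True if a main-tier line begins with the [- diary] code.

    Assumption: main tiers look like '*CHI:<tab>[- diary] text .',
    i.e. the code directly follows the speaker prefix.
    """
    if not line.startswith("*"):
        return False  # dependent tiers (%...) and headers (@...) are skipped
    _, _, text = line.partition(":")
    return text.lstrip().startswith("[- diary]")


# Hypothetical lines, invented for illustration only.
lines = [
    "*CHI:\t[- diary] Mama+Auto fahren .",
    "*CHI:\tBaby+Giraffe .",
    "%com:\tsome comment",
]
diary_lines = [l for l in lines if is_diary_utterance(l)]
```

Filtering on this code makes it possible to analyse diary notes and session recordings separately.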
Between 1;11.12 and 1;11.29, several test recordings of varying length were made to try out the equipment and the procedure. The main study started at 2;0. Between 2;0.00 and 2;11.29, the daily diary notes were supplemented by five 60-minute recordings each week. Once a week, the session was also videotaped. Between ages 3;0 and 4;11, there were five audio recordings per week every fourth week.
After 2;0.00, all recordings are 60 minutes long, with very few exceptions due to the child not feeling well. The caregivers often split the session into two segments, e.g., taping half an hour in the morning and the other half in the afternoon, because keeping a conversation or play session going for 60 minutes proved quite exhausting for child and parents, or because other activities or the demands of other family members intervened. Between 2;6.00 and 2;6.11, the recording equipment malfunctioned, which was only noticed when we started to transcribe the tapes. For this period, therefore, only diary data are available, with the exception of le020608.cha, which was transcribed from the video.
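The file name le020608.cha appears to encode the child's age (2;6.08) as two digits each for years, months, and days after the prefix "le". This naming scheme is inferred from that single example and is an assumption, not a documented rule. Under that assumption, a minimal Python sketch for recovering the age from a file name might look like this:

```python
import re

# Assumed pattern: 'le' + YY + MM + DD + '.cha' (inferred from le020608.cha).
FILENAME_RE = re.compile(r"^le(\d{2})(\d{2})(\d{2})\.cha$")


def age_from_filename(name):
    """Return the age as 'years;months.days', or None if the name does not match."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    years, months, days = (int(g) for g in m.groups())
    # Days are zero-padded, following the notation used in this description.
    return f"{years};{months}.{days:02d}"


age = age_from_filename("le020608.cha")
```

Files whose names do not follow the assumed pattern yield None rather than a guessed age.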
The sessions were recorded with a Sony Minidisc recorder MZ-R35 using two wireless, portable Shure BG4.1 Unidirectional Condenser Microphones and a Shure ETPD-NB Marcad Diversity Receiver. All recordings took place in the family home or, when the family was on holiday, in a hotel. Since the microphones were wireless, they could be placed wherever the family wanted; the only request was to avoid background music and the vicinity of washing machines, blenders, and other noisy appliances. With this setup, the family had full control over the situations they wanted to tape and was also given the right to withhold tapes they considered too private. They never made use of this possibility but delivered a 60-minute recording every day.
To facilitate data retrieval and coding, word stems were transcribed in standard orthography so that the same word does not appear in more than one orthographic form. The CHAT conventions were used to remain as faithful as possible to reductions or alternative pronunciations of the word stem.
@o was used to code not only onomatopoeics but also interjections and other discourse markers, as well as forms whose meaning could not be inferred and which therefore did not qualify as child or family forms. As a result, searches that exclude @o forms return standard vocabulary.
@c was used for child forms, which include short forms made up by the child as well as names he invented for places, people, or animals. They are marked with @c when there is a risk of confusion with existing words, e.g., Einfach@c 'simple' and Doppel@c 'double', which are regularly used as shorthand for a regular ('simple') bus and a double-decker bus.
Other unique word forms which occur frequently are not marked:
Eichi 'squirrel' ( = stuffed animal)
echen / achen = dummy nonce words that Leo uses in all kinds of circumstances, e.g., if he cannot or does not want to answer a question.
@f was used for family forms, such as nicknames for the children and made-up adjectives and manner words created as word play.
@d was used for dialect words such as Luelle@d 'saliva' and luellen@d 'drool', 'slaver'
@t was used for test words such as glorpen@t, tammen@t, dotzen@t, seiken@t, Bral@t, Muhne@t
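The markers above lend themselves to simple token filtering. The following Python sketch splits a token into stem and marker and drops @o forms, as suggested for vocabulary searches. The function names and category labels are this sketch's own, and it only handles the five markers listed here.

```python
import re

# Markers described above; the labels are informal glosses for this sketch.
MARKERS = {
    "o": "onomatopoeia, interjection, or uninterpretable form",
    "c": "child form",
    "f": "family form",
    "d": "dialect word",
    "t": "test word",
}

TOKEN_RE = re.compile(r"^(?P<stem>[^@\s]+)@(?P<marker>[a-z]+)$")


def split_marker(token):
    """Return (stem, marker); marker is None for unmarked tokens."""
    m = TOKEN_RE.match(token)
    if m and m.group("marker") in MARKERS:
        return m.group("stem"), m.group("marker")
    return token, None


def without_o_forms(tokens):
    """Keep every token except those marked with @o."""
    return [t for t in tokens if split_marker(t)[1] != "o"]


tokens = ["ach+du+lieber+Gott@o", "Einfach@c", "Luelle@d", "Apfelbaum"]
filtered = without_o_forms(tokens)
```

Unmarked forms such as Eichi pass through unchanged, since they carry no marker to match.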
Common compounds like "Apfelbaum" 'apple tree' were transcribed as a single word; new compounds or very long and hard-to-parse units were transcribed with "+", e.g., Baby+Giraffe, Mama+Auto, Arno+Bett, Mini+Lokomotive, Super+Zug.
"+" was also used to link names ("Tante+Ida"; "Rasender+Roland", the name of a train on the island of Ruegen), fixed phrases and interjections (ach+du+lieber+Gott@o 'oh+my+God@o'), titles of books or songs (Winnie+der+Baer, Stille+Nacht), acronyms (L+K+W), and complex numbers (neunzehnhundert+vierzehn). Consequently, the "+" sign cannot be taken as an indicator of noun compounds; rather, it serves to unite sequences of words that should be treated as one constituent in syntactic analysis. Care was taken that each combination of words is represented in just one form, but there may be variation with the same stem ("Babysachen" but "Baby+Teile").
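Because "+" joins words into a single unit, a whitespace-based tokenizer counts each linked sequence once, while the parts remain recoverable when needed. The Python sketch below illustrates this with examples from the text; the helper name `components` is hypothetical.

```python
from collections import Counter


def components(token):
    """Split a '+'-linked unit into its parts, ignoring any @ marker."""
    stem = token.split("@", 1)[0]
    return stem.split("+")


utterance = "Baby+Giraffe Winnie+der+Baer Apfelbaum Baby+Giraffe"
tokens = utterance.split()

# Each linked unit is one lexical item in the counts.
unit_counts = Counter(tokens)

# The parts can still be inspected individually if an analysis requires it.
parts = [components(t) for t in tokens]
```

Here Baby+Giraffe is counted twice as a single unit, and Winnie+der+Baer contributes one item rather than three.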
Because Leo showed different forms of disfluencies and went through phases of onset stammering where it took him several attempts to finally produce the word or utterance he wanted, extra conventions had to be established to depict these phenomena while not inflating the lexical counts by transcribing the same element several times.
[MA] was introduced as a scoped symbol and stands for multiple attempts at producing a word or phrase. &=vocalizes indicates that a sequence of mumbling preceded the articulation of the utterance. This way of representing disfluencies was preferred over xxx because in most cases Leo succeeded in producing an intelligible utterance in the end. These utterances therefore do not have to be discarded from analyses on the grounds that they contain unintelligible elements.