Jeannine Goh, MPI Child Study Centre, University of Manchester, jeannine.goh@manchester.ac.uk
Participants: | 1 |
Type of Study: | longitudinal, naturalistic |
Location: | England |
Media type: | audio |
DOI: | doi:10.21415/T5JG64 |
Publications using these data should cite:
Lieven, E., Salomo, D. & Tomasello, M. (2009). Two-year-old children’s production of multiword utterances: A usage-based analysis. Cognitive Linguistics, 20, 3, 481-508.
Other publications based on the use of these data include:
Maslen, R., Theakston, A., Lieven, E. & Tomasello, M. (2004). A Dense Corpus Study of Past Tense and Plural Overregularization in English. Journal of Speech, Language and Hearing Research, 47, 1319-1333.
Dąbrowska, E. & Lieven, E. (2005). Towards a lexically specific grammar of children’s question constructions. Cognitive Linguistics, 16, 3, 437-474.
Lieven, E. (2006). Producing multiword utterances. In B. Kelly & E. Clark (eds.) Constructions in Acquisition. Stanford, CA: CSLI Publications, pp. 83-110.
Cameron-Faulkner, T., Lieven, E. & Theakston, A. (2007). What part of no do children not understand? A usage-based account of multiword negation. Journal of Child Language, 34, 251-282.
Chang, F., Lieven, E. & Tomasello, M. (2008). Automatic evaluation of syntactic learners in typologically-different languages. Cognitive Systems Research, 9, 3, 198-213.
Bannard, C. & Lieven, E. (2009). Repetition and Reuse in Child Language Learning. In Roberta Corrigan, Edith Moravcsik, Hamid Ouali, Kathleen Wheatley (eds.). Formulaic Language: Volume II: Acquisition, Loss, Psychological reality, Functional Explanations. Amsterdam: John Benjamins, pp. 297-321.
Bannard, C. & Matthews, D. (2008). Stored word sequences in language learning: The effect of familiarity on children's repetition of four-word sequences. Psychological Science, 19, 3, 241-248.
Ph.D. dissertations (largely based on these data): Cameron-Faulkner, Maslen, Kiravainen
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
This corpus contains the data from a longitudinal naturalistic study of one child over a period of three years. The child is called Thomas. He was born 03-APR-1997 into a middle class family. His primary care-giver is his mother. This large dataset is best considered in three sections (Sections A, B, C). Section A differs from B and C in the frequency of recordings, and Section C differs from A and B in its use of an updated transcription and morphosyntactic coding system. More details of these differences are given below.
THE FREQUENCY OF DATA
Section A (Thomas aged 2-00-12 to 3-02-12) A VERY INTENSIVE PERIOD
Section B (Thomas aged 3-03-02 to 3-11-06) AN INTENSIVE PERIOD
Section C (Thomas aged 4-00-02 to 4-11-20) AN INTENSIVE PERIOD
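The ages above are written as years-months-days. As a rough aid for locating sessions in calendar time, the following minimal Python sketch converts such an age into an approximate recording date, using the birth date given above (03-APR-1997). The function names `parse_age` and `approx_date` are illustrative only and are not part of CLAN, and the arithmetic (add the years and months to the birth date, then the days) is an assumption about how the ages were derived, so results may differ by a day or so from the actual session dates.

```python
from datetime import date, timedelta

# Thomas's date of birth as stated in this documentation.
BIRTH = date(1997, 4, 3)


def parse_age(age):
    """Split an age such as '2-00-12' into (years, months, days)."""
    years, months, days = (int(part) for part in age.split("-"))
    return years, months, days


def approx_date(age, birth=BIRTH):
    """Approximate the calendar date on which the child was the given age.

    Assumption: years and months are added to the birth date first,
    then the remaining days. This works for a birth day of 3; a birth
    day of 29-31 would need extra handling for short months.
    """
    years, months, days = parse_age(age)
    total_months = birth.month - 1 + months
    year = birth.year + years + total_months // 12
    month = total_months % 12 + 1
    return date(year, month, birth.day) + timedelta(days=days)


if __name__ == "__main__":
    for label, start, end in [
        ("A", "2-00-12", "3-02-12"),
        ("B", "3-03-02", "3-11-06"),
        ("C", "4-00-02", "4-11-20"),
    ]:
        print(label, approx_date(start), approx_date(end))
```

Under these assumptions the first Section A session falls in mid-April 1999 and the last Section C session in late March 2002.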
Procedure
Over the three-year period the audio of a total of 379 sessions was recorded using a standard Sony mini-disc recorder and Sennheiser evolution radio microphones. The microphones were positioned around the downstairs of the house, allowing Thomas to move freely during his play whilst still capturing his speech. For 73 of these recordings a video recording was also made using a standard video camera. These videos are now in DVD format, but permission was not gained for submission to the CHILDES database. All of the audio recordings took place in Thomas’s home, where he was engaged in normal play activities with his mother. In most of the video recordings the investigator is also present and is engaged in play with Thomas. The videos were mainly recorded in Thomas’s home, although a number were recorded in the laboratory at the Max Planck Child Study Centre at the University of Manchester. Most of the recordings are 60 minutes long.
Known inconsistencies in the data
The corpus was gathered over a number of years, during which time CLAN was updated, the experience of the transcribers increased, transcribers came and went, and problems were identified and rectified along the way. This has inevitably led to some inconsistencies in transcription, some of which are listed below.
Missing auxiliary: | Mummy 0is [*] come-ing |
Overextension: | brokened [*] |
Omissions: | David 0and [*] Sharon |
 | Mummy-0’s [*] watch |
 | Lots of train-0s [*] |
Confusions: |
More Notes on transcription
Phonological forms: The focus in this study is early grammatical development and not the specific phonological forms that Thomas uses. Therefore, unless Thomas uses what appear to be child-specific forms, the target word is transcribed rather than an approximation of the child’s phonological form.
Thomas’s early language
Error Coding
Errors that are coded during transcription are as follows (APP 3: Error coding more guidelines):
Missing morphemes | e.g. ‘two dog-0s’, ‘He’s go-0ing’, ‘Mummy-0’s sock’ etc. |
Case errors | e.g. ‘Her do it’, ‘Me get it’ |
Missing or incorrect auxiliaries and copulas | e.g. ‘It 0is going there’, ‘I 0am getting a drink’ |
Word Class Errors | e.g. double determiners ‘a that one’ |
Agreement errors | e.g. ‘a bricks’, ‘these penguin’, ‘Does she likes it?’, ‘It don’t go there’. |
Pronominal Errors | e.g. ‘Carry you’ when the child wants to be carried |
Wrong word | e.g. ‘I put it off’ - where the context indicates ‘take’ is appropriate. |
Overgeneralisation | e.g. ‘it broke-ed’ |
Not all errors are easy to identify. In an utterance such as “what doing trucks” it is difficult to pinpoint the type of error that has been made. In such cases an error marker [*] is placed on the main tier and a question mark on the error line.
When to use an error code
An error code should be used whenever what the child says is grammatically incorrect. If there is something wrong with the sentence, you, as the transcriber, need to flag it using the [*] sign. You should place the [*] sign straight after the word that is the problem. If errors are not flagged, the researcher may not know what the child intended to say, for example:
*CHI: me Mummy stopped
From listening to the recording you may know whether Mummy has stopped, the child has stopped, or Mummy has stopped the child, and perhaps whether a has or had has been omitted. These are all useful things for the researcher to know.
If you know there is an error but it is ambiguous, it is best to use a [?] on the error line. You can use angle brackets to show whether it is the whole sentence or only some words in the sentence that you are unsure about:
*CHI:
%err: [?]
Omitted/missing words
These are generally transcribed correctly, but to revise: a ‘0’ is used to indicate that a word has been omitted, and the omitted word is written straight after the 0. Words like have and has (auxiliaries) are often omitted, as are parts of words, for example:
*CHI: I 0have [*] got
%err: 0have=have
*CHI: I am go0ing
%err: go0ing=going
*CHI: I want two sweet0s [*]
%err: sweet0s=sweets
What is written after the ‘0’ is taken out when we run the grammar program, and what is left behind should read exactly as the child actually said it. Anything after the 0 is what you have corrected.
Additions and overextensions
The following is VERY important: if the child has wrongly added an ‘ed’ ending to a word, it should be coded like this:
*CHI: threwed [*] it .
%err: threwed = threw .
If in the next example you are sure that they mean one sweet:
*CHI: I want a sweets [*]
%err: sweets=sweet
If you are not sure whether it was one sweet, use [?] on the error line instead.
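The ‘0’ convention and the %err lines described above can be illustrated with a short Python sketch. It is not the grammar program referred to in this documentation and is not part of CLAN; the function names `spoken_form` and `parse_err` are hypothetical. It assumes that everything from a ‘0’ (optionally preceded by a hyphen, as in ‘train-0s’) to the end of a word is transcriber-restored material, so it would wrongly truncate genuine tokens that contain the digit 0.

```python
import re


def spoken_form(main_tier):
    """Return what the child actually said, given a coded main-tier line.

    Drops the speaker code, the [*] error marker, and any material the
    transcriber restored after a '0' (e.g. '0have', 'go0ing', 'train-0s').
    """
    text = re.sub(r"^\*?[A-Z]{3}:\s*", "", main_tier)  # remove '*CHI:' or 'CHI:'
    text = text.replace("[*]", " ")
    words = []
    for token in text.split():
        kept = re.sub(r"-?0.*$", "", token)  # cut from the 0 to the end of the word
        if kept:  # a wholly omitted word such as '0have' leaves nothing behind
            words.append(kept)
    return " ".join(words)


def parse_err(err_tier):
    """Split a '%err:' line into (form as coded, target form).

    Returns None for an ambiguous error coded as '[?]'.
    """
    body = err_tier.split(":", 1)[1].strip().rstrip(" .").strip()
    if body == "[?]":
        return None
    coded, target = (part.strip() for part in body.split("=", 1))
    return coded, target


if __name__ == "__main__":
    print(spoken_form("*CHI: I 0have [*] got"))
    print(spoken_form("*CHI: I want two sweet0s [*]"))
    print(parse_err("%err: threwed = threw ."))
```

Overextensions such as ‘threwed’ contain no ‘0’, so they pass through unchanged on the main tier and the intended form is available only from the %err line.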