CHILDES Hebrew Berman Longitudinal Corpus

Ruth Berman
Department of Linguistics
Tel Aviv University


Participants: 4
Type of Study: naturalistic, longitudinal
Location: Israel
Media type: audio
DOI: doi:10.21415/T5X61W


Citation information

Below are the publications, research studies, and talks using the database.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This Hebrew longitudinal database was contributed by the Tel Aviv University laboratory headed by Ruth A. Berman, holder of the chair “Language across the Life Span”. Funding for data collection and transcription was provided by grants to Ruth Berman (Tel Aviv University) and Jürgen Weissenborn (Max-Planck Institute for Psycholinguistics, Nijmegen) from the German-Israel Binational Science Foundation (GIF, 1988 to 1991) and from the Deutsche Forschungsgemeinschaft (DFG, 1988 to 1990). The grants supported a crosslinguistic study of early language acquisition in French, German, and Hebrew. Additional funding and equipment were provided by Brian MacWhinney, director of the CHILDES Laboratory at Carnegie Mellon University, and by Wolfgang Klein, director of the Language Acquisition section at the Max-Planck Institute for Psycholinguistics.

Sharon Armon-Lotem supervised data collection by graduate student research assistants of the Department of Linguistics and School of Education, Tel Aviv University. Sigal Uziel-Karl and Bracha Nir-Sagiv standardized the files according to the then-latest version of the CHILDES transcription system (MacWhinney 2000).

The database consists of naturalistic longitudinal data collected weekly from four Hebrew-speaking children: three girls (Hagar, Smadar, and Lior) and one boy (Leor). All four children are native speakers of Hebrew. They were raised in monolingual, highly educated Hebrew-speaking homes in urban communities of central Israel, and both parents were professionals in each family. At the time of recording, Smadar was the youngest of three girls, Hagar and Leor were only children, and Lior had a baby brother.

Each child was audio-recorded at home for a total of around one hour per week. Recording was typically spread over two or three sessions a week in different situations (mealtime, bath time, playing alone or with siblings, parents, or grandparents). Recordings cover roughly ages one to three and span about one to two years per child (see Table 1 below).

For three of the children the contact person and main recorder was the mother; in Leor's case it was the aunt. All four recorders were native speakers of Hebrew who had majored in linguistics at the university. The research assistants kept in close touch with the family contact person. Caretaker-recorders were encouraged to maintain a natural, spontaneous atmosphere throughout the recording sessions. They were also instructed to repeat or expand what the child had said whenever an utterance might be unclear or unintelligible to transcribers. In addition, recorders were asked to describe the exact situation of each recording, both at the outset and in the course of each session. This information appears in each file under the @Situation header. As former students of the department and/or research assistants in associated projects, all caretaker-recorders and parents agreed to make the materials available for further use by the Berman lab.

This database has several features that make it well suited to child language research. First, the interactions are natural. They were recorded at home, a setting familiar to the children, in the presence of a primary caregiver and/or other family members. Second, because data were collected in several sessions each week, the children were recorded in a variety of contexts. Third, the caregivers provided rich contextual information and were regularly available to the transcribers for consultation and clarification. Finally, both the transcribers and the researchers involved in the project knew the children and their parents. They were therefore familiar with the children's linguistic development beyond what the recorded sessions show.

Table 1 gives details of the complete data-base recorded and transcribed for the four children.

Table 1 - Size and Range of Database from Four Hebrew-speaking Children
ID       Subject   Sex   Age Range    # Files   # Child Utts   Utts Range   Mean Utts per File
LONGIT   Hagar     F     1;7 - 3;3    135       15238          31 - 288     113
LONGIT   Lior      F     1;5 - 3;1    141       21920          missing      149
LONGIT   Smadar    F     1;4 - 2;4    34        7144           44 - 374     210
LONGIT   Leor      M     1;9 - 3;0    82        16470          62 - 423     198

All transcripts are in CHAT format (CHILDES), adapted for Hebrew. The transcription uses a system of broad phonetic transcription of Hebrew that Ruth Berman devised in earlier studies. That system was also used in an earlier, cross-sectional Hebrew database of 100 children aged one to five years, entered into the CHILDES archive in the late 1980s. For the longitudinal study, it was improved and extended for use with the audio recordings. This transcription represents the children's target forms in a consistent, standardized way, that is, as they would be pronounced in the standard Hebrew speech of the children's caretakers. The procedure was adopted to facilitate lexical searches, because every occurrence of a given word is transcribed in the same form. It also means, however, that the transcriptions are suited to analysis of morphology (inflectional and derivational), syntax, and the lexicon, but not to detailed study of phonological development.

The children's target forms reflect the “standard” Hebrew usage of well-educated Israelis for whom Hebrew is a first and major language (Berman 1987, Ravid 1995, Berman & Ravid 1999). The transcription is meant to capture the genuine usage of such speakers, who are the primary input to the children in this research. It therefore deliberately departs from two other standards: the historical or underlying forms represented in conventional Hebrew orthography, and the normative pronunciation stipulated by the Hebrew language establishment (the Hebrew Language Academy, school grammars, official broadcasting and media, etc.).

Children's actual pronunciations of certain forms, including pronunciation errors, were marked as such on the main tier. For example, when a child said nanu for gamarnu ‘finished-1pl-pt = alldone’, this was represented on the main tier (text line) as nanu [: gamarnu] [*]. More general comments on child and adult pronunciation were entered under the @Comment header. This procedure was adopted for three reasons:

(1) to allow lexical searches, since Hebrew orthography represents vowels and phonological processes such as spirantization and voicing assimilation only very partially;

(2) to support analysis based on situational context or on caretaker reaction prior to coding, for example, to decide whether a form such as pes ‘climb’ should be taken to mean letapes ‘to climb’ or metapes ‘climb-ms-sg-pr’ (cf. Armon-Lotem & Berman 2003, Uziel-Karl 2001);

(3) to make the transcripts more readable, and so more accessible to outside investigators and students.

As a result, as noted, the materials are well suited to analysis at the morphological, lexical, and syntactic levels, but do not allow detailed phonological analysis. Further, the adults who did the recording supplied rich contextual cues, which makes the material suitable for semantic and pragmatic analyses as well.
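The main-tier notation keeps the child's actual form on the line and puts the target form inside the `[: ...]` code. A lexical search therefore has to read the replacement rather than the surface token. The following is a minimal Python sketch of that idea, built only on the example above. It makes three assumptions: only the two CHAT codes shown occur (`[: target]` replacing the preceding word, `[*]` marking an error), each replaced form is a single word, and utterance terminators are ".", "?" or "!". The function names are hypothetical. This is an illustration, not part of the CHILDES tools.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Assumed CHAT pattern: one actual word followed by "[: target]".
REPLACEMENT = re.compile(r"(\S+)\s*\[:\s*([^\]]+)\]")
# Assumed CHAT error code: "[*]", optionally with a code inside, e.g. "[* m]".
ERROR_CODE = re.compile(r"\s*\[\*[^\]]*\]")
# Assumed utterance terminators to ignore when counting words.
TERMINATORS = {".", "?", "!"}


def target_tier(utterance: str) -> str:
    """Swap each actual form for its [: target] form and drop [*] codes."""
    text = REPLACEMENT.sub(lambda m: m.group(2).strip(), utterance)
    return ERROR_CODE.sub("", text)


def actual_forms(utterance: str) -> list[tuple[str, str]]:
    """Return (actual, target) pairs for every replacement in the utterance."""
    return [(m.group(1), m.group(2).strip()) for m in REPLACEMENT.finditer(utterance)]


def target_word_counts(utterances: Iterable[str]) -> Counter:
    """Count words by target form, so 'nanu' is counted as 'gamarnu'."""
    counts: Counter = Counter()
    for utt in utterances:
        counts.update(w for w in target_tier(utt).split() if w not in TERMINATORS)
    return counts


if __name__ == "__main__":
    line = "nanu [: gamarnu] [*] ."
    print(target_tier(line))   # the target-form version of the tier
    print(actual_forms(line))  # the child's form paired with its target
```

Counting by target form lets all occurrences of gamarnu be found together, whatever the child actually said. A real corpus would also need multi-word replacements, retracings, and the other CHAT codes handled by a full parser.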

Transcription conventions

Letter   Symbol   Example   Gloss
Zayin    z        ze        it, this
Tsade    c        mocec     pacifier


Usage restrictions

Note: Copies of publications using this data-base should be sent by e-mail to and/or by air mail to Dr. Ruth Berman, Department of Linguistics, Tel Aviv University, Ramat Aviv, Israel 69978.

For more information regarding the data-base, contact Bracha Nir-Sagiv or Sigal Uziel-Karl at