CHILDES Egyptian Arabic Salama Corpus

Heba Salama
Department of Phonetics and Linguistics
Alexandria University


Participants: 10
Type of Study: single session
Location: Egypt
Media type: audio, linked
DOI: doi:10.21415/78CE-VW65

Browsable transcripts

Download transcripts


Citation information

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description


The Egyptian Arabic corpus includes data from ten children. Five boys and five girls with no language delay were selected randomly from a nursery in Alexandria. All children were typically developing, and their first language is Arabic. The children ranged in age from 1;7 to 3;8 (mean age 2.77 years) and were studied cross-sectionally. Seven children were visited in their kindergarten and three at home. The total number of utterances across the sessions of all 10 children is 25,645: 14,868 adult utterances (2,518 from mothers and 12,350 from the investigator) and 10,777 child utterances.
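As a quick sanity check on the figures above, the following minimal sketch adds up the reported utterance counts and converts the CHAT-style ages (years;months) to decimal years. The function and variable names are hypothetical and are not part of the corpus or of any TalkBank tool; the conversion assumes ages given as years and months only, with no day component.

```python
def chat_age_to_years(age: str) -> float:
    """Convert an age written as 'years;months' (e.g. '1;7') to decimal years.

    Assumption: only years and months are given; a day component is not handled.
    """
    years_part, _, months_part = age.partition(";")
    years = int(years_part)
    months = int(months_part) if months_part else 0
    return years + months / 12


# Utterance counts as reported in the project description.
MOTHER_UTTERANCES = 2518
INVESTIGATOR_UTTERANCES = 12350
CHILD_UTTERANCES = 10777

adult_utterances = MOTHER_UTTERANCES + INVESTIGATOR_UTTERANCES
total_utterances = adult_utterances + CHILD_UTTERANCES

youngest = chat_age_to_years("1;7")
oldest = chat_age_to_years("3;8")

print(f"adult utterances: {adult_utterances}")
print(f"total utterances: {total_utterances}")
print(f"age range in years: {youngest:.2f} to {oldest:.2f}")
```

The sums reproduce the reported adult and overall totals, and the reported mean age of 2.77 years falls inside the converted age range.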

Data Collection

Speech samples were collected from spontaneous speech in unstructured interviews. Data were elicited through conversation, by naming objects and pictures in the child's environment, and by asking the children to describe what they were doing while playing; items that children normally use were preferred over anything new. Natural interaction was encouraged in order to cover all styles, such as sitting with a child in class, playing with the child, and interacting with a mother and/or teacher during teaching. The interview became increasingly semi-structured once a child was able to produce morphemes: for example, when a child produced a singular noun, the investigator and/or the mother asked for the plural form. Data were collected from six children in a nursery by the investigator, from one child at home with a mother and the investigator, and from two children at home by their mothers. Audio recordings of the children's spontaneous speech were made in natural settings, either in the child's home or in a kindergarten.


The materials used by the investigator and/or a mother to facilitate spontaneous speech production were toys, objects, pictures for naming, comics, or simply conversation without any specific topic. The child had to recognize the materials used. The investigator and the mothers used items that children normally use rather than anything new.

The Recording of the Corpus

Recording took place in a quiet room. The planned recording time for each child was 30 minutes, for a total of approximately five hours. For children under two years, the length of interaction varied. The recordings were made at intervals because young children were easily frustrated and restless, and asked for many things during the recording time, such as going to the toilet or eating. One boy (19 months) had his recordings started by his mother at home, and the investigator continued them at the nursery. One girl (2;2) had her first fifteen minutes recorded by her mother, and the other fifteen minutes were recorded at the investigator's home. Seven children were recorded with a high-quality Sony WM-GX322 recorder and high-quality tapes, and three children were recorded digitally on phones. Each child was informed that his or her speech would be recorded. All children were happy with the recording of their speech; they played with the cassette and/or phone and waited to hear their voices on the recorder. After recording, the children's audio files were saved on a computer. Transcripts of the recorded child speech were made later by the investigator. The total number of transcribed words is 25,645.