CHILDES Egyptian Arabic Salama Corpus
Department of Phonetics and Linguistics
| Participants: || 10 |
| Type of Study: || single session |
| Location: || Egypt |
| Media type: || audio, linked |
| DOI: || doi:10.21415/78CE-VW65 |
Salama, H., & Alansary, S. (2018). A Morphological Analyzed Corpus for
Egyptian Child Language. The Eighteenth Conference on Language
Engineering. In Egyptian Society of Language Engineering.
Salama, H., & Alansary, S. (2017). Lexical Growth in Egyptian Arabic
Speaking Children: A corpus Based Study. The Egyptian Journal of
Language Engineering, 4(1), 29-34.
Salama, H., & Alansary, S. (2016). Building a POS-Annotated Corpus
For Egyptian Children. The Egyptian Journal of Language Engineering,
Salama, H., & Alansary, S. (2014). Building a spoken Arabic corpus
for Egyptian children. The fourteenth Conference on Language
Engineering. In Egyptian Society of Language Engineering.
Salama, H. (2015). Building a spoken Arabic corpus for Egyptian
children: data collection and transcription. Master's thesis. Alexandria
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The Egyptian Arabic corpus includes data from ten children. Five boys
and five girls with no language delay were selected randomly from a
nursery in Alexandria. All children were typically developing, and
their first language is Arabic. The children ranged in age from 1;7 to
3;8 years (mean age 2.77) and were studied cross-sectionally. Seven
children were visited in their kindergarten and three at home. The
corpus contains 25,645 utterances in total: 14,868 adult utterances
(2,518 from the mother and 12,350 from the investigator) and 10,777
from the children.
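The utterance counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch; the dictionary keys are illustrative labels chosen here, not speaker codes taken from the transcripts.

```python
# Utterance counts as reported in the corpus description.
# The keys are illustrative labels (an assumption), not CHAT speaker codes.
utterances = {
    "mother": 2518,
    "investigator": 12350,
    "child": 10777,
}

# Adult utterances are those of the mother and the investigator.
adult_total = utterances["mother"] + utterances["investigator"]
corpus_total = adult_total + utterances["child"]

# Share of each speaker group in the whole corpus, as a percentage.
shares = {who: round(100 * n / corpus_total, 1) for who, n in utterances.items()}

print(adult_total)   # 14868
print(corpus_total)  # 25645
print(shares)
```

On these figures, the 25,645 total covers all speakers, and the children account for 10,777 of those utterances.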
Speech samples were collected from spontaneous speech in unstructured
interviews. Data were elicited through conversation, through naming
objects and pictures in the child's environment (using things that
children normally use rather than something new), and by having the
children describe what they were doing while playing. Natural
interaction was encouraged so as to include all styles, such as sitting
with a child in class, playing with the child, and interacting with a
mother and/or teacher during interaction and teaching. The interview
became increasingly semi-structured once a child was able to produce
morphemes: for example, when a child produced a singular noun, the
investigator and/or mother asked about the plural to probe plural
competence. Data were collected from 6 children in a nursery by the
investigator, from one child at home with a mother and the
investigator, and from 2 children at home by their mothers. Audio
recordings of the children's spontaneous speech were obtained in
natural settings, either in the child's home or in a kindergarten.
The materials used by the investigator and/or mother to facilitate
spontaneous speech production were toys, objects, pictures for naming,
and comics, or simply conversation without any specific topic. The
child had to be familiar with the materials used. The investigator and
mother used things that children normally use rather than something
new.
The Recording of the Corpus
Recording took place in a quiet room. The planned recording time was 30
minutes per child, for a total of approximately five hours. For
children under two years of age, the length of interaction varied. The
recordings were made at intervals because young children were easily
frustrated and restless, and asked for many things during the recording
time, such as going to the toilet or eating. Only one boy (19 months)
had his recordings made by his mother at home, and the investigator
continued recording at the nursery. Another young girl (2;2) had her
first fifteen minutes recorded by her mother, and the other fifteen
minutes were recorded at the investigator's home. Seven children were
recorded with a high-quality Sony WM-GX322 recorder and high-quality
tapes, and three children were recorded digitally on phones. Each child
was told that his or her speech would be recorded. All children were
happy to have their speech recorded; they played with the cassette
and/or phone, and waited to hear their voices over the recorder. After
recording, the children's audio files were saved on the computer.
Transcripts of the recorded child speech were made later by the
investigator. The total number of transcribed words is 25,645.