CHILDES Indonesian Jakarta Corpus


David Gil
Department of Linguistics
Max Planck Institute for Evolutionary Anthropology


Uri Tadmor
De Gruyter Mouton, Boston

Participants: 9
Type of Study: naturalistic
Location: Indonesia
Media type: audio
DOI: doi:10.21415/T5MS43

Citation information

Articles based on the use of this corpus should cite the following source:

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The MPI-EVA Jakarta Child Language Project was a joint project of the Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, and the Center for Language and Culture Studies, Atma Jaya Catholic University. The project officially started in January 1999, and recordings and data processing began in April 1999. The goal of the project was to record, transcribe, and enter into a computerized database a corpus of naturalistic data from a large sample of Jakarta Indonesian child language. A total of eight children were studied longitudinally over the course of five years. The children's ages at their first recordings ranged from 1;7 to 4;6, and each child was recorded at average intervals of 7-10 days over a period of 2-4 years. In addition, data relating to each age group were used for cross-sectional studies. Recordings were made in various settings, mostly indoors but also outdoors. Specific situational descriptions are contained in each file.

The target children were chosen for practical reasons. We chose families with which our research assistants were already familiar and where we were reasonably confident that we would be able to maintain regular recording sessions for several years. However, the families are fairly representative of Jakarta’s population, belonging to different socioeconomic strata and different ethnic groups. Our sine qua non condition was that the target children were acquiring Jakarta Indonesian as their first language, and that the major home language of all families (used between parents and children as well as among children) was Jakarta Indonesian. However, other members of the household (mostly grandparents) sometimes spoke languages other than Indonesian, as is the case in most Indonesian families.

Data collection and processing were carried out by graduates of language-related departments of Indonesian universities who underwent a stringent selection process. The successful candidates received training in field methods, phonetic transcription, morphological analysis, and data entry.

The data collection and processing followed a regular routine. On a weekly basis our research assistants took a camcorder to the field and recorded the target children in their (the children’s) homes. The aim was to record natural language in a natural setting. Other than the research assistants and the target children, participants sometimes included parents, siblings, grandparents, friends, and others. The assistants then returned to the Field Station and captured the video recordings to digital video files that were then burnt to CDs. The digital video files were made in PAL format (MPEG-1, 352 x 288 pixels, 25 fps). This allowed us to fit about one hour of video onto a regular 650MB data CD. The CDs were then viewed and coded by the research assistants, each assistant working on the sessions that he or she recorded. Coding was done directly into our customized FileMaker database software.
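As a rough check on the figures above, the average bitrate implied by fitting about one hour of video onto a 650MB CD can be estimated as follows. This is only a sketch: it assumes 1 MB = 1,000,000 bytes and exactly 60 minutes of footage, neither of which is stated in this description, and the function name is hypothetical.

```python
# Rough estimate of the average bitrate implied by "about one hour of video
# on a regular 650MB data CD".
# Assumptions (not stated in the corpus description): 1 MB = 1,000,000 bytes,
# and the recording lasts exactly 60 minutes.
CD_CAPACITY_BYTES = 650 * 1_000_000
DURATION_SECONDS = 60 * 60


def average_bitrate_kbps(capacity_bytes: int, duration_seconds: int) -> float:
    """Return the average combined (video + audio) bitrate in kbit/s."""
    return capacity_bytes * 8 / duration_seconds / 1000


bitrate = average_bitrate_kbps(CD_CAPACITY_BYTES, DURATION_SECONDS)
print(f"{bitrate:.0f} kbit/s")  # prints "1444 kbit/s"
```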

Each utterance comprises a single record in the database. Each record consists of five fields: transcription using conventional orthography (of any recorded utterance, uttered by anyone); phonetic transcription; interlinear glossing; English translation; and comments specific to the particular utterance regarding linguistic matters as well as the nonlinguistic context.
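For users who process exported records programmatically, the five-field record described above might be modeled as in the sketch below. The class name, field names, and example values are illustrative assumptions; they are not the actual field names of the FileMaker database, and the example utterance is made up rather than taken from the corpus.

```python
from dataclasses import dataclass, fields


@dataclass
class UtteranceRecord:
    """One utterance = one record; field names are hypothetical."""
    orthographic: str   # transcription in conventional orthography
    phonetic: str       # phonetic transcription
    gloss: str          # interlinear gloss
    translation: str    # English translation
    comments: str = ""  # linguistic and nonlinguistic context notes


# Made-up example built from two glosses mentioned in this description
# (ini 'this', adik 'younger.sibling'); it is not a quotation from the corpus.
record = UtteranceRecord(
    orthographic="ini adik",
    phonetic="[phonetic form omitted]",
    gloss="this younger.sibling",
    translation="This is (the) younger sibling.",
)

print([f.name for f in fields(UtteranceRecord)])
```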

Our research supervisors checked a large random sample of the coded files, consisting of about 20% of the total, to ensure data integrity and consistency of the data processing methods.

The personal names used in the corpus were not replaced by pseudonyms. Names and nicknames are frequently used in argument positions in Indonesian (where speakers of English, for example, would use pronouns), and in fact comprise about 10% of the total data in the corpus. Altering them would have significantly distorted the data. Moreover, personal names are subject to special morphological processes (e.g. various types of truncation and nickname derivations), and this important linguistic information would have been lost had the names been replaced by pseudonyms. It should also be noted that the names of participants, when mentioned, comprise single names and nicknames, not complete names. Furthermore, using personal names in the context of linguistic data citation does not violate Indonesian legal, academic, or cultural norms. However, when quoting from the database, users who so wish may replace names with codes or pseudonyms, as long as this is clearly noted.

Transcription conventions

Jakarta Indonesian is not commonly used in print, Standard Indonesian being used instead. However, it is often used in advertisements, billboards, short text messages, email chats, and personal letters. Some newspapers also use it for writing headlines. Although the orthography is far from standardized, it is roughly based on the spelling of Standard Indonesian, with a few additional conventions. Most characters are used with their IPA values, with a few exceptions:
Graph  Description          IPA
ny     palatal nasal        ɲ
ng     velar nasal          ŋ
j      voiced palatal stop  ɟ
sy     palatal fricative    ç
e      mid central vowel    ə
e      mid front vowel      e, ɛ

Note that in the conventional spelling of Jakarta Indonesian (as well as in Standard Indonesian), the mid central vowel and the mid front vowel are not distinguished, even though they constitute separate phonemes. Moreover, in Jakarta Indonesian glottals are not spelled consistently. The glottal stop is sometimes unwritten, sometimes represented by an apostrophe, sometimes by q, and sometimes (in word-final position) by k. The glottal fricative can be represented by h or (rarely) by kh; sometimes it is not written, and sometimes an h is written even though no glottal phoneme is present (based on the orthography of the cognate in Standard Indonesian).

Punctuation marks and codes used in the transcription line include:

Interlinear glosses

Each Indonesian word has a single gloss equivalent in the glossing line. The gloss contains as many morphemes as are analyzed in the Indonesian form, separated by hyphens. Lexical morphemes are generally translated into English. If a single Indonesian morpheme has an equivalent consisting of more than one English word, the words are separated by a period; for example Indonesian adik is glossed as ‘younger.sibling’. For glossing grammatical morphemes (affixes, function words, and reduplication patterns) three approaches were used. If there was an unambiguous English equivalent it was used as the gloss. For example, Indonesian ke was glossed as ‘to’, and Indonesian ini was glossed as ‘this’. If the morpheme could be easily described by a grammatical term, an abbreviation of that term was used. For example the negator tidak was glossed as NEG, and the relativizer yang was glossed as REL.

The following grammatical abbreviations were used:
Abbreviation  Meaning
1             first person
2             second person
3             third person
AGT           agent, agentive
COMP          complementizer
CONTR         contrastive
EPIT          epithet
EXCL          exclamation
FILL          filler
FUT           future
IMIT          imitative (inc. nonlexicalized onomatopoeia and interjections)
LOC           locative
MUT.RED       mutative reduplication (a special type of full reduplication where some of the phonemes of the second element undergo mutation)
NEG           negator
OATH          oath
PERS          person marker
PFCT          perfect
PL            plural
POSS          possessive
RED           reduplication
REL           relativizer
SG            singular
TOP           topic marker
TRU           truncation
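The conventions described so far (hyphens between morphemes, periods inside a gloss of more than one English word, and the abbreviations listed above) can be illustrated with a minimal sketch. The function names and the example gloss string are hypothetical and are not part of any corpus tooling.

```python
# Abbreviations copied from the list above.
GRAMMATICAL_ABBREVIATIONS = {
    "1", "2", "3", "AGT", "COMP", "CONTR", "EPIT", "EXCL", "FILL", "FUT",
    "IMIT", "LOC", "MUT.RED", "NEG", "OATH", "PERS", "PFCT", "PL", "POSS",
    "RED", "REL", "SG", "TOP", "TRU",
}


def split_word_gloss(word_gloss: str) -> list[str]:
    """Split the gloss of one word into morpheme glosses at the hyphens."""
    return word_gloss.split("-")


def describe_morpheme_gloss(morpheme_gloss: str) -> str:
    """Give a rough label for a single morpheme gloss."""
    if morpheme_gloss in GRAMMATICAL_ABBREVIATIONS:
        return "grammatical abbreviation"
    if morpheme_gloss.isupper():
        return "upper-case gloss not in the abbreviation list"
    if "." in morpheme_gloss:
        return "English gloss of more than one word"
    return "English gloss"


# Hypothetical word gloss, made up for illustration only.
for part in split_word_gloss("RED-younger.sibling"):
    print(part, "->", describe_morpheme_gloss(part))
```

The abbreviation list is checked first because one listed abbreviation (MUT.RED) itself contains a period.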

However, for a number of function morphemes (basically affixes, clitics, and particles), we were unable to settle on a single uncontroversial and agreed-upon gloss that would provide a clear indication of its function. In some cases this was because the form in question has a variety of seemingly different functions, in other cases because the form has been analyzed in different ways by different scholars, and in yet other cases for both of the above reasons. These forms are accordingly glossed with an upper-case replication of the forms' conventional spelling.

For the benefit of users unfamiliar with Jakarta Indonesian, some further information on these forms is provided in the following five tables, covering separate words, prefixes, suffixes, circumfixes, and complex discontinuous morphemes respectively. In the tables below, the first column shows the form as it appears in the interlinear gloss, the second column provides one or more recommended glosses representing some aspects of its function(s), and the third column presents a very concise description of its function(s). These descriptions are only rough-and-ready suggestions as to the nature of the forms in question, and invitations to the user to come up with more explicit analyses based on the data in this corpus. Please note that most exclamations are simply glossed as EXCL.

Separate Words
Gloss  Suggested abbreviation(s)  Description
AH     HORT                       Hortative particle, typically used to express speaker's intention to perform activity
AYO    COHRT                      Cohortative particle, inviting interlocutor(s) to join in performing an activity. Sometimes also used as an exhortative particle, urging others to perform activity
DAH    PFV, PFCT                  Aspect marker, ranging in usage from perfective to perfect
DEH    IMP, CONC                  Pragmatic particle with a variety of functions, including completion, command, and concession.
DENG   CORR                       Particle expressing self-correction; variant of DING
DIH    IMP                        Imperative particle.
DING   CORR                       Particle expressing self-correction
DOGE   VCBL                       Vocable (meaningless expression used in singing) [Toba Batak]
DONG   IMP, EMPH                  Pragmatic particle with a variety of functions including softening (in imperatives) and 'of course' (in declaratives)
EH     CORR                       Exclamation expressing self-correction
GIH    IMP                        Strong imperative particle
HAYO   EXHRT                      Variant of AYO used to challenge interlocutor, e.g. to perform a risky activity or to provide an answer to a riddle.
KAH    Q                          Question particle, used primarily to express polar interrogatives [Standard Indonesian]
KAN    Q, EMPH                    Reduced form of the negative marker bukan, used as a question particle, usually to form tag questions, and as an emphatic particle (‘you know’) preceding emphasized phrase
KEK    ASSOC:DISJ, INDF           Particle with several functions, including associative disjunction ('or things like that'), and indefinites (in construction with content interrogatives)
KOK    FOC, CONTR, 'how come'     Contrastive focus particle, which in initial position acquires interrogative force to mean 'how come'
LAH    IMPER, CONC, FOC           Pragmatic particle expressing a variety of meanings, including imperative, concessive, and contrastive focus
MAH    TOP                        Pragmatic particle typically occurring between a contrastive topic and a following comment
MARI   COHORT                     Cohortative particle, inviting interlocutor to join in performing an activity. Also used as a polite imperative and in ritualistic leave-taking (‘goodbye’).
NAH    PRES, EXCL                 Presentative particle, often used to introduce the main point of an argument. Also an exclamation to express satisfaction, especially at interlocutor’s understanding (‘That’s it!’, ‘You got it!’) and after completing a task (‘There we/you go!’).
PAN    Q, EMPH                    Forms tag questions or emphasizes following phrase, much like KAN [Betawi]
PUN    FOC                        Focus particle, with functions similar to 'also', 'even' and others, also used to form indefinites (in construction with content interrogatives)
SIH    FOC, EXPL                  Pragmatic particle with a variety of functions. In declaratives occurs after the topic to denote contrastive focus and clause finally to mark explanations. In interrogatives, requests clarification or repetition of previously provided information.
SOK    ‘presumptuously’           Usually precedes adjectives and means 'presuming to be something (conveyed by that adjective) that one is actually not'
TEH    FOC                        Focus particle with uses similar to SIH [Sundanese]
TO     Q, EMPH                    Tag (‘right?’) or emphatic, used much like KAN by speakers of Javanese background.
TOH    Q, EMPH                    Variant of TO

Prefixes
Gloss  Suggested abbreviation(s)  Description
BA-    DEPAT-, MED-               Voice marker, sometimes analyzed as a depatientive or middle voice marker [Papuan Malay]
BE-    DEPAT-, MED-               Voice marker, sometimes analyzed as a depatientive or middle voice marker
BER-   DEPAT-, MED-               Voice marker, sometimes analyzed as a depatientive or middle voice marker
DI-    PAT-, PASS-                Patient-oriented voice marker, sometimes analyzed as a passive voice marker
KE-    DEAG-, INVOL-              Voice marker, sometimes analyzed as a deagentive or passive voice marker; depending on stem, it can also mark involuntary activity
MA-    AG-, ACT-                  Actor-oriented voice marker, sometimes analyzed as an active voice marker
MEN-   AG-, ACT-                  Actor-oriented voice marker, sometimes analyzed as an active voice marker [Standard Indonesian]
N-     AG-, ACT-                  Actor-oriented voice marker, sometimes analyzed as an active voice marker
PE-    AG-, HAB-, INSTR-          Derives agentive, habituative, or instrumental nouns from intransitive verbs
PEN-   AG-, HAB-, INSTR-          Derives agentive, habituative, or instrumental nouns from transitive verbs
SE-    one-, same-, as-           Basic meanings are 'one', 'same', and 'as'; also used to derive words with a variety of functions
TA-    DEAG-, INVOL-, SUPERL-     Voice marker, sometimes analyzed as a deagentive or passive voice marker; depending on stem, it can also mark involuntary activity [Papuan Malay]
TE-    DEAG-, INVOL-, SUPERL-     Voice marker, sometimes analyzed as a deagentive or passive voice marker; depending on stem, it can also mark involuntary activity or the superlative
TER-   DEAG-, INVOL-, SUPERL-     Voice marker, sometimes analyzed as a deagentive or passive voice marker; depending on stem, it can also mark involuntary activity or the superlative

Suffixes
Gloss  Suggested abbreviation(s)               Description
-AN    -NOUN, -COMPR, -RECP                    Derivational suffix with a variety of seemingly unrelated meanings, including deriving nouns, comparative (for adjectives), reciprocal (for some verbs), and many others
-E     -3, -POSS, -3:POSS, -ASSOC, -DEF        Marker of a range of functions from possessive (usually third person) through to definiteness; may be analyzed as expressing a generalized relationship of association [Javanese]
-IN    -END.POINT, -APPL, -TRANS, -BEN, -CAUS  Voice marker, sometimes analyzed as an end-point or applicative voice marker; has a range of functions, including causative, benefactive, and transitivizer
-I     -TR                                     Forms transitive verbs
-KAN   -END.POINT, -APPL, -TRANS, -BEN, -CAUS  Voice marker, sometimes analyzed as an end-point or applicative voice marker; has a range of functions, including causative, benefactive, and transitivizer
-NO    -APPL.IMP                               Forms imperatives of applicative verbs [Javanese]
-NYA   -3, -POSS, -3:POSS, -ASSOC, -DEF        Marker of a range of functions from possessive (usually third person) through to definiteness; may be analyzed as expressing a generalized relationship of association
-NYE   -3, -POSS, -3:POSS, -ASSOC, -DEF        Marker of a range of functions from possessive (usually third person) through to definiteness; may be analyzed as expressing a generalized relationship of association
-O     -IMP                                    Imperative suffix [Javanese]
-WAN   -AG:M                                   Suffix forming masculine agentive nouns
-WATI  -AG:F                                   Suffix forming feminine agentive nouns

Circumfixes
Gloss       Suggested abbreviation(s)          Description
KE.AN       ABST-[root]-CIRC, ADV-[root]-CIRC  Derives abstract nouns and adversative passives
PENG.AN     VN-[root]-CIRC                     Derives verbal nouns
PER.AN      NOUN-[root]-CIRC                   Derives collective and other nouns
PE.AN       NOUN-[root]-CIRC                   Derives collective and other nouns (rare)
SE.NYA      ADV-[root]-CIRC                    Derives adverbs
SE.RED.NYA  ADV-RED-[root]-CIRC                Derives ‘superlative’ adverbs (‘as x as possible’)
RED.AN      SIMIL-[root]-AN                    Derives similitudinals (conveying a concept similar but not identical to the meaning of the root)

Biographical data of the eight principal target children

Code  Sex     Age at start  Age at end  Socioeconomic status  Ethnic background                               Other languages spoken at home
RIS   Female  1;8           6;1         lower                 Father Sundanese, Mother Betawi                 Traditional Betawi
PIT   Female  4;4           8;9         middle                Father Javanese, Mother Javanese                Javanese
IDO   Male    3;4           6;5         middle                Father Javanese, Mother Sundanese               Javanese
HIZ   Male    1;7           5;11        upper middle          Father Javanese, Mother Manado                  Javanese
PRI   Female  2;7           6;1         upper middle          Father Javanese, Mother Chinese                 None
MIC   Male    2;0           3;10        upper middle          Father Chinese, Mother Chinese                  None
LAR   Female  2;10          6;4         middle                Father Chinese-Betawi-Javanese, Mother Chinese  Javanese (nanny)
TIM   Male    1;6           5;0         middle                Father Papuan-Dutch, Mother Batak               Toba Batak
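The ages in the table above are written as years;months. As a minimal sketch, they can be converted to months to compare the span between each child's start and end ages. The helper name and the dictionary layout are assumptions made for illustration; the values are copied from the table.

```python
def age_to_months(age: str) -> int:
    """Convert an age written as 'years;months' (e.g. '1;8') to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


# (age at start, age at end) for each target child, copied from the table.
CHILD_AGES = {
    "RIS": ("1;8", "6;1"),
    "PIT": ("4;4", "8;9"),
    "IDO": ("3;4", "6;5"),
    "HIZ": ("1;7", "5;11"),
    "PRI": ("2;7", "6;1"),
    "MIC": ("2;0", "3;10"),
    "LAR": ("2;10", "6;4"),
    "TIM": ("1;6", "5;0"),
}

# Number of months between each child's start and end ages.
span_in_months = {
    code: age_to_months(end) - age_to_months(start)
    for code, (start, end) in CHILD_AGES.items()
}

for code, span in span_in_months.items():
    print(f"{code}: {span} months")
```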

There are 8 files in the PIT-OPI folder. These files were recorded as part of the PIT project, but PIT herself was not present during the recording. She was unavailable, but the researcher decided to proceed with the recording since her cousin OPI was present and we were also tracing his linguistic development.

Usage Restrictions

Acknowledgements

Funding was provided by the Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.