Sarvasy Nungon Corpus

Hannah Sarvasy
Centre of Excellence
for the Dynamics of Language
Australian National University


Nungon Team (from left)
Stanly Girip, James Jio,
Lyn Ögate, Nathalyne Ögate

Participants: 6
Type of Study: naturalistic
Location: Papua New Guinea
Media type: audio
DOI: doi:10.21415/T5S388

Browsable transcripts

Download transcripts

Media folder

Citation information

Publications using these data can cite:

Sarvasy, Hannah. 2017. A Grammar of Nungon: A Papuan Language of Northeast New Guinea. Leiden: Brill.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

Nungon language overview

Nungon (yuw) is a Papuan language of the Finisterre branch of the Finisterre-Huon group of northeast Papua New Guinea. Nungon is an umbrella term for the southern, higher-elevation (~1500m above sea level) half of an oval-shaped dialect continuum with the Uruwa River running through the center. In total, there are roughly 1,000 speakers of Nungon, but these are divided among several distinct village-specific dialects: Kotet, Towet, Worin, and Yawan. A further two village-lects, Mup and Sagain, belonging to the southern half of the continuum are labeled by local people as Nuon rather than Nungon. The dialects in the northern half of the continuum are described collectively as belonging to two groups, Yano and Yau. Yau differs lexically, phonologically and grammatically from the Nungon dialects. For instance, Yau lacks the contrastive vowel length of most Nungon dialects, and the Yau Remote Future tense is formed differently than that of Nungon. The Uruwa River dialects share from 62% to 88% of core lexicon (Wegmann 1994 dialect survey).

The Nungon language is vibrant as of 2017: all children growing up in the region, save those of non-Nungon teachers working at the government primary school, learn Nungon as a first language.

All Uruwa River dialects are written using a practical orthography developed by Summer Institute of Linguistics missionaries Urs and Johanna Wegmann in the 1990s, following foundational work by Doug and Carol Lauver in the late 1980s. These missionaries were based in the Yau dialect area.

Culture, history and geography

The Uruwa River area is remote, with no electricity. There are no roads anywhere near the region; access is by small fixed-wing plane, or by foot (three days for locals to hike to the coastal city of Lae, over the Saruwaged Mountains; one day for a local to hike to the Bismarck Sea coast, through dense rainforest and rough terrain). Mobile phone coverage only expanded into the area in 2015. Mountain springs and waterfalls provide drinking, cooking and bathing water. People of the region traditionally farmed taro, yams, and bananas, and cultivated a range of indigenous edible trees and plants in their vast rainforest holdings, which reach 4000m in elevation and continue for roughly fifteen kilometers beyond the villages themselves. Men and women hunted mammals, and men hunted birds, including the dwarf cassowary. People of the region traded handicrafts such as bark-cloth raincapes and bows and arrows for goods such as pottery through connections to the major Vitiaz Strait trading circuit (Harding 1967).

Today, people continue to farm expertly, with ample food year-round. Trade with neighboring communities ceased by the 1980s or so, although familial trade-friend relationships with members of those communities are still recognized. Traditional grass skirts for women and bark-cloth loincloths for men were the main clothing worn in the region through the early 1990s; a man in his twenties recounted how excited he was to receive his first Western-style pair of shorts from local leader Dono Ögate on Dono’s return to the region from working in Lae.

The upper Uruwa River valley is characterized by steep-sloped mountain paths, and little level ground anywhere (fixed-wing planes, for instance, land on a sloping airstrip carved with hand tools into the mountainside over years of community work parties). Nungon speakers’ rainforest holdings are incredibly biodiverse; several days of walking around the forest yielded over 400 distinct Nungon names of different native plants. Children are attuned to plant names early, with children aged five to seven observed identifying dozens of tree and herb species.

The first European to sojourn in the Uruwa area was the Swiss missionary Karl Saueracker, whose name is still well-known by local people. Baptisms into Lutheran Christianity began in the 1950s. By the late 1960s, conversion to Lutheranism was probably mostly complete. That decade also saw the establishment in the Yau area of a school that used the Kâte language (Papuan, East Huon) as language of instruction. (Kâte is distantly related to Nungon through the Finisterre-Huon group. Although there are grammatical similarities between the two languages, there is little superficial lexical cognacy between the two.) The first local people to become literate in a language were thus literate in Kâte; early Lutheran church written materials were in Kâte, and the language was used as a lingua franca for church-related parties traveling between the Uruwa area and other language areas.

In the 1970s, some villagers decided to bring Seventh-Day Adventism, which they had encountered in travels beyond the Uruwa area, back to the region. Eventually, almost the entire Towet village community and portions of the Yawan, Kotet and Worin communities adopted Seventh-Day Adventism (SDA). SDA adoption has meant that these communities no longer raise pigs or grow or consume betelnut, and further that they eschew traditional Nungon singing and dancing styles, including the traditional hourglass drum, uwing, in favor of choral performances of English SDA hymns.


Two elementary schools in the region teach literacy in Nungon first, then transition to Papua New Guinea’s English-based creole Tok Pisin. School beyond grade 2 is largely in Tok Pisin, with speaking of Nungon forbidden in grades 3-8. English is also introduced in primary school. Most teachers of higher grade levels do not speak Nungon. These government schools are relatively new; the lower-grades school was started by Eni Ögate in 1998, and the primary school about ten years later. Most adults in the area above thirty have limited formal education, although there are still a few (now grandparents) who attended the Kâte language school.


Sarvasy (2017) is a full reference grammar of Towet Nungon. Nungon is an agglutinating language with some fusion. Constituent order is verb-final. Grammatical relations are indicated through constituent order to a certain extent, but more precisely through enclitics. There are three number subsystems at play in different parts of the Nungon grammar. There is no grammatical gender. Nouns are usually unmarked for number, but prototypically-human nouns can be marked for number when possessed by a singular possessor. ‘Final verbs’ (see below) inflect for five tenses: two past tenses, present, and two future tenses. There are also two imperative forms: an immediate imperative and a delayed imperative. Verbal inflectional morphemes differ in form according to the morphophonological class of the verb root; there are six such classes.

Like many Papuan languages and all those of the Finisterre-Huon group in northeast Papua New Guinea, Nungon features clause chaining. Sequences of related actions are usually expressed in clause chains: one or more non-finite ‘medial’ clauses with verbal predicates that are unmarked for tense or mood, followed by a single ‘final’ clause with a finite verbal predicate marked for tense or mood.

Medial clauses are marked for switch-reference depending on exact co-reference between the subject of a medial clause A and the subject of the following clause B. If the subject of clause A and that of B are exactly the same, the verbal predicate in medial clause A (called a ‘medial verb’) bears no indexation of its own subject, just a dummy suffix -nga or -a attached directly to the verb root. But if the subject of clause B is anticipated to differ from that of A, the medial verb in A inflects to index the person and number of its own subject. This morphology is distinct from subject indexation on ‘final verbs.’
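The switch-reference rule just described can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Subject` class, the referent labels, and the output label format (for example `DS.3SG`) are hypothetical and are not taken from the corpus; only the SS/DS distinction and the requirement of exact co-reference come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical representation of a clause subject."""
    referents: frozenset  # labels of the individuals referred to
    person: int           # 1, 2, or 3
    number: str           # "SG", "DU", or "PL"


def medial_marking(subj_a: Subject, subj_b: Subject) -> str:
    """Return an abstract label for the medial verb of clause A.

    Same-subject (SS) marking requires exact co-reference between the
    subjects of clause A and the following clause B. Otherwise the
    medial verb indexes the person and number of its own subject (DS).
    """
    if subj_a.referents == subj_b.referents:
        return "SS"
    return f"DS.{subj_a.person}{subj_a.number}"


child = Subject(frozenset({"child"}), 3, "SG")
child_and_mother = Subject(frozenset({"child", "mother"}), 3, "DU")

print(medial_marking(child, child))             # SS
print(medial_marking(child, child_and_mother))  # DS.3SG
```

Note that partial overlap between the two subjects (the child alone, then the child and mother together) is treated here as a different subject, following the statement that co-reference must be exact.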

Nungon grammar arguably reflects the terrain in which it and other Finisterre-Huon languages are spoken. Two demonstrative series coexist and intersect in some paradigms; a simple proximal/distal pair coexists with topographic demonstratives that indicate whether their referents are uphill from, at the same level as, or downhill from, the speaker, as well as three degrees of distance away.

Special aspects of child and child-directed speech

Phonology. Towet Nungon child-directed speech optionally replaces word-final /k/ with glottal stops, and /r/ with [j] or [l]. These are also characteristic of Towet child speech. Adult Towet Nungon has no phonemic or phonetic glottal stops and employs a phoneme /r/, realized as a trilled or flapped rhotic. In contrast, the three other Nungon dialects are characterized by either replacement of word-final /k/ with a glottal stop (Worin), use of [l] instead of [r] (Yawan), or both of these (Kotet). Speakers both of the Towet dialect and of these other dialects have been said to describe the latter, especially those that use [l] instead of [r], as sounding ‘childish.’

Lexicon. Special lexical items in the Nungon baby talk register can be characterized in the following ways:

  1. onomatopoeia for sound produced by thing (buu ‘airplane,’ meyaöng ‘cat,’ meek ‘sheep’)
  2. phonetically reduced equivalent of adult word (nauk ‘water’ (adult yamuk), dou ‘ghost’ (adult dogu), hoit ‘grab’ (adult honggit))
  3. distinct lexical item (dada ‘sibling’ (no adult equivalent), dudu ‘genitals’ (adult murong), purik ‘turn’ (adult iwan), dai ‘sleep’ (adult duwo-))
  4. occasionally, the Tok Pisin equivalent (pitsin ‘bird,’ kakaruk ‘chicken’)

Syntax. Nungon child-directed speech with preverbal children and children in the early stages of language acquisition is characterized by the optional expansion of what in adult speech would be a simple inflected verb into a nominalized verb plus verb ‘do.’ In adult speech, ‘s/he drinks water’ is simply yamuk na-ha-k ‘water drink-PRES.SG-3SG.’ But in child-directed speech, this can be optionally expanded into nauk na-k ta-a-k ‘water.BT eat-NMZ do-PRES-3SG,’ literally ‘s/he does drinking wa-wa.’ This may relate to the partial generalization of the nominalized form in early child speech, evident in the Towet Oe transcripts between 2;3 and 3;0.

The Target Children

The corpus here was created through a two-year longitudinal study of five children acquiring Towet Nungon as a first language. The study aimed for a small cohort with staggered ages, with the stipulation that both parents of all children must be of Towet village. As it happens, three of the five children have one parent who is of ‘pure’ Towet ancestry, and one who has one Towet parent and one parent from either Yawan (Abraham and Arisen’s maternal grandmothers) or Worin (Daren’s maternal grandmother). This was deemed acceptable by the research team, since the children are growing up in Towet village surrounded by speakers of Towet Nungon, with only infrequent trips to other villages. Slightly more dialect mixing, mostly of a phonological nature, can be observed in the Arisen and Daren transcripts than in the Towet Oe and Niumen transcripts.

Towet Oe. Birthday: 26 June 2013; age 2;1 to 4;1 during the study. Fourth child of a mother with incomplete knowledge of Tok Pisin (this is an indicator of both the mother’s level of formal schooling and her minimal sojourns beyond the Uruwa area). All paternal and maternal grandparents are of Towet village. Many of Towet Oe’s transcripts include extended interactions with her father as well as with her mother. Towet Oe’s transcripts between 2;1 and 3;0 show that she goes through a stage, peaking around 2;8, in which she optionally uses the nominalized form of the verb in place of a fully-inflected form. This has a parallel in an optional syntactic feature of Nungon child-directed speech (CDS) used with preverbal children and children in the early stages of language acquisition. This syntactic feature expands what would be a single inflected verb in adult-directed speech (ADS) into a nominalized verb followed by the verb to- ‘do.’ This feature of CDS is far more evident among all adults and older siblings (mother, father, adult interviewers, and older sister) in the Towet Oe transcripts than in the older children’s transcripts.

Niumen. Birthday: 26 September 2012; age 2;10 to 4;10 during the study. Niumen is the only first child in the study and is verbally highly competent even at age 2;10 at the study’s outset. Niumen’s mother is a highly intelligent young Towet woman who is literate in Tok Pisin and some English despite having limited formal schooling. Niumen’s father works in the oil palm plantations of Kimbe, West New Britain, during much of the first year of the study; his mother and the interviewers are the main interlocutors in these transcripts. Comments from his mother show that Niumen also spends time with his paternal grandmother, who was originally from Worin village but has lived in Towet for decades.

Daren. Birthday: 3 March 2012; age 3;5 to 5;5 during the study. Daren is the fourth child of the sister of Niumen’s father, a Towet woman in her late thirties who remembers participating in now-obsolete traditional group hunting practices. Daren’s mother has no formal schooling and less-than-complete mastery of Tok Pisin. As noted above, Daren’s maternal grandmother is originally from Worin village. His father is a Towet man who is working in Kimbe, West New Britain, during the first year of the study; Daren’s mother and the interviewers are the main interlocutors in these transcripts. Daren’s transcripts showcase Nungon prompting routines, as prompting is a favored method of his mother in getting Daren to talk.

Arisen. Birthday: 6 December 2011; age 3;8 to 5;8 during the study. Arisen is the third child of a Towet woman with limited competence in Tok Pisin. One of Arisen’s maternal grandparents is from Yawan village. Arisen’s father is a Towet man with a speech impediment. Arisen is highly verbal and produces long, complex clause chains from the beginning of the study.

Abraham. Birthday: 16 May 2015; age 1;2 to 2;3 during the study. Abraham was added nearly half-way through the two-year study to provide more early speech data. Transcription of his recordings is still underway.
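The ages above are given in years;months form (for example, 2;1 means two years and one month). As a minimal sketch, such an age can be derived from a birthday and a recording date as follows. The function name `chat_age` and the recording date used in the example are assumptions for illustration; actual session dates should be taken from the transcript headers.

```python
from datetime import date


def chat_age(birth: date, recorded: date) -> str:
    """Return the age at recording time as 'years;months'."""
    months = (recorded.year - birth.year) * 12 + (recorded.month - birth.month)
    # A month only counts once the day-of-month of the birthday is reached.
    if recorded.day < birth.day:
        months -= 1
    if months < 0:
        raise ValueError("recording date precedes birth date")
    return f"{months // 12};{months % 12}"


# Towet Oe's birthday is 26 June 2013 (from the description above).
# The recording date below is hypothetical.
print(chat_age(date(2013, 6, 26), date(2015, 8, 1)))  # 2;1
```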

Recording and Transcription Methodology

A team of five native speakers of Towet Nungon, three women and two men, ran the month-to-month operation of the study. Lyn Ögate, James Jio, Stanly Girip, Nathalyne Ögate, and Yongwenwen Hesienare conducted monthly recordings of approximately one hour per target child with mothers, target children, other children, and occasionally a father. They used the built-in microphones on battery-powered Zoom H5 audio recorders on tripods and also filmed the recording sessions with small hand-held Canon digital cameras. These recordings were then immediately backed up to laptops and hard drives, and transcribed later.

One of the five researchers then transcribes each audio file, not necessarily the researcher who was present during the recording session. (Transcription lags three to five months behind the initial recording.) Initially, a system was used in which the audio file was open in Audacity and Mid-CHAT-style transcription, including a precise timestamp for each utterance, was typed into the word processor Wordpad. After further training in the program CLAN in April 2017, the team switched to transcribing directly into CLAN. Yongwenwen Hesienare joined the group in May 2017, and continues to transcribe using the older method.

Transcription is all done in Nungon, with no translation or commentary in English or Tok Pisin, the lingua franca of Papua New Guinea. The orthography used is the standard Nungon practical orthography. This means that spirantized intervocalic /g/ is still written as <g>, for instance.

The Nungon orthography is taught only during the first three years of the local elementary school (which the men did not attend) and has not regularly been used outside the school context, although this is now changing, with people writing text messages in Nungon on mobile phones since the area acquired mobile phone reception in mid-2015. Further, the researchers’ degree of formal education ranges from fewer than three years of elementary school (the two men) to sixth grade (Lyn Ögate) and tenth grade (Nathalyne Ögate and Yongwenwen Hesienare). This means that there is some spelling variation among the transcribers. Spelling variation can be divided into acknowledged spelling errors and simple variation. Errors have been fixed in the first line of the data here. But transcribers vary and are inconsistent in whether, for instance, they leave a space between an enclitic and its preceding host. This variation has been largely left in the first line of the transcriptions here.

These researchers also inserted notes, written in Nungon, on extra-verbal sounds such as coughs, crying, laughing, and sucking of teeth. They added comments explaining (in Nungon) the meaning of non-adult Nungon terms such as baby talk lexical items and Tok Pisin or English loans. They also commented on some incorrect pronunciations or non-standard speech by children or parents, giving the correct, standard versions in a comment tier. This extends to parents’ use of terms from, or phonology of, another Nungon village dialect.

The remaining tiers of the transcriptions were added by Hannah Sarvasy (Centre of Excellence for the Dynamics of Language, Australian National University) with major technical support from Sasha Wilmoth and Simon Hammond of Sydney-based Appen, who helped arrange for semi-automation of tier addition, morphologizing, and glossing. These three tiers are:

%xgls: this tier shows the standard adult equivalent of the utterance transcribed in the first tier, segmented into morphemes according to the Leipzig Glossing Rules. Morphemes are separated by hyphens; clitics are separated from their hosts by equals signs.

%xcod: this tier uses the abbreviations listed below to gloss the morphemes in %xgls. A part of speech code followed by the symbol ^ precedes the root of the word. This placement is most relevant with inflected verbs; there, an object prefix can precede the verb root, so the part of speech code v^ intervenes between this prefix and the verb root.

%eng: this tier gives a free English translation.
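To make the tier layout concrete, here is a minimal sketch of reading one utterance and its dependent tiers into a dictionary. The sample record is constructed for illustration from the adult example ‘s/he drinks water’ given earlier; it is not an excerpt from the corpus, and the speaker code `MOT`, the unsegmented main-line spelling, and the part of speech codes attached in `%xcod` are assumptions. The parser is plain string handling and does not use CLAN or any CHAT library.

```python
# Hypothetical record, built from the gloss 'water drink-PRES.SG-3SG' above.
SAMPLE = (
    "*MOT:\tyamuk nahak .\n"
    "%xgls:\tyamuk na-ha-k\n"
    "%xcod:\tn^water v^drink-PRES.SG-3SG\n"
    "%eng:\ts/he drinks water\n"
)


def parse_record(text: str) -> dict:
    """Split one utterance block into speaker, main line, and tiers."""
    record = {"speaker": None, "utterance": None, "tiers": {}}
    for line in text.splitlines():
        if not line.strip():
            continue
        # The label is everything before the first colon.
        label, _, content = line.partition(":")
        content = content.strip()
        if label.startswith("*"):      # main line: *SPEAKER:
            record["speaker"] = label[1:]
            record["utterance"] = content
        elif label.startswith("%"):    # dependent tier: %name:
            record["tiers"][label[1:]] = content
    return record


rec = parse_record(SAMPLE)
print(rec["speaker"])        # MOT
print(rec["tiers"]["xgls"])  # yamuk na-ha-k
print(rec["tiers"]["eng"])   # s/he drinks water
```

Because %xgls and %xcod are aligned word by word, splitting both tiers on whitespace gives matching lists that can be paired for further analysis.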

Additional tiers that occur in places are %com, where the transcribers entered comments in Nungon, and %def, which is used here to translate the comments into English where they are left in the original Nungon.

Part of speech codes
adj adjective
adv adverb
btn baby talk noun
btv baby talk verb
coll collective marker
conj conjunction
d demonstrative
expr expressive
ij interjection
n noun
neg negator
pro personal pronoun
prev preverb
tpadj Tok Pisin adjective
tpbtn Tok Pisin baby talk noun
tpn Tok Pisin noun
v verb

Other abbreviations
- morpheme boundary
= clitic boundary, also marking combination into a single phonological word
: scope over entire word
+ fusion
1, 2, 3 first person, second person, third person
ADEM adverbial demonstrative
ADJ adjectivizer
ADV adverbializer
ASSOC associative plural
AUTOREFL auto-reflexive
BEN benefactive
CAD call-at-distance
COMIT comitative
CNTR Counterfactual
DEICT deictic
DEL.IMP Delayed Imperative
DEP Dependent verb
DS different subject medial verb
DU dual
DUB dubitative
EMPH emphatic
FAR far distance
FOC focus
GEN genitive
IMM.IMP Immediate Imperative
IMNT Imminent aspect
INF Inferred Imperfective
IRR Irrealis
LDEM local nominal demonstrative
LINK linker
LOC locative
MDEM NP-modifying demonstrative
MID middle distance
MV Medial verb form
MVII Medial verb suffix II
NEAR near distance
NEG negative
NF Near Future
NMZ nominalizer
NP Near Past
NSG non-singular
O object argument of transitive verb
PE possessed constituent
PERF Perfect
PL plural: more than two
PRES Present
PRO personal pronoun
PROB Probable
POSS pertensive
QUES polar question marker
QUOT quotative
RED reduplicated
REL relativizer
REP repeated
RF Remote Future
RP Remote Past
RSTR restrictive, exclusive
SEMBL semblance
SG singular
SPEC specifier
SS same subject medial verb
TOP topicalizer
VOC vocative
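
The boundary symbols and part of speech codes listed above can be used to break a word from the %xcod tier into its pieces. The following is a rough sketch only: the function name `split_coded_word` is invented here, and the example words, including the object-prefix gloss `3SG.O`, are hypothetical strings assembled from the abbreviations above rather than forms copied from the transcripts.

```python
import re


def split_coded_word(word: str) -> dict:
    """Split one %xcod word at '-' (morpheme) and '=' (clitic) boundaries.

    Returns the part of speech code (the text before '^'), the gloss of
    the root, and a list of (preceding_boundary, gloss) pairs.
    """
    pos = None
    root = None
    pieces = []
    boundary = ""  # boundary symbol that precedes the current piece
    # The capturing group keeps the separators in the result list.
    for token in re.split(r"([-=])", word):
        if token in ("-", "="):
            boundary = token
            continue
        gloss = token
        if "^" in token:
            # The part of speech code immediately precedes the root.
            pos, gloss = token.split("^", 1)
            root = gloss
        pieces.append((boundary, gloss))
        boundary = ""
    return {"pos": pos, "root": root, "pieces": pieces}


print(split_coded_word("v^drink-PRES.SG-3SG"))
# An object prefix precedes the code v^, so the root is not the first piece.
print(split_coded_word("3SG.O-v^see-NP-1SG")["root"])  # see
```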


This study would not have been possible without generous funding from the Australian Research Council to the Centre of Excellence for the Dynamics of Language, and within that, to CI Alan Rumsey. Many thanks to Alan Rumsey for enthusiastic support for the project from its inception. Gillian Wigglesworth, Katherine Demuth, Nina Hyams, and William O'Grady also provided helpful advice. Many thanks to the families of Abraham, Towet Oe, Niumen, Daren, and Arisen; to stalwart researchers Lyn Ögate, James Jio, Stanly Girip, Nathalyne Ögate, and Yongwenwen Hesienare; and to the greater Towet Nungon community for their participation and support. Judith Bishop of Appen initiated a partnership through which Sasha Wilmoth and Simon Hammond provided crucial technical support for coding and segmenting the transcripts. Many thanks to Brian MacWhinney for technical assistance in structuring the corpus.

Further Resources on Nungon

Reference grammar:

Sarvasy, Hannah. 2017. A Grammar of Nungon: A Papuan Language of Northeast New Guinea. Leiden: Brill.

Child language:

Sarvasy, Hannah and Eni Ögate. Forthcoming. Early writing in Nungon. In Joy Peyton and Ari Sherris (eds.), Early Writing in Indigenous Languages.

Sarvasy, Hannah. 2017. Syntactic complexity equals morphological simplification in Nungon CDS. Presentation at the 14th IASCL, Lyon, France.

Selected grammatical topics:

Sarvasy, Hannah. Under review. Multiple number systems in one language: split number in Nungon.

Sarvasy, Hannah. In press. Imperatives and commands in Nungon. In Alexandra Y. Aikhenvald and R. M. W. Dixon (eds.), Commands. Oxford: Oxford University Press. 224-249.

Sarvasy, Hannah. 2017. Quantification in Nungon. In Denis Paperno and Edward Keenan (eds.), Handbook of Quantification in Natural Language, Volume 2. New York: Springer. 611-665.

Sarvasy, Hannah. 2016. Sexless babies, sexed grandparents: Nungon gendered person terms. International Journal of Language and Culture 3:1, 115-136.

Sarvasy, Hannah. 2015. Breaking the clause chains: non-canonical medial clauses in Nungon. Studies in Language 39:3, 664-696.

Sarvasy, Hannah. 2015. The imperative split and the origin of switch-reference marking in Nungon. In Anna E. Jurgensen, Hannah Sande, Spencer Lamoureux, Kenny Baclawski, Alison Zerbe (eds.), Berkeley Linguistic Society 41 Proceedings. 473-492.

Sarvasy, Hannah. 2015. Split Number in Nungon. LSA Annual Meeting Extended Abstracts, [S.l.], v. 6, 25:1-5.

Sarvasy, Hannah. 2014. Four Finisterre-Huon languages: an introduction. In Hannah Sarvasy (ed.), Non-Spatial Setting in Finisterre-Huon Languages. Special issue of Language Typology and Universals: Sprachtypologie und Universalienforschung 67:3, 275-295.

Sarvasy, Hannah. 2014. Non-spatial setting in Nungon. In Hannah Sarvasy (ed.), Non-Spatial Setting in Finisterre-Huon Languages. Special issue of Language Typology and Universals: Sprachtypologie und Universalienforschung 67:3, 395-432.

Sarvasy, Hannah. 2013. Across the great divide: how birth-order terms scaled the Saruwaged Mountains in Papua New Guinea. Anthropological Linguistics 55:3, 234-255.

Other dialects:

Lauver, Doug and Urs Wegmann. 1994. Yau grammar essentials. Unpublished ms. Ukarumpa: Summer Institute of Linguistics.

Wegmann, Urs. 1994. Dialect survey report, Yau-Uruwa. Unpublished ms. Ukarumpa: Summer Institute of Linguistics.

Wegmann, Urs. 1993. Orthography paper – Yau language. Unpublished ms. Ukarumpa: Summer Institute of Linguistics.