CHILDES Japanese Noji Corpus

Junya Noji (1920-2016)
Hiroshima University

Susanne Miyata
Department of Medical Sciences
Aichi Shukutoku University


Norio Naka
International Studies
Osaka Gakuin University


Participants: 1
Type of Study: naturalistic, longitudinal
Location: Japan
Media type: no longer available
DOI: doi:10.21415/T58C70

Citation information

Noji, Junya. (1973-77). Yooji no gengo seikatsu no jittai I-IV. Bunka Hyoron Shuppan.

Miyata, S. (2012b). CHILDES nihongoban: Nihongoyoo CHILDES manyuaru 2012. [Japanese CHILDES: The 2012 CHILDES manual for Japanese].

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Noji Corpus contains diary data collected by the Japanese linguist and dialectologist Junya Noji. Noji taught at Hiroshima University, moved after retirement to Naruto Pedagogical University, and served as president of that university from 1992 to 1998. He published a great number of books and articles, mainly on the history of Japanese language education.

He observed his first-born son Sumihare from birth (March 9, 1948) until the age of 7, as he was growing up in Hiroshima. The data is based on handwritten records collected virtually daily (2243 days over 7 years), although the focus lies on the 3rd year. In the later years, fewer records were taken, resulting in a lower number of utterances available per month. A detailed description of the methodology can be found in the printed edition (Bunka Hyoron Shuppan). The data contains approximately 40,000 utterances by Sumihare and about 22,000 utterances by other speakers: family members (his mother, his father, and his younger brother, Teruki) and others such as children from the neighborhood (Seejikun and Keekochan). A comment is provided for each utterance, establishing the context and interpreting the child's utterance.

The electronic version of this data was entered, checked against the original, and converted to CHAT format by Norio Naka (Osaka Gakuin U.). The final revision and the morpheme coding (JMOR07; 1;5 - 3;11) were done by Susanne Miyata (Aichi Shukutoku U.).

The print original uses katakana (phonetic syllable script) for the utterances, and regular hiragana (syllabic) and kanji (Chinese characters) for the comments, as well as a number of special symbols such as arrows to indicate the speaker and the addressee. The electronic version was transcribed in Hebon (the Hepburn transcription system). The format follows the current Japanese adaptation of CHAT (Miyata, 2012b; Oshima-Takane & MacWhinney, eds., 1998). When data entry began in 1992, only ASCII was available within the CHILDES system. Although there is no longer any such restriction on fonts, the use of Hebon (at least on the main line) keeps the data compatible with programs such as MOR and makes it accessible to a greater number of researchers by removing the barrier of Japanese script. For better readability, Kana Kanji versions of the utterances have been added on the %ort tier.
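
As an illustration of the layout described above (a romanized main line paired with a Kana Kanji %ort tier), the following is a minimal Python sketch of how such a transcript could be read. It is not part of the CHILDES tools. It assumes only the general CHAT conventions that header lines begin with `@`, main lines with `*` plus a speaker code, dependent tiers with `%` plus a tier name, and continuation lines with a tab. The names `Utterance`, `parse_chat`, and `SAMPLE` are hypothetical, and the sample utterances are invented for illustration; they are not taken from the Noji Corpus.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    speaker: str                                # speaker code, e.g. "CHI"
    text: str                                   # main line (Hebon / romanized)
    tiers: dict = field(default_factory=dict)   # dependent tiers, e.g. {"ort": ...}


def parse_chat(lines):
    """Group each main line with the dependent tiers that follow it."""
    utterances = []
    last = None  # "main", a tier name, or None (after a header or blank line)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("@"):
            last = None  # headers and blank lines are skipped in this sketch
        elif line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            utterances.append(Utterance(speaker.strip(), text.strip()))
            last = "main"
        elif line.startswith("%") and utterances:
            name, _, text = line[1:].partition(":")
            last = name.strip()
            utterances[-1].tiers[last] = text.strip()
        elif line.startswith("\t") and utterances and last:
            # A tab-initial line continues the previous main line or tier.
            if last == "main":
                utterances[-1].text += " " + line.strip()
            else:
                utterances[-1].tiers[last] += " " + line.strip()
    return utterances


# Invented sample in CHAT-like layout (NOT real corpus data).
SAMPLE = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, MOT Mother\n"
    "*CHI:\twanwan .\n"
    "%ort:\tワンワン 。\n"
    "*MOT:\twanwan da ne .\n"
    "%ort:\tワンワン だ ね 。\n"
    "@End\n"
)

utterances = parse_chat(SAMPLE.splitlines())
for u in utterances:
    print(u.speaker, "|", u.text, "|", u.tiers.get("ort", ""))
```

Keeping the romanized main line and the %ort tier side by side in one record lets a reader search in either script while leaving the main line untouched for programs such as MOR.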

Warnings: The comments (entered by OCR and converted automatically to Latin script) have not yet been checked.