Derived Corpora and Counts
Researchers have constructed several derived corpora and frequency counts based on segments of the CHILDES database.
- BabySRL Corpus: Cynthia Fisher, Dan Roth, and Christos Christodoulopoulos have contributed
a version of the Brown corpus that has been parsed and labelled for semantic roles.
- Brent_Ratner Corpus: In order to train an automatic segmentation program,
Michael Brent at Washington University has created a corpus derived from the child-directed speech (CDS) of the CHILDES Bernstein-Ratner corpus.
The current version of this derived corpus was contributed by Sharon Goldwater.
- Johnson Sesotho Corpus: In order to train an automatic segmentation program,
Mark Johnson at Brown has created a corpus derived from the CDS (child-directed speech) of the CHILDES Sesotho corpus.
The available materials include the Python script that can be run on the Sesotho corpus, along with
its output: the extracted CDS sentences.
- Hungarian-Italian IDS: Judit Gervain's phonological transcription of
the Infant-Directed Speech in the Hungarian and Italian segments of CHILDES.
- Pearl_Sprouse Corpus: This corpus, contributed by Lisa Pearl and Jon Sprouse,
provides Penn TreeBank style parses for selected corpora from the American English segment of the CHILDES database.
- Traditional Mandarin: Copies of the Mandarin corpora in traditional orthography.
- Polish IDS: Luc Boruta's phonological transcription of
the Infant-Directed Speech in the Polish segments of CHILDES.
- Determiners: Counts of the emergence of the determiner category across several
CHILDES corpora as analyzed in a forthcoming Psychological Science paper from Meylan, Frank, Roy, and Levy.
- UCI_Brent_Syl Corpus: In order to train an automatic segmentation program,
Lisa Pearl and Lawrence Phillips at UC Irvine have created a corpus derived from the CDS of the CHILDES Brent corpus.
The corpus comes with the scripts and dictionary used to produce it.
- Ping Li of Penn State has contributed frequency counts of child-directed speech,
available for viewing directly or as a zipped download,
along with the documentation.
- Portuguese Word Frequency: Ângela Maria Vieira Pinheiro's count of word frequency
in the writings of Brazilian school children.
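
Several of the resources above are word frequency counts over child-directed speech. As a rough illustration of what such a count involves, here is a minimal sketch. It assumes plain-text input with one utterance per line and whitespace-separated words. The function name `word_frequencies` and the sample utterances are hypothetical; they are not taken from any of the contributed corpora, whose actual file formats are not described here.

```python
from collections import Counter


def word_frequencies(utterances):
    """Count lowercased, whitespace-separated tokens across utterances.

    Hypothetical helper for illustration only; the contributed counts
    may use different tokenization and normalization.
    """
    counts = Counter()
    for utterance in utterances:
        counts.update(utterance.lower().split())
    return counts


# Made-up sample utterances, not drawn from CHILDES.
sample = [
    "look at the doggy",
    "the doggy is sleeping",
    "where is the ball",
]

freqs = word_frequencies(sample)

# Print the three most frequent word types with their token counts.
for word, n in freqs.most_common(3):
    print(word, n)
```

Real corpora would usually need further handling, for example stripping transcription codes and speaker labels, before counts like these are meaningful.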