The MOR program automatically tags corpora in the CHAT format by adding a %mor line to each utterance. To make this work, a separate MOR grammar must be constructed for each language. After analysis with MOR, users can run the POST program to disambiguate the %mor line. We provide POST disambiguation databases for English and a few other languages; for the remaining languages, users will need to train a POST database themselves. The whole system is described in a recent article on morphosyntactic analysis in CLAN.
We have working MOR grammars for these languages:
Five of these grammars (English, Chinese, Cantonese, Japanese, Spanish) also include POST databases created with Christophe Parisse's POSTTRAIN program. After running MOR, you run POST to automatically disambiguate MOR's output. The Chinese version is functional, but it needs more training and clearer part-of-speech categories to improve accuracy.
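To make the disambiguation step more concrete, the following Python sketch inspects a %mor line before POST has been run. It is an illustration only, not part of CLAN: it assumes that MOR separates competing analyses of a single word with a caret (`^`), and both the sample line and the helper names `split_mor_tier` and `ambiguous_tokens` are invented for this example.

```python
def split_mor_tier(line):
    """Split a %mor tier into tokens, each a list of candidate analyses.

    Assumption: competing analyses of one word are joined by "^".
    """
    # Drop the tier label ("%mor:"); partition splits on the first colon only,
    # so colons inside tags such as "pro:sub" are left untouched.
    _, _, body = line.partition(":")
    return [token.split("^") for token in body.split()]


def ambiguous_tokens(line):
    """Return only the tokens that still carry more than one analysis."""
    return [alts for alts in split_mor_tier(line) if len(alts) > 1]


# Hypothetical MOR output before disambiguation (made up for illustration).
example = "%mor:\tpro:sub|I v|want^n|want det:art|the n|back^adv|back^v|back ."

for alternatives in ambiguous_tokens(example):
    print(" / ".join(alternatives))
```

Under these assumptions, the tokens this sketch reports are the ones POST is meant to resolve, leaving a single analysis for each word.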
To help those interested in building their own MOR grammars, we provide these two examples of minMOR grammars. One is the basic example and the other indicates how to build a grammar that targets only a few word forms, such as the German article.