The task

A part of speech and a grammatical form can be ascribed to each affix, and this specification can be added to each row of the affix dictionary. The difficult part is determining which word form belongs to which lemma, i.e. chunking all word forms of a lexical word together. In some cases this can hardly be automated: irregular word forms, for example, are simply listed in a word dictionary. Lemmas can, however, be determined automatically when the full forms are generated by rules.
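The rule-based case can be illustrated with a minimal sketch. It assumes a hypothetical affix dictionary in which each row holds a suffix, a part of speech and an inflection number. The table contents, the numbering and the names AFFIX_ROWS, generate_chunk and format_chunk are illustrative assumptions, not part of the actual dictionary. Because every form is generated from a known basic form, the lemma of each form is known at generation time.

```python
# Hypothetical affix dictionary rows: (suffix, part of speech, inflection number).
# The suffixes and the numbering are placeholders chosen for illustration only.
AFFIX_ROWS = [
    ("", "v", 1),
    ("s", "v", 2),
    ("ed", "v", 3),
    ("ing", "v", 4),
]


def generate_chunk(basic_form, pos, affix_rows=AFFIX_ROWS):
    """Apply every affix row of the given part of speech to the basic form."""
    forms = {
        number: basic_form + suffix
        for suffix, row_pos, number in affix_rows
        if row_pos == pos
    }
    return {"basic_form": basic_form, "pos": pos, "forms": forms}


def format_chunk(chunk):
    """Render a chunk as a header line followed by number:wordform lines."""
    lines = [f"{chunk['basic_form']} {chunk['pos']}"]
    lines += [f"{number}:{form}" for number, form in sorted(chunk["forms"].items())]
    return "\n".join(lines)


print(format_chunk(generate_chunk("walk", "v")))
```

Irregular verbs such as "take" would not pass through this path; as noted above, their forms have to come from a listed word dictionary.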


The result list

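A lemma chunk should have the following format: a header line with the basic form and the part of speech, followed by one line per word form, each prefixed with its inflection number: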

basic_form pos
infl.nr1:wordform1
...
infl.nr4:wordform4

For example:
take v
1:take
2:takes
3:taken
4:took
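
A result list in this format can be read back with a few lines of code. The following is a minimal sketch, assuming that chunks follow each other in one text, that a header line is any non-empty line not starting with "number:", and that the part of speech is the last whitespace-separated token of the header. The names LemmaChunk and parse_chunks are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LemmaChunk:
    basic_form: str
    pos: str
    forms: Dict[int, str] = field(default_factory=dict)


def parse_chunks(text: str) -> List[LemmaChunk]:
    """Parse a result list into lemma chunks."""
    chunks: List[LemmaChunk] = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no information
        number, sep, wordform = line.partition(":")
        if sep and number.isdigit():
            # Inflection line: "nr:wordform"
            if current is None:
                raise ValueError(f"word form before any header: {line!r}")
            current.forms[int(number)] = wordform
        else:
            # Header line: "basic_form pos"
            basic_form, pos = line.rsplit(None, 1)
            current = LemmaChunk(basic_form, pos)
            chunks.append(current)
    return chunks


example = """take v
1:take
2:takes
3:taken
4:took"""

for chunk in parse_chunks(example):
    print(chunk.basic_form, chunk.pos, chunk.forms)
```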