Assistance from a lemmatiser in lexical annotation - evaluation of LexLem.
An evaluation of a lemmatiser used as a tagger shows that a comprehensive dictionary combined with a simple frequency-based ranking of word forms is enough to obtain as much as 98% coverage with uniquely assigned part-of-speech tags. Notably, such a tagger is also able to discover mistakes in manual annotation, in this case in the Stockholm-Umeå Corpus of about 1 million word tokens.
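
The general idea of dictionary lookup followed by frequency-based ranking can be sketched as below. This is a minimal illustration, not LexLem's actual implementation: the toy lexicon, its counts, the tag labels, and the function names `tag_token`, `coverage`, and `disagreements` are all hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical toy lexicon: word form -> {(lemma, tag): frequency count}.
# Entries and counts are invented for illustration only.
LEXICON = {
    "the": {("the", "DT"): 1000},
    "can": {("can", "MD"): 120, ("can", "NN"): 15},
    "run": {("run", "VB"): 80, ("run", "NN"): 20},
    "dogs": {("dog", "NN"): 40},
}


def tag_token(form, lexicon=LEXICON):
    """Return the highest-ranked (lemma, tag) reading for a word form.

    Returns None when the form is not in the lexicon or when the two
    top readings are tied, i.e. when no unique tag can be assigned.
    """
    readings = lexicon.get(form.lower())
    if not readings:
        return None
    ranked = Counter(readings).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: the ranking does not single out one reading
    return ranked[0][0]


def coverage(tokens, lexicon=LEXICON):
    """Fraction of tokens that receive a uniquely assigned reading."""
    if not tokens:
        return 0.0
    tagged = [tag_token(t, lexicon) for t in tokens]
    return sum(r is not None for r in tagged) / len(tokens)


def disagreements(annotated, lexicon=LEXICON):
    """List (form, manual_tag, predicted_tag) where the two tags differ.

    `annotated` is a list of (form, manual_tag) pairs.
    """
    out = []
    for form, manual in annotated:
        reading = tag_token(form, lexicon)
        if reading is not None and reading[1] != manual:
            out.append((form, manual, reading[1]))
    return out
```

Each disagreement returned by such a comparison is only a candidate for review: it may point to an error in the manual annotation or to a wrong choice by the frequency ranking, so a human still has to inspect it.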