Tools for corpus processing are not normally based on an explicit language representation; language-independent stochastic techniques are used instead. That approach has several advantages, among them portability to new languages and rapid development, neither of which can be claimed for the approach of the Lexware system, whose kernel is based on an explicit, comprehensive representation of a specific language. Yet in some respects the Lexware approach is preferable. Such a system can improve cumulatively together with its language representation, whereas it is not obvious that similar cumulative development can be achieved for stochastic systems. Most important of all, there is no end to the variety of possible extractions if parameters of the language representation are made available in queries.