Parsing Words
An analysis system with a detailed word analysis module is described for Swedish.
Idiom-driven analysis is the central idea of the thesis, where "idiom" is meant generally, as the idiom of a language, and not as a specific idiomatic expression.
The chief trait of the analysis algorithm is that the preference for the largest matches in a lexicon is applied recursively in a binary decomposition of the input string; input is scanned detractively, from the whole string to its parts. The binary analysis trees are hypothesised in a breadth-first ordering, which guarantees that all decompositions into the lowest number of components are obtained first, and that readings with more components need not be pursued at all. To further enhance precision, the measurement of the complexity of a reading is generalised and translated onto a scale of weights representing the degree of lexicalization of a component. It is shown that improvements are possible within the grammar-based framework thanks to a comprehensive vocabulary representation, proper text segmentation and a language analysis strategy that does not engage in false ambiguities. These principles are implemented in the word analysis system LexLem, which is a part of the LexWare system for text analysis. The book contributes to the concept of an ideal vocabulary and to the case for separating normal language use from its meta-uses in analysis strategies. The ideas of idiom-driven analysis can be extended from word to sentence analysis.
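The decomposition strategy described above can be illustrated with a minimal Python sketch. It is not LexLem's implementation: the function names `split_into` and `decompose`, the toy lexicon and the example word are all assumptions made for illustration. The sketch also simplifies the method by returning flat segmentations instead of binary trees, and it does not model the lexicalization weights. It tries the whole string first, then readings with two components, then three, and stops at the first component count that yields any reading.

```python
from typing import List, Set


def split_into(word: str, k: int, lexicon: Set[str]) -> List[List[str]]:
    """Return every way to read `word` as exactly k lexicon entries."""
    if k == 1:
        # The whole (remaining) string must itself be a lexicon entry.
        return [[word]] if word in lexicon else []
    readings = []
    # Binary split: head + rest, trying the longest head first.
    for i in range(len(word) - 1, 0, -1):
        head = word[:i]
        if head in lexicon:
            for rest in split_into(word[i:], k - 1, lexicon):
                readings.append([head] + rest)
    return readings


def decompose(word: str, lexicon: Set[str]) -> List[List[str]]:
    """Return all readings with the fewest components, or [] if none exist."""
    for k in range(1, len(word) + 1):
        readings = split_into(word, k, lexicon)
        if readings:
            # Readings with more components are never explored.
            return readings
    return []


# Hypothetical toy lexicon (illustration only).
LEXICON = {"sjuk", "hus", "sjukhus", "läkare"}

print(decompose("sjukhusläkare", LEXICON))
```

With this toy lexicon, the two-component reading `sjukhus` + `läkare` is found before the three-component reading `sjuk` + `hus` + `läkare` is ever hypothesised, so the latter is not pursued.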