home | about | usage | demo
The task | Riksdagsbiblioteket designed and conducted software tests to determine whether manual indexing of Riksdagen's texts can be supplemented, or even replaced, by automatic indexing. The task was to select proper keywords for a text from a thesaurus specially created for Riksdagen's documents. Keywords should not only identify the main subject but also have the proper level of generality in the thesaurus hierarchy. Software from Connexor, Lingsoft, Kungliga Tekniska Högskolan and Lexware Labs participated in the tests, and Lexware® proved to be best suited for the task.
The solution | Lexware already produced promising results in the tests, but the now fully developed application, Djupindexering, achieves surprisingly high coverage and precision. For 80% of documents, the keywords assigned by Lexware are either the same as those assigned manually (50%) or closely related to them. The good results are mainly due to the integration of an external topic representation - in this case Riksdagen's thesaurus - with Lexware's own rich language representation. Thesaurus terms are identified not only through direct occurrences in the text, but also indirectly, through occurrences of terms closely related to them in the thesaurus or in the lexicon. Djupindexering can be tested in the demo; examples of input and output can be viewed under "usage".
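The idea of direct and indirect term identification can be illustrated with a small sketch. This is not Lexware's implementation, which is not described here. The toy thesaurus, the function names `count_phrase` and `index_document`, and the `indirect_weight` parameter are all hypothetical, chosen only to show how occurrences of related terms might contribute to the score of a thesaurus term that never appears in the text itself.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: each keyword maps to terms closely related to it.
# A real thesaurus would also encode broader/narrower relations.
THESAURUS = {
    "taxation": {"income tax", "vat", "tax relief"},
    "education": {"school", "university", "teacher"},
}


def count_phrase(text, phrase):
    """Count whole-word occurrences of a phrase in already lowercased text."""
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))


def index_document(text, thesaurus, indirect_weight=0.5, top_n=3):
    """Rank thesaurus keywords by direct and indirect evidence in the text.

    A direct occurrence of the keyword counts 1; an occurrence of a related
    term counts `indirect_weight` (an assumed value, not a tuned one).
    """
    lowered = text.lower()
    scores = Counter()
    for term, related in thesaurus.items():
        direct = count_phrase(lowered, term)
        indirect = sum(count_phrase(lowered, r) for r in related)
        score = direct + indirect_weight * indirect
        if score > 0:
            scores[term] = score
    return [term for term, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    sample = "The new income tax rules and VAT changes affect every school."
    print(index_document(sample, THESAURUS))
```

In this sample the word "taxation" never occurs, yet it is ranked first because two of its related terms do. A production system would additionally need morphological analysis and a way to choose the right level of generality in the hierarchy, both of which this sketch leaves out.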