Combining Language Resources Into A Grammar-Driven Swedish Parser
Authors and institution:
Malin Ahlberg (-); Ramona Enache (Department of Computer Science and Engineering (GU))
Published in:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
ISBN:
978-2-9517408-7-7
Publication type:
Conference paper, peer-reviewed
Publication year:
2012
Language:
English
Full-text link:
Abstract:
This paper describes work on a rule-based, open-source parser for Swedish. The central component is a wide-coverage grammar implemented in GF (Grammatical Framework), a dependently typed grammar formalism based on Martin-Löf type theory. GF has strong support for multilinguality and has so far been used successfully for controlled languages; recent experiments have shown that the framework can also be used for parsing unrestricted language. In addition to GF, we use two other main resources: the Swedish treebank Talbanken and the electronic lexicon SALDO. By combining the grammar with a lexicon extracted from SALDO, we obtain a parser that accepts all sentences described by the given rules. We develop and test it on examples from Talbanken. The resulting parser gives a full syntactic analysis of the input sentences. It will be highly reusable and freely available, and since GF provides libraries for compiling grammars to a number of programming languages, selected parts of the grammar may be used in various NLP applications.
Subject (based on the Swedish National Agency for Higher Education's classification of research subjects):
NATURAL SCIENCES ->
Computer and Information Sciences ->
Language Technology (Computational Linguistics)
Keywords:
Parsing, Grammar and Syntax, Tools, systems, applications
Chalmers Areas of Advance:
Information and Communication Technology
Chalmers foundations:
Basic sciences
Record number:
160369
Record created:
2012-07-11 13:37
Record modified:
2012-08-03 13:26