
A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger

Published online by Cambridge University Press: 07 January 2005

GÉRARD HUET
Affiliation:
INRIA Rocquencourt, BP 105, F-78153 Le Chesnay Cedex (e-mail: Gerard.Huet@inria.fr)

Abstract

We present the Zen toolkit for morphological and phonological processing of natural languages. The toolkit is presented in literate programming style, in the Pidgin ML subset of the Objective Caml functional programming language. It is based on a systematic representation of finite state automata and transducers as decorated lexical trees. All operations on the state space data structures use the zipper technology, and a uniform sharing functor permits systematic maximum sharing as dags. A particular case of lexical maps is especially convenient for building invertible morphological operations, such as dictionaries of inflected forms, using a notion of differential word. As a particular application, we describe a general method for tagging a natural language text given as a phoneme stream by analysing possible euphonic liaisons between words belonging to a lexicon of inflected forms. The method applies the toolkit methodology: it constructs a non-deterministic transducer, implementing rational rewrite rules, by mechanical decoration of a trie representation of the lexicon index. The algorithm is linear in the size of the lexicon. A coroutine interpreter is given, and its correctness and completeness are formally proved. An application to the segmentation of Sanskrit by sandhi analysis is demonstrated.
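To make two of the notions in the abstract concrete, the following is a minimal OCaml sketch of a lexical tree (trie) and of a differential word. It is an illustration only, not the toolkit's own code: the type and function names (`trie`, `insert`, `mem`, `delta`, `diff`, `patch`, `explode`) and the use of `char` for letters are assumptions made here. The reading assumed for a differential word is a pair `(n, u)`: drop the last `n` letters of one word, then append `u` to reach the other.

```ocaml
(* Illustrative sketch; all names below are assumptions, not the toolkit's API. *)
type letter = char
type word = letter list

(* A lexical tree: the boolean marks whether the path read so far is a word;
   the association list maps each outgoing letter to a subtree. *)
type trie = Trie of bool * (letter * trie) list

let empty = Trie (false, [])

(* Insert a word, creating missing subtrees along its path. *)
let rec insert (w : word) (Trie (b, arcs)) : trie =
  match w with
  | [] -> Trie (true, arcs)
  | c :: rest ->
      let sub = try List.assoc c arcs with Not_found -> empty in
      Trie (b, (c, insert rest sub) :: List.remove_assoc c arcs)

(* Membership test: follow the word's letters down the tree. *)
let rec mem (w : word) (Trie (b, arcs)) : bool =
  match w with
  | [] -> b
  | c :: rest ->
      (match List.assoc_opt c arcs with
       | Some sub -> mem rest sub
       | None -> false)

(* Assumed encoding of a differential word: (n, u) means
   "remove the last n letters, then append u". *)
type delta = int * word

(* diff w w' skips the common prefix of w and w', then records how many
   letters of w remain to be removed and which letters of w' to append. *)
let rec diff (w : word) (w' : word) : delta =
  match w, w' with
  | c :: r, c' :: r' when c = c' -> diff r r'
  | _ -> (List.length w, w')

(* patch d w applies a differential word to w. *)
let patch ((n, u) : delta) (w : word) : word =
  let keep = List.length w - n in
  if keep < 0 then invalid_arg "patch";
  List.filteri (fun i _ -> i < keep) w @ u

(* Helper: convert a string to a word. *)
let explode (s : string) : word = List.init (String.length s) (String.get s)
```

Under these assumptions, `diff (explode "deva") (explode "devena")` yields `(1, explode "ena")`, and patching `deva` with that pair gives back `devena`. Storing such a pair alongside an entry, rather than the full related word, is one way a map between stems and inflected forms could stay compact and be inverted, which is the role the abstract attributes to differential words.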

Type
Article
Copyright
© 2005 Cambridge University Press