Design, Implementation and Evaluation of an Inflectional Morphology Finite State Transducer for Irish

Dhonnchadha, Elaine Uí; Pháidín, Caoilfhionn Nic; Genabith, Josef van

doi:10.1007/s10590-004-2480-9

Cited by 4 publications

(6 citation statements)

References 12 publications

(1 reference statement)

“…Compared to other EU-official languages, Irish language technology is under-resourced, as highlighted by a recent study (Judge et al, 2012). In the area of morpho-syntactic processing, recent years have seen the development of a part-of-speech tagger (Uí Dhonnchadha and van Genabith, 2006), a morphological analyser (Uí Dhonnchadha et al, 2003), a shallow chunker (Uí Dhonnchadha, 2009), a dependency treebank (Lynn et al, 2012a; Lynn et al, 2012b) and statistical dependency parsing models for MaltParser (Nivre et al, 2006) and Mate parser (Bohnet, 2010) trained on this treebank (Lynn et al, 2013).…”

Section: Irish Language and Treebank (mentioning)

confidence: 99%

“…Considerable efforts have been made over the past decade to develop natural language processing resources for the Irish language (Uí Dhonnchadha et al, 2003; Uí Dhonnchadha and van Genabith, 2006; Uí Dhonnchadha, 2009; Lynn et al, 2012a; Lynn et al, 2012b; Lynn et al, 2013). One such resource is the Irish Dependency Treebank (Lynn et al, 2012a) which contains just over 1000 gold standard dependency parse trees.…”

Section: Introduction (mentioning)

confidence: 99%

Cross-lingual Transfer Parsing for Low-Resourced Languages: An Irish Case Study

Lynn, Foster, Dras et al. 2014

Proceedings of the First Celtic Language Technology Workshop

We present a study of cross-lingual direct transfer parsing for the Irish language. Firstly we discuss mapping of the annotation scheme of the Irish Dependency Treebank to a universal dependency scheme. We explain our dependency label mapping choices and the structural changes required in the Irish Dependency Treebank. We then experiment with the universally annotated treebanks of ten languages from four language family groups to assess which languages are the most useful for cross-lingual parsing of Irish by using these treebanks to train delexicalised parsing models which are then applied to sentences from the Irish Dependency Treebank. The best results are achieved when using Indonesian, a language from the Austronesian language family.

“…(Mel'čuk, 1973). In the instantiated version of the pipeline presented in this paper, the input structured data is the WebNLG data (Aquilina et al, 2023), made of DBpedia triple sets, and we use the FORGe grammar-based generator to produce the intermediate representations (Mille et al, 2019) and the Irish NLP toolkit (Dhonnchadha et al, 2003) to produce the final representation: details about the dataset and tools are provided in Section 3.…”

Section: Modular Structure (mentioning)

confidence: 99%

Generating Irish Text with a Flexible Plug-and-Play Architecture

Mille, Uí Dhonnchadha, Cassidy et al. 2023

Proceedings of the 2nd Workshop on Pattern-Based Approaches to NLP in the Age of Deep Learning

In this paper, we describe M-FleNS, a multilingual flexible plug-and-play architecture designed to accommodate neural and symbolic modules, and initially instantiated with rule-based modules. We focus on using M-FleNS for the specific purpose of building new resources for Irish, a language currently underrepresented in the NLP landscape. We present the general M-FleNS framework and how we use it to build an Irish Natural Language Generation system for verbalising part of the DBpedia ontology and building a multilayered dataset with rich linguistic annotations. Via automatic and human assessments of the output texts we show that with very limited resources we can create a system that reaches high levels of fluency and semantic accuracy, while having very low energy and memory requirements.

“…• A tagset for Irish had been developed within the PAROLE project, by members of the NCI team (http://www.ite.ie/corpus/pos.htm) • A pilot finite-state tokenizer and morphological transducer for Irish inflectional morphology had been developed (Uí Dhonnchadha, 2002; Uí Dhonnchadha, Nic Pháidín, & Van Genabith, 2003). • We established that a constraint based tagger was available to us…”

Section: Irish Linguistic Tools (mentioning)

confidence: 99%

“…As newspaper and web texts in particular contain a high proportion of proper nouns, lists of names and places were also scanned and incorporated into the lexicon (Uí Dhonnchadha et al, 2003). Average recognition rates increased to 95% on unrestricted text.…”

Section: Tokenization and Morphological Analysis (mentioning)

confidence: 99%

Efficient corpus development for lexicography: building the New Corpus for Ireland

Kilgarriff, Rundell, Dhonnchadha 2006

Lang Resources & Evaluation

In a 12-month project we have developed a new, register-diverse, 55-million-word bilingual corpus, the New Corpus for Ireland (NCI), to support the creation of a new English-to-Irish dictionary. The paper describes the strategies we employed, and the solutions to problems encountered. We believe we have a good model for corpus creation for lexicography, and others may find it useful as a blueprint. The corpus has two parts, one Irish, the other Hiberno-English (English as spoken in Ireland). We describe its design, collection and encoding.

The NCI was developed as part of the set-up phase of a project for a new English-to-Irish Dictionary (NEID). The NEID is intended to be used by scholars, school and university students, translators, people working in the media, and the general public. It will replace the current main reference work, Tomas de Bhaldraithe's English-Irish Dictionary (1959), a highly regarded dictionary but now almost 50 years old.

The island of Ireland includes both the Republic of Ireland and, in the North, six counties of the province of Ulster, which form part of the United Kingdom. The border was not critical to the project; collaborators and texts alike were sought both North and South of the border, and the language and dialects of Ulster were treated on a par with those of other regions. In this paper, "Ireland" means the whole island.

About 62,000 speakers use Irish as their main everyday language, and almost 340,000 speakers use Irish on a daily basis. It was the main language of Ireland until English displaced it (substantially as a result of language policies under the British Empire). It remains the chief language in a few parts of the island, collectively known as the Gaeltacht, which are mainly located along the western seaboard. There are three main dialects of Irish (Connacht, Munster, and Ulster), corresponding respectively to the most westerly, southerly, and northerly areas. The language has an important place in Irish culture and identity and is very widely taught in schools. Irish is one of the two official languages of Ireland, the other being English. The Irish language belongs to the Celtic branch of the Indo-European family of languages, and within this branch, it forms part of the Goidelic branch along with Manx and Scots Gaelic, the other tradition being Brythonic, which comprises Welsh, Cornish, and Breton.

The remainder of the paper describes the design, collection, and encoding of the NCI in Sects. 2, 3, and 4. A particular area of innovation was the use of the web as a source of some of the constituent texts, and the issues arising there are covered in some detail, as are the practical issues of data 'cleaning'. The morphological analyzer and part-of-speech tagger for Irish are described in Sect. 5. Section 6 describes the project team and resources, with a view to assisting others with comparable projects in mind to assess the resources they require. Section 7 outlines possible further developments, and Sect. 8 concludes.

Design

In the first instance, a detailed cor...
