Multiword expressions can have both idiomatic and literal occurrences. For instance, pulling strings can be understood either as making use of one's influence, or literally. Distinguishing these two cases has been addressed in linguistic and psycholinguistic studies, and is also considered one of the major challenges in MWE processing. We suggest that literal occurrences should be considered in both semantic and syntactic terms, which motivates their study in a treebank. We propose heuristics to automatically pre-identify candidate sentences that might contain literal occurrences of verbal MWEs (VMWEs), and we apply them to existing treebanks in five typologically different languages: Basque, German, Greek, Polish and Portuguese. We also perform a linguistic study of the literal occurrences extracted by the different heuristics. The results suggest that literal occurrences constitute a rare phenomenon. We also identify some properties that may distinguish them from their idiomatic counterparts. This article is a largely extended version of Savary and Cordeiro (2018).
This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque. Linguistic information about a set of VNCs is gathered from the public database Konbitzul and integrated into the MT system, leading to improvements in BLEU, NIST and TER scores, with the output also judged significantly better by human evaluators.
Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora due to their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs is extracted. This information is then tested in an identification task, obtaining an F-score of 0.52, which is considerably higher than that of related work.
We are very grateful to our program committee members, who gave constructive and detailed reviews of the student papers. We would also like to acknowledge the researchers who agreed to mentor and provide expert feedback on the student papers. Many thanks to our faculty adviser Barbara Plank for her invaluable guidance, as well as to the EACL 2017 organizing committee for their constant support and suggestions. Finally, we thank all students for their submissions and participation in the SRW.

Abstract

This research proposal discusses pragmatic factors in image description, arguing that current automatic image description systems do not take these factors into account. I present a general model of the human image description process, and propose to study this process using corpus analysis, experiments, and computational modeling. This will lead to a better characterization of human image description behavior, providing a road map for future research in automatic image description, and in the automatic description of perceptual stimuli in general.

Introduction

Automatic image description is a key challenge at the intersection of Computer Vision (CV) and Natural Language Processing (NLP), because it requires a deep understanding of both images and natural language (Bernardi et al., 2016). There are two major datasets used to train and evaluate automatic image description models: Flickr30K (Young et al., 2014; 30K images) and MS COCO (Lin et al., 2014; 150K images). These descriptions were collected through a crowdsourcing task in which workers were asked to provide one-sentence descriptions for each image.
One of the assumptions behind these datasets is that they provide objective image descriptions: "By asking people to describe the people, objects, scenes and activities that are shown in a picture without giving them any further information about the context in which the picture was taken, we were able to obtain conceptual descriptions that focus only on the information that can be obtained from the image alone." (Hodosh et al., 2013, p. 859)

[Figure 1: Flickr30K image (4944749423) with a human- and a machine-generated description. Human: "Three policemen are standing around someone in a gray sweatshirt with stripes." Model: "A group of people are walking down the street."]

The assumption of neutrality is a useful simplification: if it is more or less correct that similar images will have similar descriptions (not influenced by any external factors), then we can try to learn a mapping between images and descriptions. This is what Vinyals et al. (2015) do. They use a Long Short-Term Memory model to generate sequences of words, given the visual context. Their model is able to produce reasonably good image descriptions without using any higher-order reasoning. Figure 1 provides an example. The machine-generated descriptions are typically shorter and more general than human descriptions. For example, the model talks about 'a group of people' rather than about a group of policemen and a civilian. Compare...
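To make the idea of conditioning a language model on visual context concrete, here is a minimal, purely illustrative sketch of an LSTM decoder that takes a CNN image-feature vector as its first input and then greedily emits words. This is not the actual architecture of Vinyals et al. (2015): the sizes, the random "trained" weights, and the tiny vocabulary are all hypothetical, chosen only so the decoding loop is runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<start>", "<end>", "a", "group", "of", "people"]
H, E = 16, 16  # toy hidden and embedding sizes (hypothetical)

# Random stand-ins for learned parameters; a real system trains these.
Wx = rng.normal(0, 0.1, (4 * H, E))
Wh = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
W_img = rng.normal(0, 0.1, (E, 512))      # projects a 512-d CNN feature
W_out = rng.normal(0, 0.1, (len(VOCAB), H))
embed = rng.normal(0, 0.1, (len(VOCAB), E))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    # One LSTM cell update: input, forget, output gates and candidate.
    z = Wx @ x + Wh @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def describe(image_feat, max_len=10):
    """Greedy decoding conditioned on the image feature."""
    h, c = np.zeros(H), np.zeros(H)
    # Inject the visual context as the first "input word".
    h, c = lstm_step(W_img @ image_feat, h, c)
    word = VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        h, c = lstm_step(embed[word], h, c)
        word = int(np.argmax(W_out @ h))
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return out

caption = describe(rng.normal(size=512))
```

With random weights the output is of course meaningless; the point is only that the image feature and the previously emitted word jointly determine the next word, with no higher-order reasoning anywhere in the loop.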