Eva Martínez Garcia scite author profile

Eva Martínez Garcia

5 Publications

40 Citation Statements Received

40 Citation Statements Given

Affiliations

National Composites Centre, Universitat Politècnica de Catalunya

Publications

Weighted Set-Theoretic Alignment of Comparable Sentences

Azpeitia¹, Etchegoyhen², Garcia³ (2017)

This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had been previously shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.
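As an illustration only, a set-theoretic similarity over bags of words can be sketched with a Jaccard-style overlap. The abstract does not specify the actual STACC formula or its weighting scheme, so the function names and the `weight` callback below are hypothetical, and the code should not be read as the authors' method.

```python
def jaccard(a_tokens, b_tokens):
    """Plain Jaccard overlap between two token collections, treated as sets."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_jaccard(a_tokens, b_tokens, weight):
    """Jaccard overlap where each token contributes weight(token) instead of 1.

    `weight` is a hypothetical callback (for example, an inverse-frequency
    score); the weighting used in the paper is not reproduced here.
    """
    a, b = set(a_tokens), set(b_tokens)
    total = sum(weight(t) for t in a | b)
    if total == 0:
        return 0.0
    return sum(weight(t) for t in a & b) / total
```

With a constant weight the two functions agree, which makes the weighted variant a strict generalisation of the unweighted one.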

Supervised and Unsupervised Minimalist Quality Estimators: Vicomtech’s Participation in the WMT 2018 Quality Estimation Task

Etchegoyhen¹, Garcia², Azpeitia³ (2018)

We describe Vicomtech's participation in the WMT 2018 shared task on quality estimation, for which we submitted minimalist quality estimators. The core of our approach is based on two simple features: lexical translation overlaps and language model cross-entropy scores. These features are exploited in two system variants: uMQE is an unsupervised system, where the final quality score is obtained by averaging individual feature scores; sMQE is a supervised variant, where the final score is estimated by a Support Vector Regressor trained on the available annotated datasets. The main goal of our minimalist approach to quality estimation is to provide reliable estimators that require minimal deployment effort, few resources, and, in the case of uMQE, do not depend on costly data annotation or post-editing. Our approach was applied to all language pairs in sentence quality estimation, obtaining competitive results across the board.
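The unsupervised variant described above averages individual feature scores. A minimal sketch of that idea follows, assuming each feature has already been scaled to the range [0, 1]. The `lexical_overlap` feature, the `lexicon` structure (a dict mapping a source word to a set of translations) and both function names are assumptions for illustration, not the system's actual features or normalisation.

```python
from statistics import mean


def lexical_overlap(source_tokens, target_tokens, lexicon):
    """Fraction of source tokens with at least one lexicon translation
    present in the target sentence (hypothetical feature definition)."""
    if not source_tokens:
        return 0.0
    target = set(target_tokens)
    covered = sum(1 for tok in source_tokens if lexicon.get(tok, set()) & target)
    return covered / len(source_tokens)


def unsupervised_quality(feature_scores):
    """Average of feature scores, each assumed to lie in [0, 1]."""
    if not feature_scores:
        raise ValueError("at least one feature score is required")
    return mean(feature_scores)
```

A supervised variant would replace the average with a trained regressor over the same features.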

Using Word Embeddings to Enforce Document-Level Lexical Consistency in Machine Translation

Garcia¹, Creus², España-Bonet³ et al. (2017)

We integrate new mechanisms in a document-level machine translation decoder to improve the lexical consistency of document translations. First, we develop a document-level feature designed to score the lexical consistency of a translation. This feature, which applies to words that have been translated into different forms within the document, uses word embeddings to measure the adequacy of each word translation given its context. Second, we extend the decoder with a new stochastic mechanism that, at translation time, introduces changes in the translation aimed at improving its lexical consistency. We evaluate our system on English-Spanish document translation, and we conduct automatic and manual assessments of its quality. The automatic evaluation metrics, applied mainly at sentence level, do not reflect significant variations. In contrast, the manual evaluation shows that the system dealing with lexical consistency is preferred over both a standard sentence-level and a standard document-level phrase-based MT system.
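One way to picture scoring a word translation against its context with embeddings is cosine similarity between the candidate's vector and the mean vector of the surrounding words. This is a hedged sketch under that assumption: the abstract does not give the actual feature, and `best_translation`, the `embeddings` dict and the toy vectors in the usage note are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def best_translation(candidates, context_words, embeddings):
    """Pick the candidate whose embedding is closest to the mean context vector."""
    context_vectors = [embeddings[w] for w in context_words if w in embeddings]
    if not context_vectors:
        return candidates[0]  # no usable context: fall back to the first candidate
    context = mean_vector(context_vectors)
    return max(candidates, key=lambda c: cosine(embeddings[c], context))
```

For example, with toy vectors where "banco" lies near "dinero" and "préstamo", `best_translation(["banco", "orilla"], ["dinero", "préstamo"], embeddings)` selects "banco".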

Co-Word Graphs for Design and Manufacture Knowledge Mapping

Gopsill¹, Humphrey², Thompson³ et al. (2020)

Proc. Des. Soc.: Des. Conf.

Design & Manufacture Knowledge Mapping is a critical activity in medium-to-large organisations, supporting many organisational activities. However, techniques for effective mapping of knowledge often employ interviews, consultations and appraisals. Although invaluable in providing expert insight, the application of such methods is inherently intrusive and resource intensive. This paper presents word co-occurrence graphs as a means to automatically generate knowledge maps from technical documents and validates them against expert-generated knowledge maps.
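A word co-occurrence graph can be sketched as a weighted edge list in which two terms are linked whenever they appear in the same document. The document-level window, the function name and the toy terms in the usage note are assumptions for illustration; the paper's actual construction and filtering steps are not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(documents):
    """Count, for each unordered pair of terms, how many documents contain both.

    `documents` is an iterable of token lists. Keys of the returned Counter
    are alphabetically ordered (term_a, term_b) tuples.
    """
    edges = Counter()
    for tokens in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always stored under the same key.
        for pair in combinations(sorted(set(tokens)), 2):
            edges[pair] += 1
    return edges
```

Calling `cooccurrence_graph([["resin", "cure", "mould"], ["resin", "cure"]])` gives the pair `("cure", "resin")` a weight of 2 and the other two pairs a weight of 1.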

STACC, OOV Density and N-gram Saturation: Vicomtech’s Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering

Azpeitia¹, Etchegoyhen², Garcia³ (2018)

We describe Vicomtech's participation in the WMT 2018 Shared Task on parallel corpus filtering. We aimed to evaluate a simple approach to the task, which can efficiently process large volumes of data and can be easily deployed for new datasets in different language pairs and domains. We based our approach on STACC, an efficient and portable method for parallel sentence identification in comparable corpora. To address the specifics of the corpus filtering task, which features significant volumes of noisy data, the core method was expanded with a penalty based on the number of unknown words in sentence pairs. Additionally, we experimented with a complementary data saturation method based on source sentence n-grams, with the goal of demoting parallel sentence pairs that do not contribute significant amounts of yet unobserved n-grams. Our approach requires no prior training and is highly efficient on the type of large datasets featured in the corpus filtering task. We achieved competitive results with this simple and portable method, ranking in the top half among competing systems overall.
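The two additions mentioned above, an unknown-word penalty and n-gram saturation, can be illustrated roughly as follows. This is a sketch under stated assumptions: `oov_density` and `novelty_ratios` are hypothetical names, and the way the real system combines these values with the STACC score is not given in the abstract.

```python
def oov_density(tokens, vocabulary):
    """Share of tokens not found in `vocabulary` (1.0 for an empty sentence)."""
    if not tokens:
        return 1.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def ngrams(tokens, n):
    """Set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novelty_ratios(sentences, n=2):
    """For each sentence, in order, the share of its n-grams not yet observed.

    Low ratios mark sentences that add little new material and could be
    demoted; sentences shorter than n get 0.0.
    """
    seen = set()
    ratios = []
    for tokens in sentences:
        grams = ngrams(tokens, n)
        ratios.append(len(grams - seen) / len(grams) if grams else 0.0)
        seen |= grams
    return ratios
```

Processing order matters for the saturation measure: a sentence is only "novel" relative to what has been seen before it, so in practice candidates would presumably be visited from best-scored to worst.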
