Evaluation results recently reported by Callison-Burch et al. (2006) and Koehn and Monz (2006) revealed that, in certain cases, the BLEU metric may not be a reliable indicator of MT quality. This happens, for instance, when the systems under evaluation are based on different paradigms and therefore do not share the same lexicon. The reason is that, while MT quality aspects are diverse, BLEU limits its scope to the lexical dimension. In this work, we suggest using metrics that take into account linguistic features at more abstract levels. We provide experimental results showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) are able to produce more reliable system rankings than metrics based on lexical matching alone, especially when the systems under evaluation are of a different nature.
The aim of this study was to investigate the physical performance differences between players who started (i.e. starters, ≥65 minutes played) and those who were substituted in (i.e. non-starters) during soccer friendly matches. Fourteen professional players (age: 23.2 ± 2.7 years, body height: 178 ± 6 cm, body mass: 73.2 ± 6.9 kg) took part in this study. Twenty physical performance-related match variables (e.g. distance covered at different intensities, accelerations and decelerations, player load, maximal running speed, exertion index, work-to-rest ratio and rating of perceived exertion) were collected during two matches. Results were analysed using effect sizes (ES) and magnitude-based inferences. Compared to starters, non-starters covered greater match distance within the following intensity categories: >3.3 to ≤4.2 m/s (very likely), >4.2 to ≤5 m/s (likely) and >5 to ≤6.9 m/s (likely). In contrast, similar match average acceleration and deceleration values were identified for starters and non-starters (trivial). Indicators of workload, including player load (very likely), the exertion index (very likely) and the work-to-rest ratio (very likely), were greater, while self-reported ratings of perceived exertion were lower (likely) for non-starters compared to starters. The current study demonstrates that substantial physical performance differences exist between starters and non-starters during friendly soccer matches. Identification of these differences enables coaches and analysts to prescribe optimal training loads and microcycles based upon players' match starting status.
Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias in the development cycle which in some cases has been reported to carry very negative consequences. In order to tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information are able to provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, while some linguistic measures perform better than most lexical measures, some others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding a substantially improved evaluation quality.
In this paper we present a semantic role labeling system submitted to the CoNLL-2005 shared task. The system makes use of partial and full syntactic information and converts the task into sequential BIO-tagging. As a result, the labeling architecture is very simple. Building on a state-of-the-art set of features, a binary classifier for each label is trained using AdaBoost with fixed-depth decision trees. The final system, which combines the outputs of two base systems, achieved F1 = 76.59 on the official test set. Additionally, we provide results comparing the system when using partial vs. full parsing input information.

Goals and System Architecture

The goal of our work is twofold. On the one hand, we want to test whether it is possible to implement a competitive SRL system by reducing the task to sequential tagging. On the other hand, we want to investigate the effect of replacing partial parsing information with full parsing. For that, we built two different individual systems with a shared sequential strategy but using UPC chunks-clauses and Charniak's parses, respectively. We will refer to those systems as PP-UPC and FP-CHA, hereinafter.

Both partial and full parsing annotations provided as input information are of a hierarchical nature. Our system navigates through these syntactic structures in order to select a subset of constituents organized sequentially (i.e., non-embedding). Propositions are treated independently, that is, each target verb generates a sequence of tokens to be annotated. We call this pre-processing step sequentialization.

The sequential tokens are selected by exploring the sentence spans or regions defined by the clause boundaries [1]. The top-most syntactic constituents falling inside these regions are selected as tokens. Note that this strategy is independent of the input syntactic annotation explored, provided it contains clause boundaries.
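The BIO conversion described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: each selected constituent becomes one token, and tokens are tagged B-/I-/O per argument of the target verb; the function and span representation are assumptions.

```python
# Hypothetical sketch of BIO tagging over selected constituents.
# argument_spans holds (label, start, end) token indices, inclusive.

def bio_tag(num_tokens, argument_spans):
    """Assign one B-I-O tag per token for a single target verb."""
    tags = ["O"] * num_tokens
    for label, start, end in argument_spans:
        tags[start] = "B-" + label          # first token of the argument
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + label          # remaining tokens of the argument
    return tags

# Constituents for "The cat | sat | on the mat", target verb "sat":
print(bio_tag(3, [("A0", 0, 0), ("AM-LOC", 2, 2)]))
# ['B-A0', 'O', 'B-AM-LOC']
```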
It happens that, in the case of full parses, this node selection strategy is equivalent to the pruning process defined by Xue and Palmer (2004), which selects sibling nodes along the path of ancestors from the verb predicate to the root of the tree [2]. Due to this pruning stage, the upper-bound recall figures are 95.67% for PP-UPC and 90.32% for FP-CHA. These values give F1 performance upper bounds of 97.79 and 94.91, respectively, assuming perfect predictors (100% precision).

The selected nodes are labeled with B-I-O tags depending on whether they are at the beginning, inside, or outside of a verb argument. There is a total of 37 argument types, which amounts to 37*2+1 = 75 labels.

Regarding the learning algorithm, we used generalized AdaBoost with real-valued weak classifiers, which constructs an ensemble of decision trees of fixed depth (Schapire and Singer, 1999). We considered a one-vs-all decomposition into binary problems.

[1] Regions to the right of the target verb corresponding to ancestor clauses are omitted in the case of partial parsing.
[2] With the unique exception of the exploration inside sibling PP constituents proposed by Xue and Palmer (2004).
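The Xue and Palmer (2004) pruning heuristic mentioned above can be sketched minimally: walk from the predicate node up to the root, collecting the siblings of every node on the path. The `Node` class and function names below are assumptions for illustration only.

```python
# Minimal sketch (not the authors' implementation) of Xue & Palmer pruning.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self  # back-link so we can walk toward the root

def prune_candidates(predicate):
    """Collect siblings of each node on the path from predicate to root."""
    candidates = []
    node = predicate
    while node.parent is not None:
        candidates.extend(s for s in node.parent.children if s is not node)
        node = node.parent
    return candidates

# (S (NP-subj ...) (VP (V ...) (PP-loc ...)))
verb = Node("V")
tree = Node("S", [Node("NP-subj"), Node("VP", [verb, Node("PP-loc")])])
print([n.label for n in prune_candidates(verb)])
# ['PP-loc', 'NP-subj']
```

Note that candidates closest to the predicate come first, mirroring the bottom-up walk the heuristic performs.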
This document describes the approach of the NLP Group at the Technical University of Catalonia (UPC-LSI) for the shared task on Automatic Evaluation of Machine Translation at the ACL 2008 Third SMT Workshop.
In this work we review the application of discriminative learning to the problem of phrase selection in Statistical Machine Translation. Inspired by common techniques used in Word Sense Disambiguation, we train classifiers based on local context to predict possible phrase translations. Our work extends that of Vickrey et al. (2005) in two main aspects. First, we move from word translation to phrase translation. Second, we move from the 'blank-filling' task to the 'full translation' task. We report results on a set of highly frequent source phrases, obtaining a significant improvement, especially with respect to adequacy, according to a rigorous process of manual evaluation.
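The idea of predicting a phrase translation from its local context, in the WSD spirit described above, can be illustrated with a toy sketch. This is not the paper's implementation: a simple voting model stands in for the discriminative learners, and all names, features, and data are assumptions.

```python
# Hedged illustration of local-context phrase selection (toy voting model,
# not the discriminative classifiers used in the paper).

from collections import Counter, defaultdict

def context_features(tokens, start, end, window=2):
    """Bag of words within `window` tokens of the phrase tokens[start:end]."""
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return list("L:" + w for w in left) + list("R:" + w for w in right)

class VotingPhraseSelector:
    def __init__(self):
        self.scores = defaultdict(Counter)  # feature -> translation counts

    def train(self, examples):
        # examples: (source_tokens, phrase_start, phrase_end, translation)
        for tokens, start, end, translation in examples:
            for feat in context_features(tokens, start, end):
                self.scores[feat][translation] += 1

    def predict(self, tokens, start, end):
        votes = Counter()
        for feat in context_features(tokens, start, end):
            votes.update(self.scores[feat])
        return votes.most_common(1)[0][0] if votes else None

# Toy example: disambiguating the Spanish phrase "banco".
sel = VotingPhraseSelector()
sel.train([(["el", "banco", "central"], 1, 2, "bank"),
           (["un", "banco", "del", "parque"], 1, 2, "bench")])
print(sel.predict(["el", "banco", "central", "europeo"], 1, 2))
# bank
```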