Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. This paper investigates how classical inference and learning tasks known from the graphical model community can be tackled for probabilistic logic programs. Several such tasks, such as computing marginals given evidence and learning from (partial) interpretations, have not been thoroughly addressed for probabilistic logic programs before. The first contribution of this paper is a suite of efficient algorithms for various inference tasks, based on the conversion of the program, the queries, and the evidence to a weighted Boolean formula. This allows us to reduce inference tasks to well-studied tasks such as weighted model counting, which can be solved using state-of-the-art methods known from the graphical model and knowledge compilation literature. The second contribution is an algorithm for parameter estimation in the learning-from-interpretations setting. The algorithm uses expectation-maximization and is built on top of the developed inference algorithms. The proposed approach is evaluated experimentally. The results show that the inference algorithms improve upon the state of the art in probabilistic logic programming and that it is indeed possible to learn the parameters of a probabilistic logic program from interpretations.
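To make the reduction concrete, the sketch below (ours, not the paper's implementation) computes a marginal for a toy two-fact program by brute-force weighted model counting; the program, predicate names, and probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical program: 0.1::burglary. 0.2::earthquake.
#                       alarm :- burglary.  alarm :- earthquake.
facts = {"burglary": 0.1, "earthquake": 0.2}

def alarm(world):
    # Logical consequence of the two rules for a given truth assignment.
    return world["burglary"] or world["earthquake"]

def wmc(condition):
    # Sum the weights of all possible worlds satisfying `condition`.
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        weight = 1.0
        for fact, p in facts.items():
            weight *= p if world[fact] else 1.0 - p
        if condition(world):
            total += weight
    return total

# P(alarm) = WMC(formula AND alarm) / WMC(formula); WMC(formula) = 1 here.
print(wmc(alarm))  # 0.1 + 0.2 - 0.1*0.2 = 0.28
```

Practical systems avoid this exponential enumeration by compiling the weighted formula, which is where the knowledge compilation methods mentioned in the abstract come in.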
Recent studies in the fields of Machine Translation (MT) and Natural Language Processing (NLP) have shown that existing models amplify biases observed in the training data. The amplification of biases in language technology has mainly been examined with respect to specific phenomena, such as gender bias. In this work, we go beyond the study of gender in MT and investigate how bias amplification might affect language in a broader sense. We hypothesize that 'algorithmic bias', i.e., an exacerbation of frequently observed patterns combined with a loss of less frequent ones, not only exacerbates societal biases present in current datasets but could also lead to an artificially impoverished language: 'machine translationese'. We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms: phrase-based statistical MT (PB-SMT) and neural MT (NMT). Our experiments show a loss of lexical and morphological richness in the translations produced by all investigated MT paradigms for two language pairs (EN↔FR and EN↔ES).
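To make "lexical richness" concrete: the type-token ratio and unigram entropy are two standard corpus measures of the kind such studies rely on. The sketch below uses invented example sentences, not the paper's data or its exact metric suite.

```python
from collections import Counter
import math

def type_token_ratio(tokens):
    # Ratio of distinct word forms (types) to total words (tokens);
    # lower values indicate a lexically poorer text.
    return len(set(tokens)) / len(tokens)

def token_entropy(tokens):
    # Shannon entropy of the unigram distribution, in bits;
    # repetitive output concentrates mass and lowers entropy.
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

human = "the cat sat on the mat while the dog dozed".split()
mt    = "the cat sat on the mat and the dog sat".split()
print(type_token_ratio(human), type_token_ratio(mt))  # 0.8 vs 0.7
print(token_entropy(human), token_entropy(mt))
```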
Probabilistic logic languages, such as ProbLog and CP-logic, are probabilistic generalizations of logic programming that allow one to model probability distributions over complex, structured domains. Their key probabilistic constructs are probabilistic facts and annotated disjunctions, which represent binary and multi-valued random variables, respectively. ProbLog supports annotated disjunctions by translating them into probabilistic facts and rules. This encoding is tailored towards the task of computing the marginal probability of a query given evidence (MARG), but it is not correct for the task of finding the most probable explanation (MPE), which has important applications in, e.g., diagnostics and scheduling. In this work, we propose a new encoding of annotated disjunctions that is correct for both MARG and MPE inference. We explore, from both a theoretical and an experimental perspective, the trade-off between the encoding suitable only for MARG inference and the newly proposed general approach.
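The following sketch (with invented probabilities; it is not the paper's encoding) illustrates why the sequential probabilistic-fact encoding preserves MARG but can break MPE: the probability mass of one outcome gets split across several total assignments of the auxiliary facts.

```python
from itertools import product

# Hypothetical annotated disjunction: 0.4::red; 0.35::green; 0.25::blue.
ad = {"red": 0.4, "green": 0.35, "blue": 0.25}

# MPE over the multi-valued variable itself: the most probable outcome.
print(max(ad, key=ad.get))  # red (0.4)

# Sequential probabilistic-fact encoding: f1 selects red; otherwise f2
# selects green; otherwise blue. Fact probabilities are renormalized.
p_f1 = ad["red"]
p_f2 = ad["green"] / (1.0 - ad["red"])

def outcome(f1, f2):
    return "red" if f1 else ("green" if f2 else "blue")

def weight(f1, f2):
    w = p_f1 if f1 else 1.0 - p_f1
    w *= p_f2 if f2 else 1.0 - p_f2
    return w

# MARG is preserved: summing assignment weights per outcome recovers the AD.
# MPE is not: the mass of `red` is split over two assignments (f2 is free),
# so the single heaviest assignment selects `green` instead.
best = max(product([True, False], repeat=2), key=lambda fs: weight(*fs))
print(outcome(*best), weight(*best))  # green 0.35, although red has 0.4
```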
This article presents a review of the evolution of automatic post-editing, a term that describes methods for improving the output of machine translation systems based on knowledge extracted from datasets that include post-edited content. The article describes what distinguishes automatic post-editing from other tasks in machine translation and discusses how it may function as a complement to them. Particular attention is given to the five-year period covered by the shared tasks presented at the WMT conferences (2015–2019). In this period, discussion of automatic post-editing evolved from the definition of its main parameters to an announced demise, associated with the difficulty of improving output obtained by neural methods, which was then followed by renewed interest. The article debates the role and relevance of automatic post-editing, both as an academic endeavour and as a useful application in commercial workflows.
Recent years have seen an increasing need for gender-neutral and inclusive language. Within the field of NLP, there are various mono- and bilingual use cases where gender-inclusive language is appropriate, if not preferred, due to ambiguity or uncertainty about the gender of referents. In this work, we present a rule-based and a neural approach to gender-neutral rewriting for English, along with manually curated synthetic data (WinoBias+) and natural data (OpenSubtitles and Reddit) benchmarks. A detailed manual and automatic evaluation highlights how our NeuTral Rewriter, trained on data generated by the rule-based approach, obtains word error rates (WER) below 0.18% on synthetic, in-domain, and out-of-domain test sets.
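As a rough illustration of the rule-based side, a pronoun-replacement sketch follows; it is a toy approximation, not the paper's NeuTral Rewriter, which must also repair verb agreement and resolve syntactically ambiguous forms.

```python
import re

# Toy rewrite rules: map gendered pronouns to neutral counterparts.
# Real systems must also fix agreement ("she is" -> "they are") and
# disambiguate forms like "her" (object vs. possessive).
RULES = [
    (r"\b(he|she)\b", "they"),
    (r"\b(him)\b", "them"),
    (r"\b(his|her)\b", "their"),
    (r"\b(himself|herself)\b", "themself"),
]

def neutralize(text):
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    return text

print(neutralize("She said he lost his keys."))
# -> "they said they lost their keys." (casing and agreement unhandled)
```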
Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have recently become popular. One of these methods is back-translation (Sennrich et al., 2016a), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences with a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data produced by different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models, as well as combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
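The back-translation recipe itself is simple; the sketch below uses placeholder stand-ins for the trained SMT/NMT systems (all names here are hypothetical) to show how monolingual target sentences are paired with synthetic sources drawn from a mix of backward models.

```python
# Sketch of the back-translation augmentation loop (placeholder models;
# the paper uses real SMT/NMT systems, which are not reproduced here).

def augment(parallel_data, monolingual_target, backward_models):
    # parallel_data: list of (source, target) pairs
    # monolingual_target: target-language sentences without sources
    # backward_models: one or more target->source systems; alternating
    # among several paradigms mimics the combinations investigated.
    augmented = list(parallel_data)
    for i, tgt in enumerate(monolingual_target):
        model = backward_models[i % len(backward_models)]
        synthetic_src = model(tgt)  # back-translate target -> source
        augmented.append((synthetic_src, tgt))
    return augmented

# Usage with trivial stand-in "models":
smt = lambda s: "[smt] " + s
nmt = lambda s: "[nmt] " + s
data = augment([("hello", "bonjour")], ["merci", "salut"], [smt, nmt])
print(data)
```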