Sequence-based Structured Prediction for Semantic Parsing

We propose an approach for semantic parsing that uses a recurrent neural network to map a natural language question into a logical form representation of a KB query. Building on recent work by Wang et al. (2015), the interpretable logical forms, which are structured objects obeying certain constraints, are enumerated by an underlying grammar and paired with their canonical realizations. To use sequence prediction, these logical forms must first be sequentialized. We compare three sequentializations: a direct linearization of the logical form, a linearization of the associated canonical realization, and a sequence of derivation steps relative to the underlying grammar. We also show how grammatical constraints on the derivation sequence can easily be integrated into the RNN-based sequential predictor. Our experiments show substantial improvements over previous results on the same dataset and demonstrate the advantage of incorporating the grammatical constraints.
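
The general idea of integrating grammatical constraints into a sequential predictor can be pictured with a minimal sketch, which is not the authors' implementation: at each decoding step, symbols the grammar does not allow are masked to minus infinity before the argmax, so an illegal derivation can never be produced. The toy grammar table `ALLOWED`, the vocabulary, and the `fake_scores` function standing in for the RNN are all hypothetical, and the grammar state is simplified to "last emitted symbol".

```python
import math

# Hypothetical toy grammar: for each state (here simply the last emitted
# symbol, a finite-state simplification), the symbols allowed next.
ALLOWED = {
    "START": {"type(", "rel("},
    "type(": {"city", "person"},
    "rel(": {"born_in", "capital_of"},
    "city": {")"},
    "person": {")"},
    "born_in": {")"},
    "capital_of": {")"},
    ")": {"<eos>"},
}

VOCAB = ["type(", "rel(", "city", "person", "born_in", "capital_of", ")", "<eos>"]


def fake_scores(prefix):
    """Hypothetical stand-in for an RNN: one unnormalized score per symbol.
    Deterministic so the example is reproducible."""
    return {tok: float((len(prefix) * 7 + i * 3) % 5) for i, tok in enumerate(VOCAB)}


def constrained_greedy_decode(max_len=10):
    prefix, state = [], "START"
    while len(prefix) < max_len:
        scores = fake_scores(prefix)
        # Mask: symbols the grammar forbids in this state get -inf.
        masked = {t: (s if t in ALLOWED[state] else -math.inf) for t, s in scores.items()}
        best = max(masked, key=masked.get)
        if best == "<eos>":
            break
        prefix.append(best)
        state = best
    return prefix


print(constrained_greedy_decode())  # ['rel(', 'born_in', ')']
```

Without the mask, the highest-scoring symbol at some steps would be one the grammar forbids (for instance `person` at the first step); masking before the argmax guarantees every emitted sequence is grammatical, whatever the network's scores.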

Translating with non-contiguous phrases

Simard, Cancedda, Cavestro et al., 2005

This paper presents a phrase-based statistical machine translation method based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from a word-aligned corpus is proposed. A statistical translation model that deals with such phrases is also presented, along with a training method based on maximizing translation accuracy as measured by the NIST evaluation metric. Translations are produced by a beam-search decoder. Experimental results demonstrate that the proposed method allows the system to generalize better from the training data.
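
To make "phrases with gaps" concrete, the following hypothetical sketch finds where a source phrase containing a gap symbol matches a sentence and which word positions it actually covers. It assumes, for simplicity, that each gap symbol (written `◇` here) stands for exactly one arbitrary word; it illustrates the notion only and is not the paper's phrase extraction procedure or decoder.

```python
GAP = "◇"  # assumption: one gap symbol matches exactly one arbitrary word


def match_positions(phrase, sentence):
    """Return, for each match of `phrase` in `sentence`, the positions
    covered by its non-gap words. Gap positions are left uncovered, so a
    gapped phrase covers a non-contiguous set of positions."""
    n, m = len(sentence), len(phrase)
    matches = []
    for start in range(n - m + 1):
        window = sentence[start:start + m]
        if all(p == GAP or p == w for p, w in zip(phrase, window)):
            covered = [start + i for i, p in enumerate(phrase) if p != GAP]
            matches.append(covered)
    return matches


sentence = "je ne veux plus rien".split()
print(match_positions(["ne", GAP, "plus"], sentence))  # [[1, 3]]
```

The contiguous phrase "ne plus" would not match this sentence at all, while the gapped phrase "ne ◇ plus" does, which is the kind of generalization the abstract attributes to non-contiguous phrases.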

Global Autoregressive Models for Data-Efficient Sequence Learning

Parshakova, Andreoli, Dymetman, 2019

Standard autoregressive seq2seq models are easily trained by maximum likelihood, but tend to perform poorly under small-data conditions. We introduce a class of seq2seq models, GAMs (Global Autoregressive Models), which combine an autoregressive component with a log-linear component, allowing the use of global a priori features to compensate for the lack of data. We train these models in two steps. In the first step, we obtain an unnormalized GAM that maximizes the likelihood of the data but is unsuitable for fast inference or evaluation. In the second step, we use this GAM to train (by distillation) a second autoregressive model that approximates the normalized distribution associated with the GAM and can be used for fast inference and evaluation. Our experiments focus on language modelling under synthetic conditions and show a strong perplexity reduction when using the second autoregressive model instead of the standard one.
* Work conducted during an internship at NAVER Labs Europe.
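
As a rough illustration of combining an autoregressive component with a log-linear one, assume the formulation in which the unnormalized GAM score of a sequence x is r(x) · exp(⟨λ, φ(x)⟩), where r is the autoregressive model, φ(x) a vector of global features and λ their weights. In the toy sketch below, r, the single feature and its weight are all hypothetical, and the sequence space is small enough to normalize by brute-force enumeration; for realistic sequence spaces this sum is intractable, which is consistent with why the unnormalized model is not directly usable for fast inference or evaluation.

```python
import itertools
import math

# Hypothetical autoregressive component r(x): binary strings where each
# symbol is "1" with probability 0.3, independently of the prefix
# (a deliberately trivial stand-in for a trained seq2seq model).
P_ONE = 0.3


def r(x):
    return math.prod(P_ONE if c == "1" else 1 - P_ONE for c in x)


# Hypothetical global feature: does the string start and end with the same symbol?
def phi(x):
    return [1.0 if x[0] == x[-1] else 0.0]


LAMBDA = [2.0]  # made-up feature weight


def unnormalized_gam(x):
    # r(x) * exp(<lambda, phi(x)>)
    return r(x) * math.exp(sum(l * f for l, f in zip(LAMBDA, phi(x))))


def normalized_gam(length):
    # Brute-force normalization over all binary strings of this length;
    # only feasible for tiny sequence spaces.
    space = ["".join(bits) for bits in itertools.product("01", repeat=length)]
    scores = {x: unnormalized_gam(x) for x in space}
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}


dist = normalized_gam(4)
print(round(sum(dist.values()), 6))  # 1.0
```

With a positive weight, strings satisfying the global feature (such as "1001") gain probability relative to r alone; a left-to-right model trained on little data could easily fail to learn such a global regularity.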

A surprisingly effective out-of-the-box char2char model on the E2E NLG Challenge dataset

Agarwal, Dymetman, 2017

We train a char2char model on the E2E NLG Challenge data by using the recently released tf-seq2seq framework "out of the box", with some of the standard options of this tool. With minimal effort, and in particular without delexicalization, tokenization or lowercasing, the raw predictions obtained are, according to a small-scale human evaluation, excellent on the linguistic side and quite reasonable on the adequacy side, the primary downside being possible omissions of semantic material. However, in a significant number of cases (more than 70%), a perfect solution can be found among the top-20 predictions, indicating promising directions for solving the remaining issues.

Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness

Bérard, Calapodescu, Dymetman et al., 2019

We share a French-English parallel corpus of Foursquare restaurant reviews, and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of such user-generated content, and train good baseline models that build upon the latest techniques for MT robustness. We also perform an extensive evaluation (automatic and human) that shows significant improvements over existing online systems. Finally, we propose task-specific metrics based on sentiment analysis or translation accuracy of domain-specific polysemous words.
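
One way to picture a translation-accuracy metric for domain-specific polysemous words is sketched below, under assumptions that are ours rather than the paper's: whenever a source sentence contains a listed polysemous word, the hypothesis counts as correct if it contains one of the translations acceptable in the restaurant domain. The lexicon `DOMAIN_SENSES`, the example sentences and the naive lowercase word matching are all hypothetical.

```python
import re

# Hypothetical lexicon: French source word -> English translations acceptable
# in the restaurant domain (e.g. "carte" should be "menu", not "card" or "map").
DOMAIN_SENSES = {
    "carte": {"menu"},
    "plat": {"dish", "course"},
    "addition": {"bill", "check"},
}


def tokens(text):
    # Naive tokenization: lowercase Unicode word characters; "L'addition"
    # becomes ["l", "addition"].
    return re.findall(r"\w+", text.lower())


def polysemous_accuracy(pairs):
    """pairs: list of (source_sentence, hypothesis_translation).
    Returns (correct, total) over occurrences of listed polysemous words."""
    correct = total = 0
    for src, hyp in pairs:
        hyp_tokens = set(tokens(hyp))
        for word in tokens(src):
            if word in DOMAIN_SENSES:
                total += 1
                if hyp_tokens & DOMAIN_SENSES[word]:
                    correct += 1
    return correct, total


pairs = [
    ("La carte est variée.", "The menu is varied."),
    ("Le plat était froid.", "The flat was cold."),
    ("L'addition était salée.", "The bill was steep."),
]
print(polysemous_accuracy(pairs))  # (2, 3)
```

Such a targeted count complements corpus-level metrics like BLEU, which barely move when a single ambiguous word such as "plat" is mistranslated as "flat".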

Marc Dymetman

Sequence-based Structured Prediction for Semantic Parsing

Translating with non-contiguous phrases

Global Autoregressive Models for Data-Efficient Sequence Learning

A surprisingly effective out-of-the-box char2char model on the E2E NLG Challenge dataset

Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness
