Ramón Fernandez Astudillo scite author profile

Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs. Existing approaches to generating text from AMR have focused on training sequenceto-sequence or graph-to-sequence models on AMR annotated data only. In this paper, we propose an alternative approach that combines a strong pre-trained language model with cycle consistency-based re-scoring. Despite the simplicity of the approach, our experimental results show these models outperform all previous techniques on the English LDC2017T10 dataset, including the recent use of transformer architectures. In addition to the standard evaluation metrics, we provide human evaluation experiments that further substantiate the strength of our approach.

show abstract

Findings of the WMT 2018 Shared Task on Quality Estimation

Specia¹,

Blain²,

Logacheva³

et al. 2018

View full text Add to dashboard Cite

We report the results of the WMT18 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems at various granularity levels: word, phrase, sentence and document. This year we include four language pairs, three text domains, and translations produced by both statistical and neural machine translation systems. Participating teams from ten institutions submitted a variety of systems to different task variants and language pairs.

show abstract

Back-Translation-Style Data Augmentation for end-to-end ASR

Hayashi

Watanabe

Zhang

et al. 2018

View full text Add to dashboard Cite

In this paper we propose a novel data augmentation method for attention-based end-to-end automatic speech recognition (E2E-ASR), utilizing a large amount of text which is not paired with speech signals. Inspired by the back-translation technique proposed in the field of machine translation, we build a neural text-to-encoder model which predicts a sequence of hidden states extracted by a pre-trained E2E-ASR encoder from a sequence of characters. By using hidden states as a target instead of acoustic features, it is possible to achieve faster attention learning and reduce computational cost, thanks to sub-sampling in E2E-ASR encoder, also the use of the hidden states can avoid to model speaker dependencies unlike acoustic features. After training, the textto-encoder model generates the hidden states from a large amount of unpaired text, then E2E-ASR decoder is retrained using the generated hidden states as additional training data. Experimental evaluation using LibriSpeech dataset demonstrates that our proposed method achieves improvement of ASR performance and reduces the number of unknown words without the need for paired data.

show abstract

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Wang¹,

Luís²,

Marujo³

et al. 2015

Preprint

View full text Add to dashboard Cite

Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

Kolossa¹,

Astudillo²,

Hoffmann³

et al. 2010

EURASIP Journal on Audio, Speech, and Music Processing

View full text Add to dashboard Cite

Cycle-consistency Training for End-to-end Speech Recognition

Hori

Astudillo²,

Hayashi

et al. 2019

View full text Add to dashboard Cite

This paper presents a method to train end-to-end automatic speech recognition (ASR) models using unpaired data. Although the endto-end approach can eliminate the need for expert knowledge such as pronunciation dictionaries to build ASR systems, it still requires a large amount of paired data, i.e., speech utterances and their transcriptions. Cycle-consistency losses have been recently proposed as a way to mitigate the problem of limited paired data. These approaches compose a reverse operation with a given transformation, e.g., text-to-speech (TTS) with ASR, to build a loss that only requires unsupervised data, speech in this example. Applying cycle consistency to ASR models is not trivial since fundamental information, such as speaker traits, are lost in the intermediate text bottleneck. To solve this problem, this work presents a loss that is based on the speech encoder state sequence instead of the raw speech signal. This is achieved by training a Text-To-Encoder model and defining a loss based on the encoder reconstruction error. Experimental results on the LibriSpeech corpus show that the proposed cycle-consistency training reduced the word error rate by 14.7% from an initial model trained with 100-hour paired data, using an additional 360 hours of audio data without transcriptions. We also investigate the use of textonly data mainly for language modeling to further improve the performance in the unpaired data training scenario.

show abstract

Leveraging Abstract Meaning Representation for Knowledge Base Question Answering

Kapanipathi

Abdelaziz

Ravishankar

et al. 2021

View full text Add to dashboard Cite

Knowledge base question answering (KBQA) is an important task in Natural Language Processing. Existing approaches face significant challenges including complex question understanding, necessity for reasoning, and lack of large end-to-end training datasets. In this work, we propose Neuro-Symbolic Question Answering (NSQA), a modular KBQA system, that leverages (1) Abstract Meaning Representation (AMR) parses for task-independent question understanding; (2) a simple yet effective graph transformation approach to convert AMR parses into candidate logical queries that are aligned to the KB; (3) a pipeline-based approach which integrates multiple, reusable modules that are trained specifically for their individual tasks (semantic parser, entity and relationship linkers, and neuro-symbolic reasoner) and do not require end-to-end training data. NSQA achieves state-of-the-art performance on two prominent KBQA datasets based on DBpedia (QALD-9 and LC-QuAD 1.0). Furthermore, our analysis emphasizes that AMR is a powerful tool for KBQA systems.

show abstract

Uncertain LDA: Including Observation Uncertainties in Discriminative Transforms

Saeidi

Astudillo

Kolossa

2016

IEEE Trans. Pattern Anal. Mach. Intell.

View full text Add to dashboard Cite

Linear discriminant analysis (LDA) is a powerful technique in pattern recognition to reduce the dimensionality of data vectors. It maximizes discriminability by retaining only those directions that minimize the ratio of within-class and between-class variance. In this paper, using the same principles as for conventional LDA, we propose to employ uncertainties of the noisy or distorted input data in order to estimate maximally discriminant directions. We demonstrate the efficiency of the proposed uncertain LDA on two applications using state-of-the-art techniques. First, we experiment with an automatic speech recognition task, in which the uncertainty of observations is imposed by real-world additive noise. Next, we examine a full-scale speaker recognition system, considering the utterance duration as the source of uncertainty in authenticating a speaker. The experimental results show that when employing an appropriate uncertainty estimation algorithm, uncertain LDA outperforms its conventional LDA counterpart.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.