2020
DOI: 10.1609/aaai.v34i04.6097

Transductive Ensemble Learning for Neural Machine Translation

Abstract: Ensemble learning, which aggregates multiple diverse models for inference, is a common practice to improve the accuracy of machine learning tasks. However, it has been observed that conventional ensemble methods bring only marginal improvement for neural machine translation (NMT) when the individual models are strong or when there are a large number of individual models. In this paper, we study how to effectively aggregate multiple NMT models under the transductive setting, where the source sentences of the test set…
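To make the transductive setting concrete, below is a minimal Python sketch of the TEL recipe outlined in the abstract: every individual model decodes the test-set sources, the resulting (source, hypothesis) pairs form a synthetic corpus, and each model is fine-tuned on that corpus before final inference. The NMTModel interface and its translate/finetune/bleu methods are hypothetical placeholders, and the best-on-dev selection in the last step is an assumption for illustration, not necessarily the paper's exact procedure.

from typing import List, Protocol, Sequence, Tuple

class NMTModel(Protocol):
    # Assumed interface for an individual NMT model (hypothetical).
    def translate(self, src: str) -> str: ...
    def finetune(self, pairs: Sequence[Tuple[str, str]]) -> "NMTModel": ...
    def bleu(self, dev: Sequence[Tuple[str, str]]) -> float: ...

def transductive_ensemble(models: List[NMTModel],
                          test_sources: Sequence[str],
                          dev_set: Sequence[Tuple[str, str]]) -> List[str]:
    # 1) Each individual model translates the test-set source sentences,
    #    yielding a synthetic parallel corpus of (source, hypothesis) pairs.
    synthetic = [(s, m.translate(s)) for m in models for s in test_sources]
    # 2) Fine-tune every individual model on the synthetic corpus so each
    #    one absorbs the collective knowledge of the whole ensemble.
    finetuned = [m.finetune(synthetic) for m in models]
    # 3) Decode the test set with the fine-tuned model that scores best on
    #    a held-out dev set (this selection step is an assumption).
    best = max(finetuned, key=lambda m: m.bleu(dev_set))
    return [best.translate(s) for s in test_sources]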


Cited by 14 publications (7 citation statements) | References 13 publications
“…Ensemble Learning Ensemble learning has been applied across many applications and scenarios to enhance learning performance. In language translation tasks, transductive ensemble learning (TEL) was proposed to overcome the merely marginal accuracy improvements of traditional ensemble algorithms [21]. In zero-shot learning scenarios, multi-patch generative adversarial nets (MPGAN) with a novel weighted voting strategy were also proposed to improve on current ensemble learning algorithms [4].…”
Section: Related Work
confidence: 99%
“…There have been numerous works applying ensemble/knowledge distillation (Hinton et al., 2015) to machine translation (Kim and Rush, 2016; Freitag et al., 2017; Nguyen et al., 2020; Wang et al., 2020), dependency parsing (Kuncoro et al., 2016), and question answering (Mun et al., 2018; Ze et al., 2020; You et al., 2021; Chen et al., 2012). Regarding ensembling AMR graphs, Barzdins and Gosko (2016) propose choosing the AMR with the highest average sentence-level Smatch against all other AMRs.…”
Section: Related Work
confidence: 99%
“…In this paper, we use Transductive Ensemble Learning (TEL) [23] to aggregate multiple individual models for better performance. Note that TEL is applied under the transductive setting, i.e., the model can observe the input sentences of the test set.…”
Section: Combining Improvements
confidence: 99%
“…At last, we insert a constituent attention (CA) module [22] into the Transformer encoder, which adds an extra constraint on the attention heads to follow tree structures, better capturing the inherent dependency structure of input sentences. We also aggregate multiple models of these methods for inference, following transductive ensemble learning (TEL) [23].…”
Section: Introduction
confidence: 99%