Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015)
DOI: 10.3115/v1/p15-1004

Statistical Machine Translation Features with Multitask Tensor Networks

Abstract: We present a three-pronged approach to improving Statistical Machine Translation (SMT), building on recent success in the application of neural networks to SMT. First, we propose new features based on neural networks to model various nonlocal translation phenomena. Second, we augment the architecture of the neural network with tensor layers that capture important higher-order interaction among the network units. Third, we apply multitask learning to estimate the neural network parameters jointly. Each of our p…
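The tensor layers mentioned in the abstract replace a purely linear transformation with one that also computes multiplicative, second-order interactions between hidden units. Below is a minimal sketch of one common formulation, y_k = tanh(h^T T_k h + W_k h + b_k), under assumed small dimensions; the variable names and sizes are illustrative, not taken from the paper.

```python
import numpy as np

# Minimal sketch of a tensor (bilinear) layer: each output unit k combines a
# quadratic term h^T T[k] h, capturing pairwise interactions among hidden
# units, with an ordinary linear term W[k] h. Dimensions are illustrative.

rng = np.random.default_rng(0)

hidden_dim, out_dim = 8, 4
T = rng.normal(scale=0.1, size=(out_dim, hidden_dim, hidden_dim))  # 3-way tensor
W = rng.normal(scale=0.1, size=(out_dim, hidden_dim))              # linear part
b = np.zeros(out_dim)

def tensor_layer(h):
    # Second-order term for every output unit k: h^T T[k] h.
    quadratic = np.einsum('i,kij,j->k', h, T, h)
    linear = W @ h
    return np.tanh(quadratic + linear + b)

h = rng.normal(size=hidden_dim)
print(tensor_layer(h))  # four activations mixing linear and bilinear terms
```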

Cited by 10 publications (5 citation statements) | References 22 publications

Citation statements:
“…Tensors are powerful because they can capture important higher-order interactions across time, feature dimensions, and multiple modalities (Kossaifi et al., 2017). For unimodal tasks, tensors have been used for part-of-speech tagging (Srikumar and Manning, 2014), dependency parsing (Lei et al., 2014), word segmentation (Pei et al., 2014), question answering (Qiu and Huang, 2015), and machine translation (Setiawan et al., 2015). For multimodal tasks, Huang et al. (2017) used tensor products between image and text features for image captioning.…”
Section: Related Work
confidence: 99%
“…CP, Tucker and TT decompositions have been leveraged in the context of neural networks [56, 59, 79, 96, 97, 120, 131, 156, 159], with the weight matrix of a fully-connected layer or a convolutional layer stored in compressed form as a low-rank tensor, thus reducing redundancies in the network parameterization. Regarding theoretical aspects and the understanding of deep neural networks through tensors, Cohen et al. [29] analyzed the expressive power of deep architectures by drawing analogies between shallow networks and the rank-1 CP decomposition, as well as between deep networks and the hTucker decomposition.…”
Section: Machine
confidence: 99%
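The compression idea in the statement above can be illustrated in its simplest form: replacing a dense weight matrix with a rank-R factorization (CP, Tucker, and TT generalize this to higher-order weight tensors). A minimal sketch under assumed dimensions; U, V, and the sizes are illustrative, not from the cited works.

```python
import numpy as np

# Sketch of low-rank compression of a fully-connected layer: W ≈ U @ V.
# Two skinny matmuls replace one dense one, cutting parameters from
# in_dim*out_dim to rank*(in_dim + out_dim). All dimensions are assumptions.

rng = np.random.default_rng(0)

in_dim, out_dim, rank = 512, 512, 16
U = rng.normal(scale=0.05, size=(out_dim, rank))
V = rng.normal(scale=0.05, size=(rank, in_dim))

def low_rank_linear(x):
    return U @ (V @ x)

x = rng.normal(size=in_dim)
y = low_rank_linear(x)

dense_params = in_dim * out_dim              # 262,144
factored_params = rank * (in_dim + out_dim)  # 16,384 — 16x fewer
print(y.shape, dense_params, factored_params)
```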
“…These models can be trained on parallel corpora and do not need word alignments to be learned in advance. There are also neural translation models that are trained on a word-aligned parallel corpus (Devlin et al., 2014; Meng et al., 2015; Zhang et al., 2015; Setiawan et al., 2015), which use the alignment information to decide which parts of the source sentence are more important for predicting a particular target word. All these models are trained on plain source and target sentences without considering any syntactic information, while our neural model learns rule selection for tree-based translation rules and makes use of the tree structure of natural language for better translation.…”
Section: Related Work
confidence: 99%
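As a rough illustration of how alignment information can single out the relevant source words for one target prediction, in the spirit of the alignment-based joint models cited above: the sketch below extracts a fixed window of source tokens around the aligned position. The function, example data, and alignment format are hypothetical.

```python
# Hypothetical helper: given a target position's aligned source index
# (e.g. from a word aligner such as GIZA++), return a window of source
# tokens centered on it, padded at the sentence boundaries.

def source_window(source_tokens, aligned_idx, radius=2, pad="<s>"):
    """Return the 2*radius+1 source tokens centered on the aligned position."""
    padded = [pad] * radius + list(source_tokens) + [pad] * radius
    center = aligned_idx + radius
    return padded[center - radius : center + radius + 1]

src = ["wo", "xihuan", "zhe", "ben", "shu"]
# alignment: target position -> aligned source index (illustrative)
alignment = {0: 0, 1: 1, 2: 4}  # "I" -> "wo", "like" -> "xihuan", "book" -> "shu"
print(source_window(src, alignment[2]))  # ['zhe', 'ben', 'shu', '<s>', '<s>']
```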