Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, 2017
DOI: 10.18653/v1/e17-1040
Character-Word LSTM Language Models

Abstract: We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline…
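The core idea in the abstract, concatenating a word embedding with embeddings of a few of the word's characters before the LSTM, can be illustrated with a minimal sketch, assuming PyTorch. The class name, layer sizes, and the fixed number of characters taken per word below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class CharWordLSTM(nn.Module):
    """Sketch of a word-level LSTM LM whose input is a word embedding
    concatenated with embeddings of a fixed number of the word's characters."""

    def __init__(self, word_vocab, char_vocab, word_dim=150, char_dim=25,
                 n_chars=3, hidden_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # LSTM input size = word embedding + n_chars character embeddings.
        self.lstm = nn.LSTM(word_dim + n_chars * char_dim, hidden_dim,
                            batch_first=True)
        self.out = nn.Linear(hidden_dim, word_vocab)

    def forward(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, n_chars)
        w = self.word_emb(words)            # (B, T, word_dim)
        c = self.char_emb(chars)            # (B, T, n_chars, char_dim)
        c = c.flatten(start_dim=2)          # (B, T, n_chars * char_dim)
        x = torch.cat([w, c], dim=-1)       # concatenated word+char input
        h, _ = self.lstm(x)
        return self.out(h)                  # next-word logits per position
```

A training step would compare the logits at each position against the next word with cross-entropy, exactly as in a word-only LSTM baseline; the difference is that an out-of-vocabulary word mapped to an unknown-word index still contributes its character embeddings, which is what the abstract credits for the improved modeling of infrequent and unknown words.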

Cited by 42 publications (36 citation statements); references 22 publications (28 reference statements).
“…Recently there are several papers that propose single stage mechanisms [23,24,25,26,27,28,29,30]. Of the above, perhaps the closest to our work is Slim embedding [24], which is a special case of WEST.…”
Section: Introduction
confidence: 94%
“…This probability is approximated by learning the conditional probability of each token given a fixed number of k-context tokens by using a neural network with parameters Θ. The tokens used for training can be of different granularities such as word [21], character [22], sub-word unit [23], or hybrid word-character [24]. The objective function of the LM is to maximize the sum of the logs of the conditional probabilities over a sequence of tokens:…”
Section: A GPT-2 Model
confidence: 99%
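The objective referred to at the end of the quoted passage is truncated above; a hedged reconstruction of the standard autoregressive log-likelihood it describes, with $k$ the context size and $\Theta$ the network parameters (the citing paper's exact notation may differ):

$$ \mathcal{L}(\Theta) = \sum_{i} \log P\!\left(t_i \mid t_{i-k}, \ldots, t_{i-1}; \Theta\right) $$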
“…Language Models (LMs) have been dominant in literal representation tasks, and they can be divided into two categories which are statistical language models [19,9] and neural network language models [2,46,34].…”
Section: Literal Representation Techniques
confidence: 99%
“…Neural network language models can be further divided into RNN-based LMs [31,30,41,46,34], cache-based LMs [38,13,18], and attention-based LMs [2,43,28]. Inspired by the first RNN-based LM [31,30], the work by Sundermeyer et al [41] leverages LSTM [16] to capture context dependences.…”
Section: Literal Representation Techniques
confidence: 99%