2015
DOI: 10.48550/arxiv.1508.02096

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation


Cited by 90 publications (53 citation statements)
References 12 publications
“…The single BTS model improves on average over the CRF models trained using the same data, though clearly there is some benefit in using external resources. Note that BTS is particularly strong in Finnish, surpassing even CRF+ by nearly 1.5% (absolute), probably because the byte representation generalizes better to agglutinative languages than word-based models, a finding validated by Ling et al. (2015). In addition, the baseline CRF models, including the (compressed) cluster tables, require about 50 MB per language, while BTS is under 10 MB.…”
Section: Part-of-speech Tagging
Mentioning confidence: 93%
“…Recent work has shown that modeling the sequence of characters in each token with an LSTM can more effectively handle rare and unknown words than independent word embeddings (Ling et al., 2015; Ballesteros et al., 2015). Similarly, language modeling, especially for morphologically complex languages, benefits from a Convolutional Neural Network (CNN) over characters to generate word embeddings (Kim et al., 2015).…”
Section: Related Work
Mentioning confidence: 99%
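The character-compositional model this statement refers to can be sketched in a few lines. The snippet below is a minimal illustration, assuming PyTorch; the character inventory, dimensions, and class names are invented for the example rather than taken from Ling et al.'s implementation. A word vector is built from the final forward and backward states of a character-level bidirectional LSTM, so any token, including one never seen in training, receives a representation.

```python
# Minimal sketch (not the authors' code): composing a word embedding from its
# characters with a bidirectional LSTM, in the spirit of Ling et al. (2015).
# All dimensions and names here are illustrative assumptions.
import torch
import torch.nn as nn

class CharToWord(nn.Module):
    def __init__(self, num_chars, char_dim=50, hidden_dim=150, word_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Combine the final forward and backward states into a word vector.
        self.proj = nn.Linear(2 * hidden_dim, word_dim)

    def forward(self, char_ids):             # char_ids: (batch, word_length)
        chars = self.char_emb(char_ids)      # (batch, length, char_dim)
        _, (h_n, _) = self.bilstm(chars)     # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)   # last fwd + last bwd state
        return self.proj(h)                  # (batch, word_dim)

# Usage: any character sequence, including an out-of-vocabulary token,
# gets an embedding.
model = CharToWord(num_chars=100)
word = torch.randint(0, 100, (1, 7))         # e.g. a 7-character token
print(model(word).shape)                     # torch.Size([1, 128])
```

Because the lookup is over characters rather than word types, the parameter count does not grow with the vocabulary, which is what lets such models handle rare and unknown words.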
“…Jozefowicz et al. (2015) address this limitation in their "Big LM" architecture by representing words as points in a lower-dimensional continuous space using a convolutional neural network (CNN). These models can represent words as embeddings (real-valued vectors) that capture perceptual similarity between words on the basis of shared structure, such as word lemmas or part of speech markings like -ing (Ling et al., 2015). The LSTM is then used to make predictions in this lower-dimensional space, reducing the number of computations in the softmax function from the size of the vocabulary to the dimensionality of the CNN-derived embeddings.…”
Section: Recurrent Neural Network Language Model (RNN LM)
Mentioning confidence: 99%
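As a companion sketch of the CNN-based encoder mentioned in that statement (again assuming PyTorch; filter widths, filter counts, and names are illustrative assumptions, not the configuration of Kim et al. or the Big LM), character embeddings are convolved with several filter widths and max-pooled over time, yielding a fixed-size word vector that the recurrent layer can consume in place of a conventional word embedding.

```python
# Minimal sketch of a character-CNN word encoder: convolutions over character
# embeddings followed by max-over-time pooling. Widths and dimensions are
# illustrative assumptions, not taken from any published configuration.
import torch
import torch.nn as nn

class CharCNNWord(nn.Module):
    def __init__(self, num_chars, char_dim=16, widths=(1, 2, 3, 4, 5), filters=64):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, filters, kernel_size=w) for w in widths
        )

    def forward(self, char_ids):                     # (batch, word_length)
        x = self.char_emb(char_ids).transpose(1, 2)  # (batch, char_dim, length)
        # One feature per filter via max-over-time pooling, then concatenate.
        feats = [conv(x).max(dim=-1).values for conv in self.convs]
        return torch.cat(feats, dim=-1)              # (batch, filters * len(widths))

encoder = CharCNNWord(num_chars=100)
word = torch.randint(0, 100, (1, 9))                 # e.g. a 9-character token
print(encoder(word).shape)                           # torch.Size([1, 320])
```

In Kim et al.'s formulation the pooled features additionally pass through highway layers before reaching the LSTM; that detail is omitted here for brevity.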
“…Launching a novel drug onto the market takes more than ten years on average, with a substantial investment of billions of USD [2]. Meanwhile, deep learning has shown great success in other fields, such as natural-language processing [3,4,5,6,7,8,9] and pattern recognition [10,11,12,13], aided by improvements in computing power and dataset availability. Its potential for improving the efficiency and success rate of drug development, in particular through the prediction of molecular properties, has been widely investigated for years [14,15,16,17,18].…”
Section: Introduction
Mentioning confidence: 99%