2015 International Conference on Asian Language Processing (IALP)
DOI: 10.1109/ialp.2015.7451547

Tibetan text classification using distributed representations of words

References: 7 publications
Cited by: 8 publications (5 citation statements)
“…Early analysis of shallow word embedding models showed that word vectors providing a stronger semantic representation have a higher norm [34]. Moreover, when comparing the norm of the vectors with their term frequency within the training corpus, it is possible to notice that highly frequent terms, as well as rare ones, have a considerably smaller norm.…”
Section: Vector Significance (mentioning)
confidence: 99%
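The norm-versus-frequency comparison described in this excerpt can be reproduced with a few lines of code. The sketch below is a minimal illustration, not the method of the cited work: it assumes the embeddings are available as a plain word-to-list mapping, uses the Euclidean (L2) norm, and takes corpus term counts as a separate dictionary. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def vector_norm(vec: List[float]) -> float:
    """Euclidean (L2) norm of a single word vector."""
    return math.sqrt(sum(x * x for x in vec))


def norms_by_frequency(
    embeddings: Dict[str, List[float]],
    counts: Dict[str, int],
) -> List[Tuple[str, int, float]]:
    """Return (word, corpus frequency, vector norm) rows, most frequent first.

    Words missing from `counts` are treated as having frequency 0.
    """
    rows = [(w, counts.get(w, 0), vector_norm(v)) for w, v in embeddings.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data, only to show the output shape.
    emb = {"a": [3.0, 4.0], "b": [0.0, 1.0]}
    freq = {"a": 1, "b": 5}
    for word, count, norm in norms_by_frequency(emb, freq):
        print(word, count, round(norm, 3))
```

With a real trained model, plotting the second column against the third is one way to check whether both very frequent and very rare terms fall at the low-norm end.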
“…The task of sensitive word detection has attracted a lot of attention due to the prevalence of online user-generated content (UGC). The majority of detection algorithms are based on the concept of the sensitive word tree (SMT), which represents one sensitive word by a node path from the root to a certain leaf node [5][6][7]. Note that common prefix characters from different sensitive words will usually occupy the same nodes in the sensitive word tree.…”
Section: Sensitive Words Detection (mentioning)
confidence: 99%
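The sensitive word tree described in these excerpts behaves like a character trie: each word is a path from the root, and words with a common prefix share the nodes for that prefix. The following is a minimal sketch under that reading, not the implementation from references [5][6][7]. Children are stored in Python dicts (hash tables), loosely matching the "variant of the hash tree" description. Because one sensitive word may be a prefix of another, an explicit end marker (the hypothetical key `END`) is used instead of relying on leaf nodes alone.

```python
from typing import Dict, Iterable, List, Tuple

END = "$end"  # hypothetical marker key: "a sensitive word ends at this node"


class SensitiveWordTree:
    """Character trie for sensitive-word lookup; shared prefixes share nodes."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root: Dict = {}
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        """Insert a word as a root-to-node path, reusing existing prefix nodes."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True

    def find_all(self, text: str) -> List[Tuple[int, str]]:
        """Return (start index, word) for every sensitive word occurring in text."""
        hits: List[Tuple[int, str]] = []
        for i in range(len(text)):
            node = self.root
            for j in range(i, len(text)):
                node = node.get(text[j])
                if node is None:  # no sensitive word continues with this character
                    break
                if END in node:
                    hits.append((i, text[i : j + 1]))
        return hits


if __name__ == "__main__":
    tree = SensitiveWordTree(["abc", "ab", "bd"])
    print(tree.find_all("xabcbd"))
```

Here "abc" and "ab" occupy the same two nodes for "a" and "b", which is the prefix sharing the excerpt mentions. The scan restarts at every position, so it is quadratic in the worst case; automaton-based matchers such as Aho-Corasick avoid that, but they are a different technique from the one quoted.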
“…Sensitive word detection is a particular problem in content monitoring, which refers to the procedure of identifying target words in given documents. The majority of existing detection algorithms are based on the concept of the sensitive word tree (SMT) [5][6][7]. As a tree structure, the SMT is a variant of the hash tree.…”
Section: Introduction (mentioning)
confidence: 99%
“…In recent years, the classification of Tibetan texts has received increasing attention. Jiang Tao [5] used the distributed representation of Tibetan words as a feature to significantly improve the performance of Tibetan text classification. Cao Hui [6] proposed an improved TF-IDF weighting algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
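A common way to turn distributed word representations into document features is to average the vectors of a document's words. The sketch below shows that idea with a simple nearest-centroid classifier. It is an assumption for illustration only: the excerpt does not say how [5] aggregated word vectors or which classifier was used. The code also assumes Tibetan text has already been segmented into word tokens (for example, after splitting on the tsheg and applying a word segmenter). All names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def document_vector(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    # zip(*known) yields one tuple per dimension; average each dimension.
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nearest_centroid(doc_vec: Vector, centroids: Dict[str, Vector]) -> str:
    """Pick the class label whose centroid is most cosine-similar to the document."""
    return max(centroids, key=lambda label: cosine(doc_vec, centroids[label]))


if __name__ == "__main__":
    # Hypothetical 2-d embeddings and class centroids.
    emb = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
    doc = document_vector(["x", "z", "unk"], emb, dim=2)
    print(doc, nearest_centroid(doc, {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}))
```

Out-of-vocabulary tokens are skipped rather than mapped to a special vector. A TF-IDF-weighted average is a natural variant, but the specific weighting proposed in [6] is not described in the excerpt, so it is not reproduced here.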