2018
DOI: 10.1007/978-3-319-73606-8_4

From Vector Space Models to Vector Space Models of Semantics

Cited by 17 publications (3 citation statements)
References 5 publications
“…We count the frequency of each word in the text and create a dictionary of them, a process called tokenization in NLP; the tokens are then passed to a CountVectorizer object in the scikit-learn package to create a capped set of features. We use the fit_transform method to build (Ganesh et al., 2016) the bag-of-words feature vectors, which are stored in an array.…”
Section: Importing Training and Cross-validation From
Citation type: mentioning
Confidence: 99%
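
The statement above describes a standard scikit-learn bag-of-words workflow. A minimal sketch, assuming a toy corpus and an illustrative max_features cap (neither comes from the cited papers):

```python
# Bag-of-words with scikit-learn's CountVectorizer, as the citing paper
# describes. The corpus and max_features value are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Tokenize, build the vocabulary (the "dictionary"), and cap the feature set.
vectorizer = CountVectorizer(max_features=1000)

# fit_transform learns the vocabulary and returns the term-count matrix.
X = vectorizer.fit_transform(corpus)

# Store the bag-of-words feature vectors in a dense array.
bow_array = X.toarray()
print(vectorizer.get_feature_names_out())
print(bow_array)
```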
“…We used the Random Kitchen Sink (RKS) algorithm (Sathyan et al., 2018) with character word-bound Term Frequency-Inverse Document Frequency (TF-IDF) (Barathi Ganesh et al., 2016) for text representation, and classification was performed with a Support Vector Machine (SVM) classifier (Soman et al., 2009; Premjith et al., 2019). The rest of the paper is organised as follows: Section 2 describes related work, Section 3 describes the datasets, Section 4 describes the preprocessing and methods used, Section 5 presents the results and analysis, and Section 6 concludes the paper.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
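
A hedged sketch of the representation-plus-classifier setup this statement describes: character word-bound TF-IDF features (scikit-learn's analyzer="char_wb") feeding a linear SVM. The texts, labels, and n-gram range are invented placeholders, and the Random Kitchen Sink step is omitted:

```python
# Character word-bound TF-IDF features into a linear SVM.
# Placeholder data; the RKS feature mapping from the cited paper is omitted.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = ["sample document one", "another short text", "yet another sample"]
labels = [0, 1, 0]

# analyzer="char_wb" builds character n-grams only inside word boundaries,
# matching the "character word-bound" TF-IDF mentioned in the statement.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
model.fit(texts, labels)
print(model.predict(["a new sample document"]))
```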
“…One of the most naive ways of representing a word as a vector is the one-hot representation, but it is ineffective for a large corpus, since the length of a one-hot vector grows with the vocabulary. We therefore need a better representation, one that captures some semantic similarity (Ganesh et al., 2016) between nearby words, so that a word's representation carries useful information about the word and its actual meaning. Methods that encode this information are called word embedding models; they are categorized into count-based and predictive models. Both kinds of embedding models capture at least some shared syntactic meaning.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
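
The scaling problem this statement raises is easy to see concretely. A minimal sketch with an invented four-word vocabulary, showing that one-hot vectors grow with vocabulary size and encode no similarity between related words:

```python
# One-hot word vectors: dimensionality equals vocabulary size, and any two
# distinct words are orthogonal, so no semantic similarity is captured.
import numpy as np

vocab = ["king", "queen", "apple", "banana"]  # invented toy vocabulary
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab))  # vector length grows with |vocab|
    vec[index[word]] = 1.0
    return vec

# Dot product (and hence cosine similarity) between distinct one-hot
# vectors is 0, even for related words like "king" and "queen".
print(one_hot("king") @ one_hot("queen"))  # 0.0
```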