Abstract. This paper investigates graph-based approaches to labeled topic clustering of reader comments in online news. For graph-based clustering we propose a linear regression model of similarity between the graph nodes (comments), based on similarity features with weights trained on automatically derived training data. To label the clusters, our graph-based approach uses DBPedia to abstract topics extracted from the clusters. We evaluate the clustering approach against gold standard data created by human annotators and compare its results against LDA, currently reported as the best method for the news comment clustering task. Evaluation of cluster labeling is set up as a retrieval task, where human annotators are asked to identify the best cluster given a cluster label. Our clustering approach significantly outperforms the LDA baseline, and our evaluation of abstract cluster labels shows that graph-based approaches are a promising method of creating labeled clusters of news comments, although we still find cases where the automatically generated abstractive labels are insufficient to allow humans to correctly associate a label with its cluster.
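As a rough illustration of the graph-based clustering idea, the sketch below scores each pair of comments with a weighted linear combination of similarity features and links pairs whose score passes a threshold. The two features (cosine and Jaccard), the weights, the bias, the threshold, and the use of connected components to read off clusters are all assumptions made for illustration. The abstract does not specify them, and in the paper the weights are trained rather than set by hand.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between the token-count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Jaccard overlap between the token sets of two token lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical hand-set parameters; a trained regression would supply these.
WEIGHTS = {"cosine": 0.6, "jaccard": 0.4}
BIAS = 0.0
THRESHOLD = 0.3  # assumed cut-off for adding an edge between two comments

def similarity(a, b):
    """Linear model: bias plus weighted sum of the similarity features."""
    return BIAS + WEIGHTS["cosine"] * cosine(a, b) + WEIGHTS["jaccard"] * jaccard(a, b)

def cluster(comments):
    """Link similar comments and return connected components as index lists."""
    tokens = [c.lower().split() for c in comments]
    parent = list(range(len(tokens)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if similarity(tokens[i], tokens[j]) >= THRESHOLD:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(tokens)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

comments = [
    "the new tax plan hurts small business",
    "small business owners hate the tax plan",
    "the match last night was a great game",
    "great game and a great match",
]
clusters = cluster(comments)
print(clusters)
```

On these four toy comments the two tax comments end up in one cluster and the two sports comments in another. A real system would replace the hand-set weights with regression coefficients fitted on labeled comment pairs.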
A Bag-of-Words model is widely used to extract features from text, which are then given as input to a machine learning algorithm such as a multilayer perceptron (MLP) neural network. The dataset considered consists of movie reviews with both positive and negative comments, which are converted to a Bag-of-Words representation. Each review is encoded as a vector whose length corresponds to the number of words in the vocabulary. Each word in a review document is assigned a score, and the resulting vector of scores is fed as input to the neural model. In the Keras deep learning library, the neural models are simple feedforward networks with fully connected layers, called 'Dense' layers. Bigram language models are developed to classify encoded documents as either positive or negative. First, reviews are converted to lines of tokens and then encoded with the Bag-of-Words model. Finally, a neural model is developed to score bigrams of words using word scoring modes.
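The encoding step described above can be sketched in plain Python as follows. This is a minimal illustration and not the cited implementation: the helper names (`to_bigrams`, `build_vocab`, `encode`), the two toy reviews, and the three scoring modes shown are assumptions. The resulting fixed-length vectors are what a feedforward network with dense layers would take as input; the network itself is omitted.

```python
from collections import Counter

def to_bigrams(text):
    """Split a review into lowercase words and pair each word with the next."""
    words = text.lower().split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

def build_vocab(docs):
    """Map every bigram seen in the training reviews to a vector index."""
    vocab = sorted({tok for doc in docs for tok in to_bigrams(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def encode(doc, vocab, mode="binary"):
    """Encode one review as a vector with one score per vocabulary bigram.

    mode: 'binary' marks presence, 'count' stores occurrences,
          'freq' stores occurrences divided by the known bigrams in the review.
    """
    if mode not in ("binary", "count", "freq"):
        raise ValueError(f"unknown scoring mode: {mode}")
    counts = Counter(tok for tok in to_bigrams(doc) if tok in vocab)
    total = sum(counts.values())
    vec = [0.0] * len(vocab)
    for tok, c in counts.items():
        if mode == "binary":
            vec[vocab[tok]] = 1.0
        elif mode == "count":
            vec[vocab[tok]] = float(c)
        else:  # 'freq'
            vec[vocab[tok]] = c / total
    return vec

# Toy training reviews (hypothetical data).
train = ["very good very good film", "very bad film"]
vocab = build_vocab(train)

binary_vec = encode(train[0], vocab, "binary")
count_vec = encode(train[0], vocab, "count")
freq_vec = encode(train[0], vocab, "freq")
print(sorted(vocab, key=vocab.get))
print(binary_vec, count_vec, freq_vec)
```

Bigrams that never appeared in the training reviews are ignored at encoding time, so every review maps to a vector of the same length regardless of its wording.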