There is huge amount of content produced online by amateur authors, covering a large variety of topics. Sentiment analysis (SA) extracts and aggregates users’ sentiments towards a target entity. Machine learning (ML) techniques are frequently used as the natural language data is in abundance and has definite patterns. ML techniques adapt to domain specific solution at high accuracy depending upon the feature set used. The lexicon-based techniques, using external dictionary, are independent of data to prevent overfitting but they miss context too in specialized domains. Corpus-based statistical techniques require large data to stabilize. Complex network based techniques are highly resourceful, preserving order, proximity, context and relationships. Recent applications developed incorporate the platform specific structural information i.e. meta-data. New sub-domains are introduced as influence analysis, bias analysis, and data leakage analysis. The nature of data is also evolving where transcribed customer-agent phone conversation are also used for sentiment analysis. This paper reviews sentiment analysis techniques and highlight the need to address natural language processing (NLP) specific open challenges. Without resolving the complex NLP challenges, ML techniques cannot make considerable advancements. The open issues and challenges in the area are discussed, stressing on the need of standard datasets and evaluation methodology. It also emphasized on the need of better language models that could capture context and proximity.
Sentiment analysis for health care deals with the diagnosis of health care related problems identified by the patients themselves. It takes the patients opinions into perspective to make policies and modifications that could directly address their problems. Sentiment analysis is used with commercial products to great effect and has outgrown to other application areas. Aspect based analysis of health care, not only recommend the services and treatments but also present their strong features for which they are preferred. Machine learning techniques are used to analyze millions of review documents and conclude them towards an efficient and accurate decision. The supervised techniques have high accuracy but are not extendable to unknown domains while unsupervised techniques have low accuracy. More work is targeted to improve the accuracy of the unsupervised techniques as they are more practical in this time of information flooding.
Lifelong machine learning (LML) models learn with experience maintaining a knowledge-base, without user intervention. Unlike traditional single-domain models they can easily scale up to explore big data. The existing LML models have high data dependency, consume more resources, and do not support streaming data. This paper proposes online LML model (OAMC) to support streaming data with reduced data dependency. With engineering the knowledge-base and introducing new knowledge features the learning pattern of the model is improved for data arriving in pieces. OAMC improves accuracy as topic coherence by 7% for streaming data while reducing the processing cost to half.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.