This paper studies sentiment analysis of conditional sentences. The aim is to determine whether opinions expressed on different topics in a conditional sentence are positive, negative or neutral. Conditional sentences are one of the commonly used language constructs in text. In a typical document, there are around 8% of such sentences. Due to the condition clause, sentiments expressed in a conditional sentence can be hard to determine. For example, in the sentence, if your Nokia phone is not good, buy this great Samsung phone, the author is positive about "Samsung phone" but does not express an opinion on "Nokia phone" (although the owner of the "Nokia phone" may be negative about it). However, if the sentence does not have "if", the first clause is clearly negative. Although "if" commonly signifies a conditional sentence, there are many other words and constructs that can express conditions. This paper first presents a linguistic analysis of such sentences, and then builds some supervised learning models to determine if sentiments expressed on different topics in a conditional sentence are positive, negative or neutral. Experimental results on conditional sentences from 5 diverse domains are given to demonstrate the effectiveness of the proposed approach.
Abstract. We analyze the lung cancer data available from the SEER program with the aim of developing accurate survival prediction models for lung cancer. Carefully designed preprocessing steps resulted in removal/modification/splitting of several attributes, and 2 of the 11 derived attributes were found to have significant predictive power. Several supervised classification methods were used on the preprocessed data along with various data mining optimizations and validations. In our experiments, ensemble voting of five decision tree based classifiers and meta-classifiers was found to result in the best prediction performance in terms of accuracy and area under the ROC curve. We have developed an on-line lung cancer outcome calculator for estimating the risk of mortality after 6 months, 9 months, 1 year, 2 year and 5 years of diagnosis, for which a smaller non-redundant subset of 13 attributes was carefully selected using attribute selection techniques, while trying to retain the predictive power of the original set of attributes. Further, ensemble voting models were also created for predicting conditional survival outcome for lung cancer (estimating risk of mortality after 5 years of diagnosis, given that the patient has already survived for a period of time), and included in the calculator. The on-line lung cancer outcome calculator developed as a result of this study is available at http://info.eecs.northwestern.edu:8080/LungCancerOutcomeCalculator/.
We analyze the lung cancer data available from the SEER program with the aim of developing accurate survival prediction models for lung cancer using data mining techniques. Carefully designed preprocessing steps resulted in removal/ modification/splitting of several attributes, and 2 of the 11 derived attributes were found to have significant predictive power. Several data mining classification techniques were used on the preprocessed data along with various data mining optimizations and validations. In our experiments, ensemble voting of five decision tree based classifiers and meta-classifiers was found to result in the best prediction performance in terms of accuracy and area under the ROC curve. Further, we have developed an on-line lung cancer outcome calculator for estimating risk of mortality after 6 months, 9 months, 1 year, 2 year, and 5 years of diagnosis, for which a smaller non-redundant subset of 13 attributes was carefully selected using attribute selection techniques, while trying to retain the predictive power of the original set of attributes. The on-line lung cancer outcome calculator developed as a result of this study is available at
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.