Abstract. The naive Bayes classifier, currently experiencing a renaissance in machine learning, has long been a core technique in information retrieval. We review some of the variations of naive Bayes models used for text retrieval and classification, focusing on the distributional assumptions made about word occurrences in documents.
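The most common distributional assumption the abstract alludes to is the multinomial event model, in which a document is treated as a bag of independent word draws. A minimal sketch of that model, with an illustrative toy corpus (the data and Laplace-smoothing constant are assumptions, not taken from the paper):

```python
import math
from collections import Counter, defaultdict

# Toy training data: (class, tokenized document). Illustrative only.
train = [
    ("sport", "goal match team goal".split()),
    ("sport", "team win match".split()),
    ("tech",  "code bug compiler".split()),
    ("tech",  "compiler code release".split()),
]

def fit_multinomial_nb(data, alpha=1.0):
    """Multinomial naive Bayes: each document is a bag of word draws.
    (A Bernoulli model would instead track word presence/absence.)"""
    class_docs = Counter(c for c, _ in data)
    word_counts = defaultdict(Counter)
    for c, doc in data:
        word_counts[c].update(doc)
    vocab = {w for _, doc in data for w in doc}
    priors = {c: math.log(n / len(data)) for c, n in class_docs.items()}
    loglik = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        # Laplace smoothing with pseudocount alpha
        loglik[c] = {w: math.log((word_counts[c][w] + alpha) /
                                 (total + alpha * len(vocab)))
                     for w in vocab}
    return priors, loglik

def predict(model, doc):
    priors, loglik = model
    scores = {c: priors[c] + sum(loglik[c].get(w, 0.0) for w in doc)
              for c in priors}
    return max(scores, key=scores.get)

model = fit_multinomial_nb(train)
print(predict(model, "match goal".split()))  # → sport
```

Swapping the event model (multinomial counts vs. Bernoulli presence indicators) changes only the likelihood term, which is precisely the kind of variation the survey compares.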
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
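The core of uncertainty sampling is the selection criterion: at each round, request labels for the unlabeled documents whose current predicted class probability is closest to the decision boundary. A minimal sketch of that criterion, with hypothetical classifier scores (the data and batch size are assumptions, not the paper's experiment):

```python
# Hypothetical classifier outputs: P(relevant | doc) for unlabeled docs.
scores = {"doc1": 0.95, "doc2": 0.51, "doc3": 0.08, "doc4": 0.47}

def most_uncertain(probs, k=2):
    """Return the k instances whose predicted probability is nearest
    the 0.5 decision boundary, i.e. where the classifier is least sure."""
    return sorted(probs, key=lambda d: abs(probs[d] - 0.5))[:k]

print(most_uncertain(scores))  # → ['doc2', 'doc4']
```

The confidently classified documents (doc1, doc3) are skipped: labeling them would add little information, which is where the reported savings in manual labeling come from.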
In this wide‐ranging paper, Lewis defends the view that propositional attitudes consist in relations to properties, which themselves are sets of possible individuals. In so doing, he champions the importance of self‐ascribing attitudes (i.e. what he coins ‘de se’ attitudes), arguing that “the de se subsumes the de dicto, but not vice versa.” Along the way, a host of topics are discussed, including time‐slices of continuant persons, centered possible worlds, and decision theory.
Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data. We apply this approach to a range of document classification problems and show that it produces compact predictive models at least as effective as those produced by support vector machine classifiers or ridge logistic regression combined with feature selection. We describe our model fitting algorithm, our open source implementations (BBR and BMR), and experimental results.
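A Laplace prior on the weights is equivalent to an L1 penalty on the log-likelihood, which is what drives irrelevant weights to exactly zero and yields the sparse models the abstract describes. The sketch below fits such a MAP estimate by proximal gradient descent with soft-thresholding; the toy data, step size, and penalty strength are illustrative assumptions, and this optimizer is a stand-in, not the fitting algorithm used in BBR/BMR:

```python
import math

# Toy data: feature 0 predicts the class; feature 1 is weak noise.
X = [[3.0, 0.1], [2.5, -0.2], [-3.0, 0.2], [-2.8, -0.1]]
y = [1, 1, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(v, t):
    """Proximal operator of the L1 (Laplace-prior) penalty: shrink toward 0."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_logreg(X, y, lam=0.5, eta=0.1, iters=500):
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        # Gradient step on the log-loss, then L1 shrinkage step.
        w = [soft_threshold(wj - eta * gj, eta * lam)
             for wj, gj in zip(w, grad)]
    return w

w = fit_l1_logreg(X, y)
preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) > 0.5) for xi in X]
print(w, preds)  # w[1] is exactly 0.0: the prior prunes the noise feature
```

A ridge (Gaussian-prior) penalty would instead shrink the noise weight toward zero without ever reaching it, which is why the Laplace prior is the one that produces compact models.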
Uncertainty sampling methods iteratively request class labels for training instances whose classes remain uncertain despite the previously labeled instances. These methods can greatly reduce the number of instances that an expert must label. One problem with this approach is that the classifier best suited for an application may be too expensive to train or use during the selection of instances. We test the use of one classifier (a highly efficient probabilistic one) to select examples for training another (the C4.5 rule induction program). Despite being chosen by this heterogeneous approach, the uncertainty samples yielded classifiers with lower error rates than random samples ten times larger.
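The two-classifier arrangement can be sketched as a loop: a cheap probabilistic scorer picks the most uncertain item, an oracle (standing in for the human expert) labels it, and only afterwards is the labeled sample handed to a different, more expensive learner. The scorer, pool, and final threshold rule below are all illustrative stand-ins, not the paper's classifier or C4.5:

```python
import math

def cheap_prob(x):
    """Hypothetical fast probabilistic scorer: P(class=1 | x)."""
    return 1.0 / (1.0 + math.exp(-x))

pool = [-3.0, -0.4, 0.3, 2.8, -0.1, 1.9]
oracle = lambda x: int(x > 0)          # stands in for the human expert

labeled = []
for _ in range(3):                     # three labeling rounds
    chosen = [p for p, _ in labeled]
    remaining = [x for x in pool if x not in chosen]
    # The cheap scorer, not the expensive learner, drives selection.
    pick = min(remaining, key=lambda x: abs(cheap_prob(x) - 0.5))
    labeled.append((pick, oracle(pick)))

# Hand-off: train the "expensive" learner (C4.5 in the paper; a trivial
# midpoint threshold rule here) on just the small labeled sample.
pos = [x for x, lab in labeled if lab == 1]
neg = [x for x, lab in labeled if lab == 0]
threshold = (max(neg) + min(pos)) / 2
print(labeled, threshold)
```

Note that the examples far from the boundary (-3.0, 2.8) are never sent to the oracle; the expensive learner sees only the informative borderline cases.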
Drawing on his account of possible worlds, Lewis attempts to specify the truth‐conditions for statements of the form, ‘In fiction f, Φ’. According to Lewis, such truth in fiction is the product of two sources: “the explicit content of the fiction, and a background consisting either of the facts about our world or of the beliefs overt in the community of origin.” In the postscript, Lewis addresses the topics of make‐believe, impossible fictions, and fiction in the service of truth.
Lewis formulates a principle about reasonable subjective probabilities conditional on propositions about objective probability in the single case. Consequences of the principle for objective probabilities and for the relation between objective and subjective probabilities are then considered. The paper contains four postscripts, dealing with objections and further issues arising from the principle.
Lewis defends an account of the role of theoretical terms in scientific theories. Drawing on the work of Ramsey and Carnap, Lewis advocates the view that theoretical terms are implicitly defined by the scientific theories in which they figure; their meanings are to be characterized in functional terms, by reference to causal roles. According to Lewis, this understanding of theoretical terms (which would become influential in the development of functionalist theories of the mind) enables us to understand how one scientific theory may be reduced to another.