This paper describes a novel approach to learning term-weighting schemes (TWSs) in the context of text classification. In text mining, a TWS determines how documents are represented in a vector space model before a classifier is applied. Although acceptable performance has been obtained with standard TWSs (e.g., Boolean and term-frequency schemes), the definition of TWSs has traditionally been an art. Further, it is still difficult to determine the best TWS for a particular problem, and it is not yet clear whether better schemes than those currently available can be generated by combining known TWSs. In this article we propose a genetic program that aims to learn effective TWSs that can improve the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units into discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification as well as from image classification. Our study shows the validity of the proposed method; in fact, we show that TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent works. Further, we show that TWSs learned from a specific domain can be effectively used for other tasks.
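To make the idea of "combining basic units" concrete, here is a minimal sketch, assuming a small hypothetical terminal set (raw term frequency `tf`, inverse document frequency `idf`, Boolean presence `bool`) and hypothetical operators (`add`, `mul`, `log1p`). It shows how a candidate TWS could be encoded as an expression tree and evaluated on a document-term count matrix, and how a genetic program might grow random trees to start its search. It does not reproduce the paper's actual terminal set, operators, or fitness function; a full genetic program would add crossover, mutation, and a fitness score based on classifier performance.

```python
import math
import random
from typing import Callable, Dict, List, Tuple, Union

Matrix = List[List[float]]           # rows = documents, columns = terms
Tree = Union[str, Tuple]             # terminal name, or (operator, *children)


def tf(counts: Matrix) -> Matrix:
    """Raw term frequency."""
    return [[float(c) for c in row] for row in counts]


def boolean(counts: Matrix) -> Matrix:
    """1 if the term occurs in the document, else 0."""
    return [[1.0 if c > 0 else 0.0 for c in row] for row in counts]


def idf(counts: Matrix) -> Matrix:
    """log(N / df_j), repeated for every document row (0 for unseen terms)."""
    n_docs, n_terms = len(counts), len(counts[0])
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    values = [math.log(n_docs / d) if d > 0 else 0.0 for d in df]
    return [list(values) for _ in range(n_docs)]


# Hypothetical building blocks; every unit is non-negative, so log1p is always defined.
TERMINALS: Dict[str, Callable[[Matrix], Matrix]] = {"tf": tf, "bool": boolean, "idf": idf}
UNARY: Dict[str, Callable[[float], float]] = {"log1p": math.log1p}
BINARY: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def evaluate(tree: Tree, counts: Matrix) -> Matrix:
    """Compute the document-term weight matrix defined by an expression tree."""
    if isinstance(tree, str):
        return TERMINALS[tree](counts)
    op, *children = tree
    if op in UNARY:
        inner = evaluate(children[0], counts)
        return [[UNARY[op](x) for x in row] for row in inner]
    left, right = evaluate(children[0], counts), evaluate(children[1], counts)
    return [[BINARY[op](a, b) for a, b in zip(ra, rb)] for ra, rb in zip(left, right)]


def random_tree(depth: int, rng: random.Random) -> Tree:
    """Grow a random candidate TWS, as a GP initialisation step might."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(list(TERMINALS))
    if rng.random() < 0.3:
        return (rng.choice(list(UNARY)), random_tree(depth - 1, rng))
    return (rng.choice(list(BINARY)), random_tree(depth - 1, rng), random_tree(depth - 1, rng))
```

For example, with `counts = [[2, 0, 1], [0, 3, 1]]`, the tree `("mul", "tf", "idf")` reproduces classic tf-idf, while a tree such as `("mul", ("log1p", "tf"), "idf")` is one of the many alternative combinations a search over trees could discover.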
Segmentation through seeded region growing is widely used because it is fast, robust and free of tuning parameters. However, the seeded region growing algorithm requires an automatic seed generator and has difficulty labeling unconnected pixels (the unconnected pixel problem). This paper introduces a new automatic seeded region growing algorithm, called ASRG-IB1, that segments color (RGB) and multispectral images. The seeds are generated automatically via histogram analysis: the histogram of each band is analyzed to obtain intervals of representative pixel values. An image pixel is considered a seed if its gray value in each band falls within some representative interval. Our new seeded region growing algorithm is then applied to segment the image; it uses instance-based learning as its distance criterion. Finally, depending on the user's needs, the regions are merged using ownership tables. The algorithm was tested on several leukemia medical images with good results.
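The seed-generation and growing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not ASRG-IB1 itself: the rule for a "representative interval" (an 8-bit histogram bin holding at least half the peak count) is hypothetical, seeds that share the same combination of intervals across bands are given the same region label, and the instance-based distance is taken to be an IB1-style 1-nearest-neighbour Euclidean distance to that region's seed pixels. Growth proceeds through a priority queue, as in classic seeded region growing; the ownership-table merging step is omitted.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, ...]      # one 8-bit value per band, e.g. (R, G, B)
Image = List[List[Pixel]]


def representative_intervals(values: List[int], n_bins: int = 8,
                             min_frac: float = 0.5) -> List[Tuple[int, int]]:
    """Hypothetical rule: keep 8-bit histogram bins holding >= min_frac of the peak count."""
    width = 256 // n_bins
    hist = [0] * n_bins
    for v in values:
        hist[min(v // width, n_bins - 1)] += 1
    peak = max(hist)
    return [(b * width, 255 if b == n_bins - 1 else (b + 1) * width - 1)
            for b, count in enumerate(hist) if count >= min_frac * peak]


def find_seeds(img: Image, n_bins: int = 8, min_frac: float = 0.5) -> Dict[Tuple[int, int], int]:
    """A pixel is a seed if every band value lies in a representative interval;
    seeds sharing the same combination of intervals get the same region label."""
    n_bands = len(img[0][0])
    intervals = [representative_intervals([px[b] for row in img for px in row], n_bins, min_frac)
                 for b in range(n_bands)]
    label_of_key: Dict[Tuple[int, ...], int] = {}
    seeds: Dict[Tuple[int, int], int] = {}
    for r, row in enumerate(img):
        for c, px in enumerate(row):
            key = []
            for b, v in enumerate(px):
                idx = next((i for i, (lo, hi) in enumerate(intervals[b]) if lo <= v <= hi), None)
                if idx is None:
                    break          # this band is outside every interval: not a seed
                key.append(idx)
            else:
                seeds[(r, c)] = label_of_key.setdefault(tuple(key), len(label_of_key))
    return seeds


def grow(img: Image, seeds: Dict[Tuple[int, int], int]) -> List[List[Optional[int]]]:
    """Label the remaining pixels, closest (1-NN distance) candidates first."""
    rows, cols = len(img), len(img[0])
    label: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    instances: Dict[int, List[Pixel]] = {}
    for (r, c), lab in seeds.items():
        label[r][c] = lab
        instances.setdefault(lab, []).append(img[r][c])

    def ib1_dist(px: Pixel, lab: int) -> float:
        # Distance to the nearest stored instance (seed pixel) of region `lab`.
        return min(math.dist(px, inst) for inst in instances[lab])

    heap: List[Tuple[float, int, int, int]] = []

    def push_neighbours(r: int, c: int) -> None:
        lab = label[r][c]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and label[nr][nc] is None:
                heapq.heappush(heap, (ib1_dist(img[nr][nc], lab), nr, nc, lab))

    for (r, c) in seeds:
        push_neighbours(r, c)
    while heap:
        _, r, c, lab = heapq.heappop(heap)
        if label[r][c] is not None:
            continue                # already claimed by a closer region
        label[r][c] = lab
        push_neighbours(r, c)
    return label
```

On a tiny image with dark pixels on the left, bright pixels on the right and one mid-grey pixel between them, the mid-grey pixel is not a seed and is assigned to whichever adjacent region has the spectrally nearest seed instance.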
Reinforcement learning deals with learning optimal or near-optimal policies while interacting with the environment. Application domains with many continuous variables are difficult to solve with existing reinforcement learning methods because of the large search space. In this paper, we use a relational representation to define powerful abstractions that allow us to incorporate domain knowledge and reuse previously learned policies in other, similar problems. We also describe how to learn useful actions from human traces using a behavioural cloning approach combined with an exploration phase. Since several conflicting actions may be induced for the same abstract state, reinforcement learning is used to learn an optimal policy over this reduced space. We show experimentally that combining behavioural cloning and reinforcement learning over a relational representation is powerful enough to learn how to fly an aircraft through different points in space under different turbulence conditions.
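The last step, resolving conflicting cloned actions with reinforcement learning over the reduced space, can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: abstract states 1 to 4 represent coarse "distance to the next waypoint" buckets (0 is the goal), the `ALLOWED` table plays the role of the actions induced from human traces (several of them conflicting), and standard tabular Q-learning is used because the abstract does not name a specific algorithm. It is not the paper's relational representation or flight simulator.

```python
import random
from typing import Dict, List, Tuple

State = int
Action = str

# Hypothetical action sets per abstract state, as if induced by behavioural cloning.
ALLOWED: Dict[State, List[Action]] = {
    1: ["toward", "hold"],
    2: ["toward", "away"],
    3: ["toward", "hold", "away"],
    4: ["toward", "hold"],
}


def step(s: State, a: Action) -> Tuple[State, float, bool]:
    """Toy deterministic dynamics: reward -1 per step, episode ends at state 0."""
    if a == "toward":
        nxt = s - 1
    elif a == "away":
        nxt = min(s + 1, 4)
    else:
        nxt = s
    return nxt, -1.0, nxt == 0


def q_learning(allowed: Dict[State, List[Action]], episodes: int = 500, alpha: float = 0.5,
               gamma: float = 0.9, epsilon: float = 0.2, max_steps: int = 50,
               seed: int = 0) -> Dict[State, Action]:
    """Tabular Q-learning restricted to the cloned actions; returns the greedy policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, acts in allowed.items() for a in acts}
    states = list(allowed)
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(max_steps):
            acts = allowed[s]
            if rng.random() < epsilon:
                a = rng.choice(acts)                      # explore
            else:
                a = max(acts, key=lambda x: q[(s, x)])    # exploit
            nxt, r, done = step(s, a)
            target = r if done else r + gamma * max(q[(nxt, b)] for b in allowed[nxt])
            q[(s, a)] += alpha * (target - q[(s, a)])
            if done:
                break
            s = nxt
    return {s: max(acts, key=lambda x: q[(s, x)]) for s, acts in allowed.items()}
```

Because the cloned action sets are small, the learner only has to rank a handful of candidate actions per abstract state rather than search a continuous control space; here it settles on moving toward the waypoint in every state.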
Research progress in AutoML has led to state-of-the-art solutions that cope quite well with supervised learning tasks, e.g., classification with Auto-Sklearn. However, these systems do not yet take into account the changing nature of data that evolves over time (i.e., they still assume i.i.d. data), even though such domains are increasingly common in real applications (e.g., spam filtering, user preferences). We describe a first attempt to develop an AutoML solution for scenarios in which the data distribution changes relatively slowly over time and the problem is approached in a lifelong learning setting. We extend Auto-Sklearn with sound and intuitive mechanisms that allow it to cope with this sort of problem. The extended Auto-Sklearn is combined with concept drift detection techniques that allow it to determine automatically when the initial models have to be adapted. We report experimental results on benchmark data from AutoML competitions that adhere to this scenario. The results demonstrate the effectiveness of the proposed methodology.
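How a drift detector can decide "when the initial models have to be adapted" is sketched below. The abstract does not say which detectors are used, so this swaps in the Drift Detection Method (DDM) of Gama et al. as an illustrative choice: it monitors the running error rate `p` and its standard deviation `s` over a stream of 0/1 prediction errors and signals drift when `p + s` exceeds the best observed `p_min + 3 * s_min`. The `drift_points` function and the place where retraining would happen are hypothetical; no Auto-Sklearn API is called or implied.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DDM:
    """Drift Detection Method over a stream of 0/1 prediction errors."""
    min_instances: int = 30
    n: int = 0
    p: float = 0.0              # running error rate
    p_min: float = math.inf
    s_min: float = math.inf

    def reset(self) -> None:
        self.n, self.p, self.p_min, self.s_min = 0, 0.0, math.inf, math.inf

    def update(self, error: bool) -> str:
        self.n += 1
        self.p += (float(error) - self.p) / self.n        # incremental mean
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:           # best level seen so far
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + 3.0 * self.s_min:
            self.reset()                                    # start monitoring afresh
            return "drift"
        if self.p + s > self.p_min + 2.0 * self.s_min:
            return "warning"
        return "stable"


def drift_points(errors: Iterable[bool], detector: DDM) -> List[int]:
    """Indices at which a (hypothetical) model-adaptation step would be triggered."""
    points = []
    for i, err in enumerate(errors):
        if detector.update(err) == "drift":
            points.append(i)   # in a full system: adapt/refit the AutoML models on recent data here
    return points
```

For instance, a stream whose error rate jumps from 10% to 50% after 200 predictions triggers a drift signal shortly after the change and none before it; a real deployment would react to that signal by adapting the models.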