Traditional search engines for life sciences (e.g., PubMed) are designed for document retrieval and do not allow direct retrieval of specific statements. Some of these statements may serve as textual evidence that is key to tasks such as hypothesis generation and new finding validation. We present EVIDENCEM-INER, a web-based system that lets users query a natural language statement and automatically retrieves textual evidence from a background corpora for life sciences. EVIDENCEMINER is constructed in a completely automated way without any human effort for training data annotation. It is supported by novel data-driven methods for distantly supervised named entity recognition and open information extraction. The entities and patterns are pre-computed and indexed offline to support fast online evidence retrieval. The annotation results are also highlighted in the original document for better visualization. EVIDENCEMINER also includes analytic functionalities such as the most frequent entity and relation summarization. EVIDENCEMINER can help scientists uncover essential research issues, leading to more effective research and more indepth quantitative analysis. The system of EVIDENCEMINER is available at https:// evidenceminer.firebaseapp.com/ 1 .
In creating a pattern classifier, feature selection is often used to prune irrelevant and noisy features to producing effective features. Manually developing a feature set can be a very time consuming and costly endeavor. In this paper, an efficient feature selection algorithm based on improved binary particle swarm optimization and support vector machine Algorithm (IBPSO-SVM) was used. First a population of particles (feature subsets) were randomly generated, and then optimized by IBPSO-SVM wrapper algorithms; finally the best fitness feature subset was applied to SVM classification. The simulation experiment results have proved that the feature subset selection algorithm based on IBPSO-SVM is very effective.
Genetic Programming (GP) is a popular and powerful evolutionary optimization algorithm that has a wide range of applications such as symbolic regression, classification and program synthesis. However, existing GPs often ignore the intrinsic structure of the ground truth equation of the symbolic regression problem. To improve the search efficacy of GP on symbolic regression problems by fully exploiting the intrinsic structure information, this paper proposes a genetic programming with separability detection technique (SD-GP). In the proposed SD-GP, a separability detection method is proposed to detect additive separable characteristics of input features from the observed data. Then based on the separability detection results, a chromosome representation is proposed, which utilizes multiple sub chromosomes to represent the final solution. Some sub chromosomes are used to construct separable sub functions by using separate input features, while the other sub chromosomes are used to construct sub functions by using all input features. The final solution is the weighted sum of all sub functions, and the optimal weights of sub functions are obtained by using the least squares method. In this way, the structure information can be learnt and the global search ability of GP can be maintained. Experimental results on synthetic problems with differing characteristics have demonstrated that the proposed SD-GP can perform better than several state-of-the-art GPs in terms of the success rate of finding the optimal solution and the convergence speed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.