In most cases, applying machine learning techniques to biological sequence data requires a vector representation of the sequences. Extracting numerical features from sequence data can be time-consuming, especially for users who lack programming skills. To address this, we propose WeSeqMiner, a Weka package that provides several filters for extracting numerical features from sequence data for use in the Weka machine learning workbench. Using a motivating example, we show that WeSeqMiner integrates well with the Weka API, allowing these transformations to be incorporated into Weka workflows for predictive model generation. WeSeqMiner can be installed by pointing the Weka package manager to the URL github.com/djhogan/WeSeqMiner/raw/master/WeSeqMiner.zip. The Javadoc for WeSeqMiner classes is available at djhogan.github.io/seqminer.
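To make "a vector representation of the sequences" concrete, the sketch below shows one common kind of numerical feature: k-mer counts over the DNA alphabet {A, C, G, T}. It is a hypothetical illustration only. The class `KmerFeatures` and the method `countKmers` are names invented for this example and are not part of WeSeqMiner's API. The sketch also assumes that windows containing symbols outside ACGT (such as `N`) should be skipped.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not WeSeqMiner's API): maps a DNA sequence to a
 * fixed-length vector of k-mer counts over the alphabet {A, C, G, T}.
 */
public class KmerFeatures {

    private static final String ALPHABET = "ACGT";

    /** Returns counts of every possible k-mer, indexed in lexicographic order (A...A to T...T). */
    public static double[] countKmers(String sequence, int k) {
        if (k < 1 || k > 15) {
            throw new IllegalArgumentException("k must be between 1 and 15");
        }
        // 4^k possible k-mers; each one gets a fixed position in the vector.
        double[] counts = new double[1 << (2 * k)];
        String seq = sequence.toUpperCase();
        // Slide a window of length k across the sequence.
        for (int start = 0; start + k <= seq.length(); start++) {
            int index = 0;
            boolean valid = true;
            for (int i = start; i < start + k; i++) {
                int symbol = ALPHABET.indexOf(seq.charAt(i));
                if (symbol < 0) { // assumption: skip windows containing N or other non-ACGT symbols
                    valid = false;
                    break;
                }
                // Interpret the k-mer as a base-4 number to get its vector index.
                index = index * ALPHABET.length() + symbol;
            }
            if (valid) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Windows of "ACGTAC" with k = 2: AC, CG, GT, TA, AC.
        System.out.println(Arrays.toString(countKmers("ACGTAC", 2)));
    }
}
```

Every sequence maps to a vector of the same length (4^k), whatever its own length. That fixed length is what lets the result serve as the attribute values of a Weka instance, for example via `weka.core.DenseInstance`. WeSeqMiner's own filters may compute different features or lay them out differently.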