Summary: Affinity propagation (AP) clustering has recently gained increasing popularity in bioinformatics. One advantage of AP clustering is that it determines typical cluster members, the so-called exemplars. We provide an R implementation of this promising new clustering technique because R is ubiquitous in bioinformatics. This article introduces the package and presents an application from structural biology.
Availability: The R package apcluster is available via CRAN (the Comprehensive R Archive Network): http://cran.r-project.org/web/packages/apcluster
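Exemplars are easier to picture with a concrete example. The following is a minimal pure-Python sketch of the responsibility/availability message passing from Frey and Dueck's original formulation of affinity propagation. It is not the apcluster package's code. The function name `affinity_propagation`, the damping factor of 0.5, the 200-iteration cap, the toy data, and the preference of -1.0 are all hypothetical choices made for illustration.

```python
def affinity_propagation(S, damping=0.5, max_iter=200):
    """Plain-Python affinity propagation sketch (hypothetical helper).

    S is an n x n similarity matrix (list of lists, n >= 2). The diagonal
    S[k][k] holds the preferences, i.e. how suitable point k is as an exemplar.
    Returns (exemplars, labels), where labels[i] is the exemplar of point i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: vals[k])
            first = vals[best]
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos
                else:
                    new = min(0.0, R[k][k] + pos - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Exemplars: points whose self-responsibility plus self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], []
    # Assign every point to its most similar exemplar (exemplars label themselves).
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Hypothetical toy data: two well-separated groups of 1-D points,
# similarity = negative squared distance, preference = -1.0 on the diagonal.
points = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
S = [[-(p - q) ** 2 for q in points] for p in points]
for k in range(len(points)):
    S[k][k] = -1.0
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

On this toy data the two groups are expected to end up in separate clusters, each represented by one of its own points. The preferences on the diagonal control how many exemplars emerge. Higher preferences tend to yield more clusters, and lower preferences tend to yield fewer.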
Support vector machines (SVMs) are well-established standard methods for classifying biological sequences. Advantages of SVMs [2,8]:
• Maximizing the margin between two classes → proven to be a near-optimal learning strategy.
• The optimization problem is convex and quadratic → a global solution exists and can be found efficiently.
• SVMs depend on only a few hyperparameters → easier model selection.
• SVMs can be applied to any kind of data; all that is needed is a meaningful positive semi-definite comparison measure (the so-called kernel) → a great advantage for sequences, which cannot always be cast into vectorial form.

SVMs in a Nutshell. Consider training data $\{(x_i, y_i) \mid i = 1, \ldots, l\}$, where the $x_i$ are sequences and $y_i \in \{-1, +1\}$ are binary labels. The discriminant function of the SVM is
$$f(x) = \sum_{i=1}^{l} \alpha_i \, y_i \, k(x_i, x) + b,$$
where $x$ is a new data item to be classified, the $\alpha_i$ are weights determined by SVM training (Lagrange multipliers), $b$ is the bias (offset) term, and $k(\cdot,\cdot)$ is the kernel function. The predicted class of $x$ is the sign of $f(x)$.

Sequence Kernels. A wide range of sequence kernels is available [9], and many of them can be expressed as [1]
$$k(x, x') = \sum_{p \in P} N(p, x) \cdot N(p, x'),$$
where $P$ is a set of sequence patterns and $N(p, x)$ is the number of occurrences/matches of pattern $p$ in sequence $x$. This formulation includes the well-known spectrum kernel [6], the mismatch kernel [5], and the spatial sample kernel [4]. To correct for varying sequence lengths, it is often useful to normalize the kernel [9]:
$$\tilde{k}(x, x') = \frac{k(x, x')}{\sqrt{k(x, x) \, k(x', x')}}$$
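The pattern-count kernel, its normalization, and the discriminant function fit together as in the minimal pure-Python sketch below. It assumes the spectrum kernel, i.e. $P$ is the set of all length-$k$ substrings and $N(p, x)$ counts their occurrences, with a hypothetical default of $k = 2$. The function names and the values of `alphas` and `b` are hypothetical placeholders; they are not the output of any SVM solver.

```python
from collections import Counter
from math import sqrt


def spectrum_features(seq, k):
    """N(p, seq) for every length-k substring p of seq (spectrum kernel patterns)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=2):
    """k(x, y) = sum over patterns p of N(p, x) * N(p, y); only shared k-mers contribute."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    return sum(n * fy[p] for p, n in fx.items())


def normalized_kernel(kernel, x, y):
    """Cosine normalization k(x, y) / sqrt(k(x, x) * k(y, y)) to correct for sequence length."""
    denom = sqrt(kernel(x, x) * kernel(y, y))
    return kernel(x, y) / denom if denom > 0 else 0.0  # guard: sequence shorter than k


def discriminant(x, support, alphas, labels, b, kernel):
    """f(x) = sum_i alpha_i * y_i * k(x_i, x) + b."""
    return sum(a * yi * kernel(xi, x)
               for xi, a, yi in zip(support, alphas, labels)) + b


def norm_spec(u, v):
    """Normalized spectrum kernel used as k(., .) in the discriminant."""
    return normalized_kernel(spectrum_kernel, u, v)


# Hypothetical "trained" model: two support sequences with made-up weights.
support = ["ACGT", "TTTT"]
alphas = [1.0, 1.0]
labels = [1, -1]
b = 0.0

f = discriminant("ACGA", support, alphas, labels, b, norm_spec)
predicted = 1 if f >= 0 else -1
print(f, predicted)
```

Normalization matters for sequences of different lengths: "AAAA" and "AA" share only the 2-mer AA, so the raw spectrum kernel value (3) grows with sequence length, while the normalized value is 1.0 because both sequences have identical 2-mer profiles up to scaling.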