Abstract: With the rapid development of next-generation sequencing technology, the amount of cancer genome sequence data is increasing exponentially. This calls for efficient and effective algorithms that can identify patterns hidden in the raw data and thereby reveal cancer's Achilles' heels. From a signal processing point of view, biological units of information, including DNA and protein sequences, have been viewed as one-dimensional signals. Therefore, researchers have been applying signal pro…
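The "sequence as a one-dimensional signal" view can be made concrete with a minimal sketch. It assumes the Voss binary-indicator mapping, which is one common choice; the truncated abstract does not say which mapping the authors use. The sketch then looks for the well-known period-3 component of protein-coding DNA in the summed DFT power spectrum. The helper names `voss_power_spectrum` and `dominant_period` are hypothetical.

```python
import numpy as np


def voss_power_spectrum(dna: str) -> np.ndarray:
    """Sum of |DFT|^2 over the four binary indicator sequences (Voss mapping).

    Each base b in ACGT becomes a 0/1 signal u_b[n] = 1 if dna[n] == b,
    and the four power spectra are added together.
    """
    seq = dna.upper()
    total = np.zeros(len(seq))
    for base in "ACGT":
        indicator = np.fromiter((ch == base for ch in seq), dtype=float, count=len(seq))
        total += np.abs(np.fft.fft(indicator)) ** 2
    return total


def dominant_period(dna: str) -> float:
    """Period N/k of the strongest non-DC frequency bin k in 1..N//2."""
    spectrum = voss_power_spectrum(dna)
    n = len(spectrum)
    k = int(np.argmax(spectrum[1 : n // 2 + 1])) + 1
    return n / k


toy = "ATG" * 40  # strongly period-3 toy sequence, N = 120
print(dominant_period(toy))  # 3.0
```

Real coding sequences give a much noisier spectrum than this toy repeat. The peak at k = N/3 is a statistical tendency there, not a guarantee.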
“…Spectral approaches enable the discovery of enough "fuzzy" periodicity in protein sequences without insertion(s) or deletion(s) of amino acids. Fourier transform, wavelet transform, information decomposition and some other methods can be attributed to a number of spectral methods (Tiwari et al, 1997; Lobzin and Chechetkin, 2000; Kravatskaya et al, 2011; Korotkov et al, 2003a; de Sousa Vieira, 1999; Meng et al, 2013; Suvorova et al, 2014; Sosa et al, 2013; Kumar et al, 2006). However, these approaches have a significant limitation, such as the fact that they do not allow the detection of periodicity with insertions and deletions.…”
The aim of this study was to show that amino acid sequences have a latent periodicity with insertions and deletions of amino acids at unknown positions of the analyzed sequence. A genetic algorithm, dynamic programming, and random weight matrices were used to develop a new mathematical algorithm for latent periodicity search. The method directly optimizes the position-weight matrix for multiple sequence alignment without using pairwise alignments. The algorithm was applied to the amino acid sequences of a small number of proteins. This study showed the presence of latent periodicity with insertions and deletions in the amino acid sequences of proteins for which latent periodicity was not previously known. The origin of latent periodicity with insertions and deletions is discussed.
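The contrast between the quoted limitation of spectral methods and the dynamic-programming approach can be illustrated with a simplified sketch. Part 1 measures how coherently one residue falls into the same phase of period 3, using the Fourier coefficient at frequency 1/3. Scattered insertions make this coefficient collapse. Part 2 scores a sequence against a fixed, hand-written cyclic position-weight matrix (PWM) with a linear gap cost, so the same insertions cost only a fixed penalty each.

The following are all assumptions for illustration: the function names, the default mismatch weight of -1, and the gap cost. This is not the authors' method, which additionally optimizes the matrix with a genetic algorithm and uses random weight matrices.

```python
import numpy as np
from typing import Dict, List


def phase_coherence(seq: str, residue: str, period: int) -> float:
    """|Fourier coefficient| of the indicator of `residue` at frequency 1/period,
    divided by the residue count (1.0 = every occurrence in the same phase)."""
    pos = np.array([i for i, ch in enumerate(seq) if ch == residue])
    coef = np.exp(-2j * np.pi * pos / period).sum()
    return float(abs(coef) / len(pos))


def periodic_alignment_score(seq: str, pwm: List[Dict[str, float]],
                             gap: float, default: float = -1.0) -> float:
    """Best score for aligning all of `seq` to a cyclic PWM of period len(pwm),
    allowing insertions and deletions at linear cost `gap` (requires gap > 0)."""
    n = len(pwm)
    # prev[j]: best score so far given that the last PWM column used was j.
    # The alignment may start in any phase, so every column starts at 0.
    prev = [0.0] * n
    for ch in seq:
        cur = [0.0] * n
        for j in range(n):
            match = prev[(j - 1) % n] + pwm[j].get(ch, default)  # ch -> column j
            insertion = prev[j] - gap                              # ch left unaligned
            cur[j] = max(match, insertion)
        # Deletions skip PWM columns without consuming a residue. Because
        # gap > 0, a full cycle never pays off, so two sweeps suffice.
        for t in range(2 * n):
            j = t % n
            cur[j] = max(cur[j], cur[(j - 1) % n] - gap)
        prev = cur
    return max(prev)


perfect = "ACD" * 30
# One "G" inserted after every 9 residues (9 insertions in total).
shifted = "G".join(perfect[i:i + 9] for i in range(0, 90, 9))
print(phase_coherence(perfect, "A", 3))  # approximately 1.0
print(phase_coherence(shifted, "A", 3))  # approximately 0.1

pwm = [{"A": 1.0}, {"C": 1.0}, {"D": 1.0}]
print(periodic_alignment_score("ACDACDACD", pwm, gap=2.0))   # 9.0
print(periodic_alignment_score("ACDAXCDACD", pwm, gap=2.0))  # 7.0
```

In the shifted sequence, the "A" residues are spread over all three phases, so the spectral coefficient nearly cancels. The alignment, by contrast, still scores at least 90 matches minus 9 gap penalties.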
“…The amino acid mutation sample is represented by a numerical sequence according to the mapping scheme defined in Table 1. Wavelet analysis is then applied to generate the wavelets features (Meng et al, 2013). The Matlab wavelet toolbox was used to perform wavelet analysis, where a continuous wavelet transform based on Gaussian wavelets function is used to extract wavelet coefficients.…”
Section: Features Extraction and Selection
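The encode-then-transform step in the quoted passage can be sketched as follows. Two assumptions apply. First, the paper's Table 1 mapping is not reproduced here, so the Kyte-Doolittle hydropathy scale stands in for it. Second, the Mexican hat (the negated, unnormalised second derivative of a Gaussian) stands in for "Gaussian wavelets"; the excerpt does not say which Gaussian-derivative order was used. The per-scale summary in `wavelet_features` is a hypothetical feature choice.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale, used ONLY as a stand-in for the
# paper's Table 1 mapping (not reproduced in this excerpt).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq: str) -> np.ndarray:
    """Map an amino acid string to a 1-D numerical signal."""
    return np.array([KYTE_DOOLITTLE[aa] for aa in seq.upper()], dtype=float)


def cwt_mexican_hat(x: np.ndarray, scales) -> np.ndarray:
    """Direct-sum CWT with a Mexican hat (Gaussian-derivative) wavelet.

    coefs[i, b] = sum_k x[k] * psi((k - b) / a_i) / sqrt(a_i), psi unnormalised.
    """
    k = np.arange(len(x))
    lags = k[None, :] - k[:, None]  # lags[b, k] = k - b
    coefs = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        u = lags / a
        psi = (1.0 - u**2) * np.exp(-(u**2) / 2.0)
        coefs[i] = psi @ x / np.sqrt(a)
    return coefs


def wavelet_features(seq: str, scales=(1, 2, 4, 8)) -> np.ndarray:
    """Hypothetical summary: mean |coefficient| per scale, one feature per scale."""
    return np.abs(cwt_mexican_hat(encode(seq), scales)).mean(axis=1)


print(wavelet_features("MKTAYIAKQR"))  # 4 numbers, one per scale
```

The direct O(N^2) sum keeps the sketch transparent. It is adequate for short mutation windows, but an FFT-based convolution or a wavelet library would be preferable for long signals.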
Driver mutations propel oncogenesis and occur much less frequently than passenger mutations. The need for automatic and accurate identification of driver mutations has increased dramatically with the exponential growth of mutation data. Current computational solutions to identify driver mutations rely on sequence homology. Here we construct a machine learning-based framework that does not rely on sequence homology or domain knowledge to predict driver missense mutations. A windowing approach to represent the local environment of the sequence around the mutation point as a mutation sample is applied, followed by extraction of three sequence-level features from each sample. After selecting the most significant features, the support vector machine and multimodal fusion strategies are employed to give final predictions. The proposed framework achieves relatively high performance and outperforms current state-of-the-art algorithms. The ease of deploying the proposed framework and the relatively accurate performance make this solution applicable to large-scale mutation data analyses.
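The windowing step described in the abstract can be sketched as follows: cut out the local sequence environment around the mutation point, for both the wild-type and the mutant residue. The abstract does not state the window size or how sequence ends are handled. The half-width parameter and the padding residue "X" are therefore assumptions, and `mutation_window` is a hypothetical helper.

```python
from typing import Tuple


def mutation_window(seq: str, pos: int, alt: str, half_width: int,
                    pad: str = "X") -> Tuple[str, str]:
    """Return (wild_type, mutant) windows of length 2*half_width + 1 centred on pos.

    pos is 0-based; positions beyond either end of seq are filled with `pad`.
    """
    if not 0 <= pos < len(seq):
        raise ValueError("mutation position outside sequence")
    if len(alt) != 1 or len(pad) != 1:
        raise ValueError("alt and pad must be single residues")
    padded = pad * half_width + seq + pad * half_width
    wild = padded[pos : pos + 2 * half_width + 1]  # seq[pos] lands at index half_width
    mutant = wild[:half_width] + alt + wild[half_width + 1 :]
    return wild, mutant


print(mutation_window("MKTAYIAKQR", 2, "V", 3))  # ('XMKTAYI', 'XMKVAYI')
```

Each window pair would then go through feature extraction, feature selection, and the support vector machine and fusion stages. Those stages are not shown here.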
“…The modern applications of the continuous wavelet transform are focused, in particular, on a study of environmental time series [3,4], geo- and astrophysics [5,6,7], biophysics [8,9,10] and neuroscience, see an extensive review in the recently published book [11].…”
Recently, it has been proven [R. Soc. Open Sci. 1 (2014) 140124] that the continuous wavelet transform with non-admissible kernels (approximate wavelets) admits an exact inverse transform. Here we consider how this approach can be realized computationally. We provide a modified, simpler explanation of the reconstruction formula. It is restricted to the practical case of real-valued finite (or periodic/periodized) samples and to the standard (restricted) Morlet wavelet, a practically important example of an approximate wavelet. The examples of applications include a test function and non-stationary electrophysiological signals from neuroscience.
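Why the standard Morlet wavelet counts as non-admissible can be checked numerically. Its mean is ∫ψ(t)dt = ψ̂(0) = π^(-1/4) √(2π) e^(-ω0²/2). This value is small but non-zero, so the admissibility requirement ψ̂(0) = 0 fails. Subtracting the usual correction term restores a zero mean. The sketch only illustrates this kernel property and does not implement the exact inverse transform discussed in the abstract. The choice ω0 = 5 and the integration grid are arbitrary.

```python
import numpy as np


def morlet(t: np.ndarray, omega0: float) -> np.ndarray:
    """Standard (restricted) Morlet wavelet, without the admissibility correction."""
    return np.pi**-0.25 * np.exp(1j * omega0 * t) * np.exp(-(t**2) / 2.0)


def corrected_morlet(t: np.ndarray, omega0: float) -> np.ndarray:
    """Morlet wavelet with the correction term that makes its mean exactly zero."""
    correction = np.exp(-(omega0**2) / 2.0)
    return np.pi**-0.25 * (np.exp(1j * omega0 * t) - correction) * np.exp(-(t**2) / 2.0)


def numeric_mean(wavelet, omega0: float, half_width: float = 20.0,
                 n: int = 40001) -> complex:
    """Approximate the integral of wavelet(t) over the real line by a Riemann sum."""
    t = np.linspace(-half_width, half_width, n)
    return complex(wavelet(t, omega0).sum() * (t[1] - t[0]))


omega0 = 5.0
exact = np.pi**-0.25 * np.sqrt(2 * np.pi) * np.exp(-(omega0**2) / 2.0)  # psi_hat(0)
print(numeric_mean(morlet, omega0).real, exact)       # both about 7.0e-06
print(abs(numeric_mean(corrected_morlet, omega0)))    # about 0 (rounding error only)
```

For ω0 around 5 to 6, the non-zero mean is tiny. This is why the standard Morlet wavelet is often used as an approximate wavelet in practice.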