Advait Balaji scite author profile

Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.

Multiple genome alignment in the telomere-to-telomere assembly era

Kille, Balaji, Sedlazeck, et al. 2022, Genome Biol

With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.

SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning

et al. 2022

The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, available for download at www.gitlab.com/treangenlab/seqscreen.

To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics

Elworth, Wang, Kota, et al. 2020

As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.

EEG-based classification of bilingual unspoken speech using ANN

Balaji, Haldar, Patil, et al. 2017

The ability to interpret unspoken or imagined speech through electroencephalography (EEG) is of therapeutic interest for people suffering from speech disorders and 'locked-in' syndrome. It is also useful for brain-computer interface (BCI) techniques not involving articulatory actions. Previous work has involved using particular words in one chosen language and training classifiers to distinguish between them. Such studies have reported accuracies of 40-60% and are not ideal for practical implementation. Furthermore, in today's multilingual society, classifiers trained in one language alone might not always have the desired effect. To address this, we present a novel approach to improve accuracy of the current model by combining bilingual interpretation and decision making. We collect data from 5 subjects with Hindi and English as primary and secondary languages respectively and ask them 20 'Yes'/'No' questions ('Haan'/'Na' in Hindi) in each language. We choose sensors present in regions important to both language processing and decision making. Data is preprocessed, and Principal Component Analysis (PCA) is carried out to reduce dimensionality. This is input to Support Vector Machine (SVM), Random Forest (RF), AdaBoost (AB), and Artificial Neural Networks (ANN) classifiers for prediction. Experimental results reveal best accuracy of 85.20% and 92.18% for decision and language classification respectively using ANN. Overall accuracy of bilingual speech classification is 75.38%.

Current progress and open challenges for applying deep learning across the biosciences
