Elya Wygoda scite author profile

Elya Wygoda

5Publications

42Citation Statements Received

245Citation Statements Given

How they've been cited

How they cite others

233

243

Affiliations

Tel Aviv University

Publications

Order By: Most citations

A Probabilistic Model for Indel Evolution: Differentiating Insertions from Deletions

Loewenthal

Rapoport

Avram

et al. 2021

View full text Add to dashboard Cite

Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here we introduce several improvements to indel modeling: (1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; (2) We introduce numerous summary statistics that allow Approximate Bayesian Computation (ABC) based parameter estimation; (3) We develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical datasets; (4) Using a model-selection scheme we test whether the richer model better fits biological data compared to the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical datasets and that, for the majority of these datasets, the deletion rate is higher than the insertion rate.

show abstract

Harnessing machine translation methods for sequence alignment

Dotan

Belinkov

Avram

et al. 2022

Preprint

View full text Add to dashboard Cite

The sequence alignment problem is one of the most fundamental problems in bioinformatics and a plethora of methods were devised to tackle it. Here we introduce BetaAlign, a novel methodology for aligning sequences using a natural language processing (NLP) approach. BetaAlign accounts for the possible variability of the evolutionary process among different datasets by using an ensemble of transformers, each trained on millions of samples generated from a different evolutionary model. Our approach leads to outstanding alignment accuracy, often outperforming commonly used methods, such as MAFFT, DIALIGN, ClustalW, T-Coffee, and MUSCLE. Notably, the utilization of deep-learning techniques for the sequence alignment problem brings additional advantages, such as automatic feature extraction that can be leveraged for a variety of downstream analysis tasks.

show abstract

An Approximate Bayesian Computation Approach for Modeling Genome Rearrangements

Moshe

Wygoda

Ecker

et al. 2022

View full text Add to dashboard Cite

The inference of genome rearrangement events has been extensively studied, as they play a major role in molecular evolution. However, probabilistic evolutionary models that explicitly imitate the evolutionary dynamics of such events, as well as methods to infer model parameters, are yet to be fully utilized. Here, we developed a probabilistic approach to infer genome rearrangement rate parameters using an Approximate Bayesian Computation (ABC) framework. We developed two genome rearrangement models, a basic model, which accounts for genomic changes in gene order, and a more sophisticated one which also accounts for changes in chromosome number. We characterized the ABC inference accuracy using simulations and applied our methodology to both prokaryotic and eukaryotic empirical datasets. Knowledge of genome-rearrangement rates can help elucidate their role in evolution as well as help simulate genomes with evolutionary dynamics that reflect empirical genomes.

show abstract

The evolutionary dynamics that retain long neutral genomic sequences in face of indel deletion bias: a model and its application to human introns

et al. 2022

View full text Add to dashboard Cite

Insertions and deletions (indels) of short DNA segments are common evolutionary events. Numerous studies showed that deletions occur more often than insertions in both prokaryotes and eukaryotes. It raises the question why neutral sequences are not eradicated from the genome. We suggest that this is due to a phenomenon we term border-induced selection . Accordingly, a neutral sequence is bordered between conserved regions. Deletions occurring near the borders occasionally protrude to the conserved region and are thereby subject to strong purifying selection. Thus, for short neutral sequences, an insertion bias is expected. Here, we develop a set of increasingly complex models of indel dynamics that incorporate border-induced selection. Furthermore, we show that short conserved sequences within the neutrally evolving sequence help explain: (i) the presence of very long sequences; (ii) the high variance of sequence lengths; and (iii) the possible emergence of multimodality in sequence length distributions. Finally, we fitted our models to the human intron length distribution, as introns are thought to be mostly neutral and bordered by conserved exons. We show that when accounting for the occurrence of short conserved sequences within introns, we reproduce the main features, including the presence of long introns and the multimodality of intron distribution.

show abstract

GenomeFLTR: filtering reads made easy

Dotan

Alburquerque

Wygoda

et al. 2023

View full text Add to dashboard Cite

In the last decade, advances in sequencing technology have led to an exponential increase in genomic data. These new data have dramatically changed our understanding of the evolution and function of genes and genomes. Despite improvements in sequencing technologies, identifying contaminated reads remains a complex task for many research groups. Here, we introduce GenomeFLTR, a new web server to filter contaminated reads. Reads are compared against existing sequence databases from various representative organisms to detect potential contaminants. The main features implemented in GenomeFLTR are: (i) automated updating of the relevant databases; (ii) fast comparison of each read against the database; (iii) the ability to create user-specified databases; (iv) a user-friendly interactive dashboard to investigate the origin and frequency of the contaminations; (v) the generation of a contamination-free file. Availability: https://genomefltr.tau.ac.il/.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Elya Wygoda

A Probabilistic Model for Indel Evolution: Differentiating Insertions from Deletions

Harnessing machine translation methods for sequence alignment

An Approximate Bayesian Computation Approach for Modeling Genome Rearrangements

The evolutionary dynamics that retain long neutral genomic sequences in face of indel deletion bias: a model and its application to human introns

GenomeFLTR: filtering reads made easy

Contact Info

Product

Resources

About