Matthieu Jung scite author profile

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through time. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older than its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much shorter. We apply these algorithms to a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 human pandemic. Again, the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures, can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3.
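
The abstract above lists root-to-tip regression among the standard methods used for comparison. As a rough illustration of how a least-squares fit can yield a substitution rate and a root date from serially sampled tips, here is a minimal Python sketch. It is not the LSD algorithm: the function name `root_to_tip_regression` and the toy data are assumptions made for this example, and a strict clock (distance = rate × (date − root date)) is assumed.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of distance = rate * (date - root_date).

    dates:     sampling date of each tip (e.g., decimal years)
    distances: root-to-tip distance of each tip (substitutions per site)
    Returns (rate, root_date). Hypothetical helper, not part of LSD.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need at least two tips with matching dates and distances")

    t_bar = mean(dates)
    d_bar = mean(distances)

    # Sum of squares of the dates around their mean.
    sxx = sum((t - t_bar) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("all tips share one sampling date; the rate is not identifiable")

    # Cross-products of centered dates and centered distances.
    sxy = sum((t - t_bar) * (d - d_bar) for t, d in zip(dates, distances))

    rate = sxy / sxx                     # regression slope
    if rate <= 0:
        raise ValueError("non-positive slope; no temporal signal in these data")

    intercept = d_bar - rate * t_bar     # distance predicted at date 0
    root_date = -intercept / rate        # date at which predicted distance is 0
    return rate, root_date


# Toy data (assumed): a perfect clock with rate 0.002 and a root dated 1990.
dates = [2000.0, 2005.0, 2010.0, 2015.0]
distances = [0.002 * (t - 1990.0) for t in dates]
rate, root_date = root_to_tip_regression(dates, distances)
print(rate, root_date)
```

Unlike this single regression line, the algorithms described in the abstract date every ancestral node and can enforce temporal precedence constraints.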

Blepharocheilodontic syndrome is a CDH1 pathway–related disorder due to mutations in CDH1 and CTNND1

Ghoumid

Stichelbout

Jourdain

et al. 2017

Genetics in Medicine

Mutations in CDH1, which encodes E-cadherin, were previously reported in hereditary diffuse gastric cancer as well as in nonsyndromic cleft lip/palate. Mutations in CTNND1 have never been reported before. The encoded protein, p120ctn, prevents E-cadherin endocytosis and stabilizes its localization at the cell surface. Conditional deletion of Cdh1 and Ctnnd1 in various animal models induces features reminiscent of BCD syndrome and underlines the critical role of the E-cadherin-p120ctn interaction in eyelid, craniofacial, and tooth development. Our data assert BCD syndrome as a CDH1 pathway-related disorder due to mutations in CDH1 and CTNND1 and widen the phenotypic spectrum of E-cadherin anomalies. Genet Med advance online publication 09 March 2017.

Cell cycle gene regulation dynamics revealed by RNA velocity and deep-learning

et al. 2022

Although the cell cycle is a fundamental process of life, a detailed quantitative understanding of gene regulation dynamics throughout the cell cycle is far from complete. Single-cell RNA-sequencing (scRNA-seq) technology gives access to these dynamics without externally perturbing the cell. Here, by generating scRNA-seq libraries in different cell systems, we observe cycling patterns in the unspliced-spliced RNA space of cell cycle-related genes. Since existing methods to analyze scRNA-seq data are not efficient at measuring cycling gene dynamics, we propose a deep learning approach (DeepCycle) to fit these patterns and build a high-resolution map of the entire cell cycle transcriptome. Characterizing the cell cycle in embryonic and somatic cells, we identify major waves of transcription during the G1 phase and systematically study the stages of the cell cycle. Our work will facilitate the study of the cell cycle in multiple cellular models and different biological contexts.

The origin and molecular epidemiology of HIV

Peeters

Jung

Ayouba

2013

Expert Review of Anti-infective Therapy

HIV-1 in humans resulted from at least four cross-species transmissions of simian immunodeficiency viruses (SIVs) from chimpanzees and gorillas in West Central Africa, while HIV-2 viruses resulted from at least eight independent transmissions of SIVs infecting sooty mangabeys in West Africa. Only one of these transmissions (HIV-1 group M) is responsible for the global epidemic. HIV-1 M is subdivided into nine subtypes and a wide diversity of circulating recombinant forms (CRFs) and unique recombinant forms. The heterogeneous HIV-1 M subtype/CRF distribution is the result of founder effects. The genetic diversity of HIV-1 continues to increase over time due to demographic factors such as travel and migration and frequent co/superinfections. In addition, the expanded access to antiretrovirals leads to an increasing number of drug-resistant strains, especially in resource-limited countries.

Arabidopsis ATRX Modulates H3.3 Occupancy and Fine-Tunes Gene Expression

et al. 2017

Histones are essential components of the nucleosome, the major chromatin subunit that structures linear DNA molecules and regulates access of other proteins to DNA. Specific histone chaperone complexes control the correct deposition of canonical histones and their variants to modulate nucleosome structure and stability. In this study, we characterize the Alpha Thalassemia-mental Retardation X-linked (ATRX) ortholog and show that ATRX is involved in histone H3 deposition. Arabidopsis ATRX mutant alleles are viable, but show developmental defects and reduced fertility. Their combination with mutants of the histone H3.3 chaperone HIRA (Histone Regulator A) results in impaired plant survival, suggesting that HIRA and ATRX function in complementary histone deposition pathways. Indeed, ATRX loss of function alters cellular histone H3.3 pools and in consequence modulates the H3.1/H3.3 balance in the cell. H3.3 levels are affected especially at genes characterized by elevated H3.3 occupancy, including the 45S ribosomal DNA (45S rDNA) loci, where loss of ATRX results in altered expression of specific 45S rDNA sequence variants. At the genome-wide scale, our data indicate that ATRX modifies gene expression concomitantly to H3.3 deposition at a set of genes characterized both by elevated H3.3 occupancy and high expression. Together, our results show that ATRX is involved in H3.3 deposition and emphasize the role of histone chaperones in adjusting genome expression.

Intragenic FMR1 disease-causing variants: a significant mutational mechanism leading to Fragile-X syndrome

Quartier

Poquet

Gilbert‐Dussardier

et al. 2017

Eur J Hum Genet

Fragile-X syndrome (FXS) is a frequent genetic form of intellectual disability (ID). The main recurrent mutagenic mechanism causing FXS is the expansion of a CGG repeat sequence in the 5′-UTR of the FMR1 gene, which is therefore routinely tested in ID patients. We report here three FMR1 intragenic pathogenic variants not affecting this sequence, identified using high-throughput sequencing (HTS): a previously reported hemizygous deletion encompassing the last exon of FMR1, too small to be detected by array-CGH and inducing decreased expression of a truncated form of FMRP protein, in three brothers with ID (family 1), and two splice variants in boys with sporadic ID: a de novo variant c.990+1G>A (family 2) and a maternally inherited c.420-8A>G variant (family 3). After clinical reevaluation, the five patients presented features consistent with FXS (mean Hagerman's scores = 15). We conducted a systematic review of all rare non-synonymous variants previously reported in FMR1 in ID patients and showed that six of them are convincing pathogenic variants. This study suggests that intragenic FMR1 variants, although much less frequent than CGG expansions, are a significant mutational mechanism leading to FXS and demonstrates the interest of HTS approaches to detect them in ID patients with a negative standard work-up.

Searching for virus phylotypes

Chevenet

Jung

Peeters

et al. 2013

Motivation: Large phylogenies are being built today to study virus evolution, trace the origin of epidemics, establish the mode of transmission and survey the appearance of drug resistance. However, no tool is available to quickly inspect these phylogenies and combine them with extrinsic traits (e.g. geographic location, risk group, presence of a given resistance mutation), seeking to extract strain groups of specific interest or requiring surveillance.

Results: We propose a new method for obtaining such groups, which we call phylotypes, from a phylogeny having taxa (strains) annotated with extrinsic traits. Phylotypes are subsets of taxa with close phylogenetic relationships and common trait values. The method combines ancestral trait reconstruction using parsimony, with combinatorial and numerical criteria measuring tree shape characteristics and the diversity and separation of the potential phylotypes. A shuffling procedure is used to assess the statistical significance of phylotypes. All algorithms have linear time complexity. This results in low computing times, typically a few minutes for the larger data sets with a number of shuffling steps. Two HIV-1 data sets are analyzed, one of which is large, containing >3000 strains of HIV-1 subtype C collected worldwide, where the method shows its ability to recover known clusters and transmission routes, and to detect new ones.

Availability: This method and companion tools are implemented in an interactive Web interface (www.phylotype.org), which provides a wide choice of graphical views and output formats, and allows for exploratory analyses of large data sets.

Contact: francois.chevenet@ird.fr, gascuel@lirmm.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

The Origin and Evolutionary History of HIV-1 Subtype C in Senegal

et al. 2012

Background: The classification of HIV-1 strains in subtypes and Circulating Recombinant Forms (CRFs) has helped in tracking the course of the HIV pandemic. In Senegal, which is located at the tip of West Africa, CRF02_AG predominates in the general population and Female Sex Workers (FSWs). In contrast, 40% of Men having Sex with Men (MSM) in Senegal are infected with subtype C. In this study we analyzed the geographical origins and introduction dates of HIV-1 C in Senegal in order to better understand the evolutionary history of this subtype, which predominates today in the MSM population.

Methodology/Principal Findings: We used a combination of phylogenetic analyses and a Bayesian coalescent-based approach to study the phylogenetic relationships in pol of 56 subtype C isolates from Senegal with 3,025 subtype C strains that were sampled worldwide. Our analysis shows a significantly well supported cluster which contains all subtype C strains that circulate among MSM in Senegal. The MSM cluster and other strains from Senegal are widely dispersed among the different subclusters of African HIV-1 C strains, suggesting multiple introductions of subtype C in Senegal from many different southern and east African countries. More detailed analyses show that HIV-1 C strains from MSM are more closely related to those from southern Africa. The date of the MRCA of subtype C in the MSM population in Senegal is estimated to be in the early 1980s.

Conclusions/Significance: Our evolutionary reconstructions suggest that multiple subtype C viruses with a common ancestor originating in the early 1970s entered Senegal. There was only one efficient spread in the MSM population, which most likely resulted from a single introduction, underlining the importance of high-risk behavior in the spread of viruses.

Fast Dating Using Least-Squares Criteria and Algorithms
