Matthieu Defrance scite author profile

Background: Single-cell RNA sequencing (scRNA-seq) has emerged as a main strategy to study transcriptional activity at the cellular level. Clustering analysis is routinely performed on scRNA-seq data to explore, recognize or discover underlying cell identities. The high dimensionality of scRNA-seq data and its significant sparsity, accentuated by frequent dropout events that introduce false zero count observations, make the clustering analysis computationally challenging. Even though multiple scRNA-seq clustering techniques have been proposed, there is no consensus on the best performing approach. On a parallel research track, self-supervised contrastive learning recently achieved state-of-the-art results on image clustering and, subsequently, image classification. Results: We propose contrastive-sc, a new unsupervised learning method for scRNA-seq data that performs cell clustering. The method consists of two consecutive phases: first, an artificial neural network learns an embedding for each cell through a representation training phase. The embedding is then clustered in the second phase with a general clustering algorithm (i.e. KMeans or Leiden community detection). The proposed representation training phase is a new adaptation of the self-supervised contrastive learning framework, initially proposed for image processing, to scRNA-seq data. contrastive-sc has been compared with ten state-of-the-art techniques. A broad experimental study has been conducted on both simulated and real-world datasets, assessing multiple external and internal clustering performance metrics (i.e. ARI, NMI, Silhouette, Calinski scores). Our experimental analysis shows that contrastive-sc compares favorably with state-of-the-art methods on both simulated and real-world datasets. Conclusion: On average, our method identifies well-defined clusters in close agreement with ground truth annotations. Our method is computationally efficient, being fast to train and having a limited memory footprint. contrastive-sc maintains good performance when only a fraction of input cells is provided and is robust to changes in hyperparameters or network architecture. The decoupling between the creation of the embedding and the clustering phase allows the flexibility to choose a suitable clustering algorithm (i.e. KMeans when the number of expected clusters is known, Leiden otherwise) or to integrate the embedding with other existing techniques.
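The abstract above evaluates clusterings with external metrics such as ARI, which compares predicted clusters with ground truth annotations. As a minimal sketch of what that metric measures, the adjusted Rand index can be computed from pair counts as below. The function name and the toy labels are illustrative assumptions, not taken from the contrastive-sc code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        # No pairs of cells to compare; treat as trivially identical.
        return 1.0

    # Contingency counts: number of cells per (true label, predicted label).
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs grouped together in both, or in each, clustering.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. a single cluster on both sides.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical annotations for six cells and two candidate clusterings.
truth = ["T", "T", "T", "B", "B", "B"]
relabelled = [1, 1, 1, 0, 0, 0]   # same partition, different label names
one_error = [1, 1, 0, 0, 0, 0]    # one cell assigned to the wrong cluster

print(adjusted_rand_index(truth, relabelled))
print(adjusted_rand_index(truth, one_error))
```

The score depends only on which cells are grouped together, so renaming cluster labels does not change it, while a single misassigned cell lowers it noticeably on a dataset this small.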

Molecular adaptations to heat stress in the thermophilic ant genus Cataglyphis

Perez

Araujo

Defrance

2021

Molecular Ecology

Over the last decade, increasing attention has been paid to the molecular adaptations used by organisms to cope with thermal stress. However, to date, few studies have focused on thermophilic species living in hot, arid climates. In this study, we explored molecular adaptations to heat stress in the thermophilic ant genus Cataglyphis, one of the world's most thermotolerant animal taxa. We compared heat tolerance and gene expression patterns across six Cataglyphis species from distinct phylogenetic groups that live in different habitats and experience different thermal regimes. We found that all six species had high heat tolerance levels, with critical thermal maxima (CTmax) ranging from 43℃ to 45℃ and a median lethal temperature (LT50) ranging from 44.5℃ to 46.8℃. Transcriptome analyses revealed that, although the number of differentially expressed genes varied widely for the six species (from 54 to 1118), many were also shared. Functional annotation of the differentially expressed and co-expressed genes showed that the biological pathways involved in heat-shock responses were similar among species and were associated with four major processes: the regulation of transcriptional machinery and DNA metabolism; the preservation of proteome stability; the elimination of toxic residues; and the maintenance of cellular integrity. Overall, our results suggest that molecular responses to heat stress have been evolutionarily conserved in the ant genus Cataglyphis and that their diversity may help workers withstand temperatures close to their physiological limits.

Distinct mesoderm migration phenotypes in extra-embryonic and embryonic regions of the early mouse embryo

Saykali

Mathiah

Nahaboo

et al. 2018

Preprint

In the gastrulating mouse embryo, epiblast cells delaminate at the primitive streak to form mesoderm and definitive endoderm, through an epithelial-mesenchymal transition. Mosaic expression of a membrane reporter in nascent mesoderm enabled recording cell shape and trajectory through live imaging. Upon leaving the streak, cells changed shape and extended protrusions of distinct size and abundance depending on the neighboring germ layer, as well as the region of the embryo. Embryonic trajectories were meandrous but directional, while extra-embryonic mesoderm cells showed little net displacement. Embryonic and extra-embryonic mesoderm transcriptomes highlighted distinct guidance, cytoskeleton, adhesion, and extracellular matrix signatures. Specifically, intermediate filaments were highly expressed in extra-embryonic mesoderm, while live imaging for F-actin showed abundance of actin filaments in embryonic mesoderm only. Accordingly, RhoA or Rac1 conditional deletion in mesoderm inhibited embryonic, but not extra-embryonic mesoderm migration. Overall, this indicates separate cytoskeleton regulation coordinating the morphology and migration of mesoderm subpopulations.

Abstract LB-180: Epigenetic portraits of human breast cancers

Dedeurwaerder

Desmedt

Calonne

et al. 2011

Background: Understanding the diversity of breast cancer is essential to improving diagnosis and optimising treatment. Both genetic and acquired epigenetic abnormalities participate in cancer, but information is scant on the involvement of the epigenome in breast cancer and its contribution to the complexity of the disease. Our goal was to explore the DNA methylation landscapes of phenotypically heterogeneous tumours, to relate this diversity to landscape features, and to extract biologically and clinically meaningful information. Methods: We performed comprehensive DNA methylation profiling to assess the methylomes of two independent sets of frozen breast tissue samples: a “main set” of 123 samples (4 normal and 119 infiltrating ductal carcinomas, IDCs), and a “validation set” of 125 samples (8 normal and 117 IDCs). We used Illumina's recently developed Infinium Methylation Assay, which assesses the methylation status of more than 27,000 CpGs corresponding to over 14,000 genes. Results: Firstly, it emerged that the two major phenotypes of breast cancers determined by ER status are widely epigenetically controlled. Secondly, we have distinguished, and validated in an independent set of tumours, 6 methylation-profile-based tumour groups, some coinciding with known “expression subtypes” but also new entities that may provide a meaningful basis for refining breast tumour taxonomy. Thirdly, we showed that DNA methylation profiling can reflect the cell type composition of the tumour microenvironment. Lastly, we highlighted an unexpectedly strong epigenetic component in the regulation of key immune pathways, revealing a set of immune genes having high prognostic value in specific tumour categories. Conclusions: In this study, we have generated the largest and most comprehensive DNA methylation data set for human breast tumour tissues. Several novel findings and original concepts for breast cancer emerge that previous RNA expression profiling has not highlighted. By laying the ground for better understanding of breast cancer heterogeneity and improved tumour taxonomy, the precise epigenetic portraits drawn in our work should contribute to better management of breast cancer patients. Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 102nd Annual Meeting of the American Association for Cancer Research; 2011 Apr 2-6; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2011;71(8 Suppl):Abstract nr LB-180. doi:10.1158/1538-7445.AM2011-LB-180

Improving Infinium MethylationEPIC data processing: re-annotation of enhancers and long noncoding RNA genes and benchmarking of normalization methods

et al. 2022

Illumina Infinium DNA Methylation (5mC) arrays are a popular technology for low-cost, high-throughput, genome-scale measurement of 5mC distribution, especially in cancer and other complex diseases. After the success of its HumanMethylation450 array (450k), Illumina released the MethylationEPIC array (850k) featuring increased coverage of enhancers. Despite the widespread use of 850k, analysis of the corresponding data remains suboptimal: it still relies mostly on Illumina’s default annotation, which underestimates enhancers and long noncoding RNAs. Results: We have thus developed an approach, based on the ENCODE and LNCipedia databases, which greatly improves upon Illumina’s default annotation of enhancers and long noncoding transcripts. We compared the re-annotated 850k with both 450k and reduced-representation bisulphite sequencing (RRBS), another high-throughput 5mC profiling technology. We found 850k to cover at least three times as many enhancers and long noncoding RNAs as either 450k or RRBS. We further investigated the reproducibility of the three technologies, applying various normalization methods to the 850k data. Most of these methods reduced variability to a level below that of RRBS data. We then used 850k with our new annotation and normalization to profile 5mC changes in breast cancer biopsies. 850k highlighted aberrant enhancer methylation as the predominant feature, in agreement with previous reports. Our study provides an updated processing approach for 850k data, based on refined probe annotation and normalization, allowing for improved analysis of methylation at enhancers and long noncoding RNA genes. Our findings will help to further advance understanding of the DNA methylome in health and disease.

Identification of differentially methylated regions in rare diseases from a single-patient perspective

et al. 2022

Background: DNA methylation (5-mC) is being widely recognized as an alternative in the detection of sequence variants in the diagnosis of some rare neurodevelopmental and imprinting disorders. Identification of alterations in DNA methylation plays an important role in the diagnosis and understanding of the etiology of those disorders. Canonical pipelines for the detection of differentially methylated regions (DMRs) usually rely on inter-group (e.g., case versus control) comparisons. However, these tools might perform suboptimally in the context of rare diseases and multilocus imprinting disturbances due to small cohort sizes and inter-patient heterogeneity. Therefore, there is a need to provide a simple but statistically robust pipeline for scientists and clinicians to perform differential methylation analyses at the single-patient level as well as to evaluate how parameter fine-tuning may affect differentially methylated region detection. Results: We implemented an improved statistical method to detect differentially methylated regions in correlated datasets based on the Z-score and empirical Brown aggregation methods from a single-patient perspective. To accurately assess the predictive power of our method, we generated semi-simulated data using a public control population of 521 samples and investigated how the size of the control population, methylation difference, and region size affect DMR detection. In addition, we validated the detection of methylation events in patients suffering from rare multi-locus imprinting disturbance and evaluated how this method could complement existing tools in the context of clinical diagnosis. Conclusion: In this study, we present a robust statistical method to perform differential methylation analysis at the single-patient level and describe its optimal parameters to increase DMR identification performance. Finally, we show its diagnostic utility when applied to rare disorders.
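To illustrate the single-patient idea in the abstract above, the sketch below scores each CpG of one patient against a control population with a Z-score and then aggregates the per-CpG p-values over a region. It is a simplified stand-in, not the authors' pipeline: the aggregation step uses Fisher's method, which assumes independent p-values, whereas the empirical Brown method named in the abstract additionally corrects for correlation between CpGs. All function names and the toy methylation values are assumptions made for this example.

```python
from math import erfc, exp, log, sqrt
from statistics import mean, stdev


def cpg_z_scores(patient, controls):
    """Z-score of each patient CpG value against the control population.

    patient: methylation values, one per CpG in the region.
    controls: control samples, each a list aligned with `patient`.
    Assumes every CpG varies across controls (non-zero standard deviation).
    """
    scores = []
    for i, value in enumerate(patient):
        column = [sample[i] for sample in controls]
        scores.append((value - mean(column)) / stdev(column))
    return scores


def two_sided_p(z):
    """Two-sided p-value of a Z-score under a standard normal distribution."""
    # Clamp away from zero so the logarithm below stays finite.
    return max(erfc(abs(z) / sqrt(2)), 1e-300)


def fisher_combined_p(p_values):
    """Fisher's combination of p-values (independence assumed, unlike Brown's)."""
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # X / 2, with X = -2 * sum(log p)
    # Closed-form survival function of a chi-square with 2k degrees of freedom.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def region_p_value(patient, controls):
    """Aggregate per-CpG evidence into one p-value for the candidate region."""
    z_scores = cpg_z_scores(patient, controls)
    return fisher_combined_p([two_sided_p(z) for z in z_scores])


# Hypothetical beta values for a 3-CpG region in five controls.
controls = [
    [0.50, 0.52, 0.48],
    [0.48, 0.50, 0.51],
    [0.52, 0.49, 0.50],
    [0.51, 0.51, 0.49],
    [0.49, 0.48, 0.52],
]
hypomethylated_patient = [0.30, 0.28, 0.31]
unaffected_patient = [0.50, 0.51, 0.49]

print(region_p_value(hypomethylated_patient, controls))
print(region_p_value(unaffected_patient, controls))
```

With only five toy controls the normal approximation is crude; the abstract reports using a control population of 521 samples, where per-CpG means and standard deviations are far better estimated.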

Explainability methods for differential gene analysis of single cell RNA-seq clustering models

Ciortan

Defrance

2021

Preprint

Single-cell RNA sequencing (scRNA-seq) produces transcriptomic profiling for individual cells. Due to the lack of cell-class annotations, scRNA-seq is routinely analyzed with unsupervised clustering methods. Because these methods are typically limited to producing clustering predictions (that is, assignment of cells to clusters of similar cells), numerous model-agnostic differential expression (DE) libraries have been proposed to identify the genes expressed differently in the detected clusters, as needed in the downstream analysis. In parallel, the advancements in neural networks (NN) brought several model-specific explainability methods to identify salient features based on gradients, eliminating the need for external models. We propose a comprehensive study to compare the performance of dedicated DE methods with that of explainability methods typically used in machine learning, both model-agnostic (such as SHAP, permutation importance) and model-specific (such as NN gradient-based methods). The DE analysis is performed on the results of 3 state-of-the-art clustering methods based on NNs. Our results on 36 simulated datasets indicate that all analyzed DE methods have limited agreement between them and with ground-truth genes. The gradients method outperforms the traditional DE methods, which encourages the development of NN-based clustering methods to provide an out-of-the-box DE capability. Employing DE methods on the input data preprocessed by the clustering method outperforms the traditional approach of using the original count data, albeit still performing worse than gradient-based methods.
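Permutation importance, one of the model-agnostic explainability methods named in the abstract above, can be sketched in a few lines: shuffle one gene across cells and measure how often the model's cluster assignment changes. The example below is an illustration under stated assumptions. It uses a hypothetical nearest-centroid model in place of the NN-based clustering methods studied in the paper, and the data, function names and parameters are invented for the example.

```python
import random


def nearest_centroid(cell, centroids):
    """Index of the centroid closest to `cell` (squared Euclidean distance)."""
    distances = [sum((x - c) ** 2 for x, c in zip(cell, centroid))
                 for centroid in centroids]
    return distances.index(min(distances))


def permutation_importance(cells, centroids, n_repeats=10, seed=0):
    """Mean drop in agreement with the original labels when a gene is shuffled."""
    rng = random.Random(seed)
    baseline = [nearest_centroid(cell, centroids) for cell in cells]
    importances = []
    for gene in range(len(cells[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle this gene's values across cells, keeping other genes fixed.
            column = [cell[gene] for cell in cells]
            rng.shuffle(column)
            shuffled = [cell[:gene] + [value] + cell[gene + 1:]
                        for cell, value in zip(cells, column)]
            labels = [nearest_centroid(cell, centroids) for cell in shuffled]
            agreement = sum(a == b for a, b in zip(labels, baseline)) / len(cells)
            drops.append(1.0 - agreement)
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: gene 0 separates the two clusters, gene 1 is low-amplitude noise.
cells = [[0.0, 0.2], [0.5, 0.9], [0.2, 0.4], [0.3, 0.1],
         [9.8, 0.7], [10.1, 0.3], [9.9, 0.8], [10.2, 0.5]]
centroids = [[0.25, 0.4], [10.0, 0.575]]

importances = permutation_importance(cells, centroids)
print(importances)
```

Genes whose shuffling changes many assignments are candidate marker genes for the clusters. Because the method only queries the model's predictions, it applies to any clustering model, in contrast to gradient-based methods that need access to the network itself.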
