A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining

Gambardella, Gennaro; Bernardo, Diego di

doi:10.3389/fgene.2019.00734

Cited by 16 publications

(36 citation statements)

References 28 publications

(50 reference statements)

“…The two case studies aimed at the extraction of chemical names from the texts relevant to HIV reverse transcriptase inhibition, proteins and genes from the texts relevant to HIV control allow us to determine the advantages and disadvantages of text mining approaches to new information. The main advantage of text mining approaches is the possibility of covering the huge amount of textual data (Ruusmann and Maran, 2013; Capuzzi et al, 2017, 2018; Kandhro et al, 2017; Azam et al, 2019; Gambardella and di Bernardo, 2019; Guin et al, 2019; Ivanisenko et al, 2019; Alves et al, 2020). Text mining approaches allow retrieving the most recent and important information about chemicals, proteins, and genes associated with HIV treatment including their tissue-specific expression level (Ivanisenko et al, 2019).…”

Section: Discussion (mentioning)

confidence: 99%

Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies

et al. 2020

Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for analyzing the molecular mechanisms of disease progression and for developing new strategies for the treatment of various diseases and pathological conditions. Publication texts are a primary source of information; compared with databases, they are especially valuable for collecting high-quality data because the information is available immediately. In our study, we aimed to develop and test an approach to named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on conditional random fields, which recognizes NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest makes it possible to extract the NEs strongly associated with that subject. To test the applicability of our approach, we applied it to the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments provide estimates of the recognition accuracy for chemical NEs and for proteins and genes. The precision of chemical NE recognition is over 0.91, recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of protein and gene names is over 0.86, recall is 0.83, and the F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms that it is possible to extract the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients, i.e., HIV-positive individuals able to maintain an undetectable HIV-1 viral load over time in the absence of antiretroviral therapy. Analysis of the results provides insights into the function of proteins that may be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment.

“…We next generated an atlas encompassing all 32 cell-lines, as shown in Figure 1A and available online. This was achieved by combining data across cell-lines with the gf-icf pipeline (19), which performs count normalization, feature selection and dimensionality reduction (20) of the profiled cells. In the atlas, cell lines derived from the same cancer subtypes tend to cluster together, while being separated from the other subtypes (Figure 1A): luminal BC cell lines form a big “island” with multiple “peninsulas” with intermixing of cells from distinct cell lines (Figure 1A,B); on the contrary, TNBC cells give rise to an “archipelago” with cells from the same cell-line grouped into distinct islands, thus suggesting that TNBC cell-lines represent instances of distinct diseases.…”

Section: Results (mentioning)

confidence: 99%

“…Single cells expression profiles were normalized using GF-ICF (Gene Frequency – Inverse Cell Frequency) normalization using the gficf package 65,66 for R statistical environment (https://github.com/dibbelab/gficf). GF-ICF is based on a data transformation model called term frequency-inverse document frequency (TF-IDF) that has been extensively used in the field of text mining.…”

Section: Methods (mentioning)

confidence: 99%
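
The TF-IDF idea behind GF-ICF can be illustrated with a minimal Python sketch. This is not the gficf package's implementation (that package is written for R); the function name `gf_icf_sketch`, the smoothed logarithmic inverse cell frequency, and the final L2 normalization are assumptions chosen for illustration. Each cell plays the role of a document and each gene the role of a term.

```python
import math


def gf_icf_sketch(cells):
    """TF-IDF-style weighting of raw counts (illustrative, hypothetical helper).

    cells: list of dicts mapping gene name -> raw count, one dict per cell.
    Returns a list of dicts mapping gene name -> weight, one dict per cell.
    """
    n_cells = len(cells)

    # "Cell frequency": in how many cells is each gene detected?
    cells_with_gene = {}
    for cell in cells:
        for gene, count in cell.items():
            if count > 0:
                cells_with_gene[gene] = cells_with_gene.get(gene, 0) + 1

    # Smoothed inverse cell frequency (assumed form): genes seen in
    # every cell get weight 1, rarer genes get larger weights.
    icf = {
        gene: math.log((1 + n_cells) / (1 + k)) + 1.0
        for gene, k in cells_with_gene.items()
    }

    weighted = []
    for cell in cells:
        total = sum(cell.values())
        if total == 0:
            weighted.append({})
            continue
        # Gene frequency (count / total counts in the cell) times ICF.
        row = {g: (c / total) * icf[g] for g, c in cell.items() if c > 0}
        # L2-normalize so that cells are comparable regardless of depth.
        norm = math.sqrt(sum(v * v for v in row.values()))
        weighted.append({g: v / norm for g, v in row.items()})
    return weighted


example = gf_icf_sketch([{"A": 5, "B": 5}, {"A": 10}])
```

In this toy input, gene B is detected in only one of the two cells, so within the first cell it receives a larger weight than the ubiquitous gene A, even though both have the same raw count.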

“…Cell clustering and identification of marker genes: Transcriptionally similar subpopulations of cells were found using a Phenograph like approach 67 as implemented in the clustcells function of gficf package 64 .…”

Section: Methods (mentioning)

confidence: 99%
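
A Phenograph-style graph construction can be sketched in Python as follows. This is only an illustration of the general technique (a k-nearest-neighbour graph whose edges are weighted by the Jaccard similarity of the two cells' neighbour sets), not the clustcells function itself; the helper name `knn_jaccard_graph`, the Euclidean distance, and the brute-force neighbour search are assumptions. Community detection (for example Louvain) would then be run on the resulting weighted edges and is not shown here.

```python
import math


def knn_jaccard_graph(points, k):
    """Build a kNN graph with Jaccard-weighted edges (hypothetical helper).

    points: list of equal-length coordinate tuples, one per cell
            (e.g. cells in a reduced-dimension space).
    Returns a dict mapping (i, j) with i < j to an edge weight in (0, 1].
    """
    n = len(points)

    # Brute-force k nearest neighbours of every cell (excluding itself).
    neighbours = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours.append({j for _, j in dists[:k]})

    # Weight each kNN edge by the overlap of the two neighbour sets.
    edges = {}
    for i in range(n):
        for j in neighbours[i]:
            a, b = min(i, j), max(i, j)
            if (a, b) in edges:
                continue
            shared = len(neighbours[a] & neighbours[b])
            union = len(neighbours[a] | neighbours[b])
            edges[(a, b)] = shared / union
    return edges


# Two well-separated groups of three cells each.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_edges = knn_jaccard_graph(toy, k=2)
```

With the toy coordinates above, every edge connects two cells from the same group, which is the structure a community-detection step would then pick up.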

“…The Louvain algorithm with resolution parameter equal to 0.25 was used to find communities of cells in this graph. Differentially expressed genes in each cluster were identified by the findClusterMarkers function of gficf package, which compares the expression of a gene in each cluster versus all the other by using the Wilcoxon rank-sum test 64 .…”

Section: Methods (mentioning)

confidence: 99%
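
The cluster-versus-rest comparison described above can be sketched in Python with a self-contained Wilcoxon rank-sum test. This is not the findClusterMarkers function; the helper names `rank_sum_test` and `cluster_vs_rest` are hypothetical, and the two-sided p-value uses a tie-corrected normal approximation without continuity correction, which is an assumption suited to reasonably large groups rather than a statement about what the package does.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation.

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign average ranks to tied values and collect the tie correction.
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for idx in range(i, j):
            ranks[idx] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0

    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


def cluster_vs_rest(values, labels, cluster):
    """Compare one gene's expression in `cluster` against all other cells."""
    inside = [v for v, lab in zip(values, labels) if lab == cluster]
    outside = [v for v, lab in zip(values, labels) if lab != cluster]
    return rank_sum_test(inside, outside)


expression = [5, 6, 7, 8, 1, 2, 3, 4]
clusters = ["a", "a", "a", "a", "b", "b", "b", "b"]
u_stat, p = cluster_vs_rest(expression, clusters, "a")
```

In practice the same comparison would be repeated for every gene and every cluster, followed by a multiple-testing correction, which this sketch omits.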

A single-cell atlas of breast cancer cell lines to study tumour heterogeneity and drug response

Gambardella, Viscido, Tumaini et al. 2021 (preprint)

Breast Cancer (BC) patient stratification is mainly driven by receptor status and histological grading and subtyping, with about twenty percent of patients for whom the absence of any actionable biomarkers results in no clear therapeutic intervention to apply. Here, we evaluated the potential of single-cell transcriptomics for automated diagnosis and drug treatment of breast cancer. We transcriptionally profiled 35,276 individual cells from 33 BC cell-lines covering all main BC subtypes to yield a Breast Cancer Single Cell Atlas. We show that single cell transcriptomics can successfully detect clinically relevant BC biomarkers and that the atlas can be used to automatically predict cancer subtype and composition from a patient's tumour biopsy. We found that BC cell lines harbour a high degree of heterogeneity in the expression of clinically relevant BC biomarkers and that such heterogeneity enables cells with differential drug sensitivity to co-exist even within a genomically stable isogenic cell line. Finally, we developed a novel bioinformatics approach named DREEP (DRug Estimation from Expression Profiles) to automatically predict responses to more than 450 anticancer agents starting from single-cell transcriptional profiles. We validated DREEP both in-silico and in-vitro by selectively inhibiting the growth of the HER2-deficient subpopulation in the MDAMB361 cell line. Our work shows that transcriptional heterogeneity is common, dynamic and plays a relevant role in determining drug sensitivity. Moreover, our Breast Cancer Single Cell Atlas and DREEP approach are a unique resource for the BC research community and can advance the use of single-cell sequencing in the clinic.

Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview

Slovin, Carissimo, Panariello et al. 2021, Methods in Molecular Biology

106
