Amy X. Lu scite author profile

If you already have a Python installation with a different version (e.g., 2.7) that you must keep, consider installing Python 3.8 through Anaconda ("Anaconda Software Distribution," 2020): https://docs.anaconda.com/anaconda/install.

Download required files. Through your browser, navigate to http://data.bioembeddings.com/disprot and download the files: sequences.fasta, config.yml, and dis-prot_annotations.csv. Note that you might need to right-click and select "Save Link As" to download the files.
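The download step above could also be scripted instead of done through the browser. The following is a minimal sketch, not part of the original protocol: it assumes each file is served directly under the quoted base URL (the text only says to navigate there and save the links), and it copies the file names exactly as they appear in the excerpt. The helper names `file_urls` and `download_all` and the `disprot` destination folder are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Base URL and file names copied from the text above. That each file lives
# directly under the base URL is an assumption, not something the text states.
BASE_URL = "http://data.bioembeddings.com/disprot"
FILENAMES = ("sequences.fasta", "config.yml", "dis-prot_annotations.csv")


def file_urls(base_url=BASE_URL, filenames=FILENAMES):
    """Map each file name to the URL it is assumed to be served from."""
    root = base_url.rstrip("/")
    return {name: f"{root}/{quote(name)}" for name in filenames}


def download_all(dest_dir="disprot", base_url=BASE_URL, filenames=FILENAMES):
    """Download every file into dest_dir and return the saved paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, url in file_urls(base_url, filenames).items():
        target = dest / name
        urlretrieve(url, str(target))  # raises an error if the URL is unreachable
        saved.append(target)
    return saved
```

Calling `download_all()` would fetch the three files into a local `disprot` folder. If a request fails, check the file names and links shown on the page itself, since the names here are taken from the excerpt as written.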

Hurtful words

Zhang

Abdalla

et al. 2020

In this work, we examine the extent to which embeddings may encode marginalized populations differently, and how this may lead to a perpetuation of biases and worsened performance on clinical tasks. We pretrain deep embedding models (BERT) on medical notes from the MIMIC-III hospital dataset, and quantify potential disparities using two approaches. First, we identify dangerous latent relationships that are captured by the contextual word embeddings using a fill-in-the-blank method with text from real clinical notes and a log probability bias score quantification. Second, we evaluate performance gaps across different definitions of fairness on over 50 downstream clinical prediction tasks that include detection of acute and chronic conditions. We find that classifiers trained from BERT representations exhibit statistically significant differences in performance, often favoring the majority group with regards to gender, language, ethnicity, and insurance status. Finally, we explore shortcomings of using adversarial debiasing to obfuscate subgroup information in contextual word embeddings, and recommend best practices for such deep embedding models in clinical settings.

Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning

Pritišanac

et al. 2022

PLoS Comput Biol

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call “reverse homology”, exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
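The contrastive task described in this abstract (pick the held-out homolog out of a pool that also contains randomly sampled IDRs) can be illustrated with a softmax cross-entropy over similarity scores. This is only a sketch under stated assumptions, not the authors' implementation: it assumes each IDR has already been mapped to a fixed-length embedding by some encoder, summarizes the homolog set by its mean embedding, and scores candidates by dot product. The function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def reverse_homology_loss(homolog_embeddings, candidate_embeddings, positive_index):
    """Cross-entropy of choosing the held-out homolog among the candidates.

    homolog_embeddings: embeddings of the known homologous IDRs (query set).
    candidate_embeddings: the held-out homolog plus randomly sampled IDRs.
    positive_index: position of the held-out homolog in the candidate list.
    """
    query = mean_vector(homolog_embeddings)  # assumed way to summarize the set
    scores = [dot(query, c) for c in candidate_embeddings]
    top = max(scores)
    # Numerically stable log-sum-exp over all candidate scores.
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - scores[positive_index]
```

The loss is small when the held-out homolog scores higher than the random IDRs and equals the log of the number of candidates when all scores are tied, so minimizing it pushes the encoder toward features shared among homologs.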

Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning

Pritišanac

et al. 2021

Preprint

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features, such as short motifs, amino acid repeats and physicochemical properties that mediate the functions of these regions. Here, we introduce a proteome-scale feature discovery method for IDRs. Our method, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a randomly held-out homologue from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, and other features. We also show that our model can be used to produce specific predictions of what residues and regions are most important to the function, providing a computational strategy for designing mutagenesis experiments in uncharacterized IDRs. Our results suggest that feature discovery using neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.

History and publication trends in the diffusion and early uptake of indirect comparison meta-analytic methods to study drugs: animated coauthorship networks over time

Ban

Tadrous

et al. 2018

BMJ Open

Objective: To characterise the early diffusion of indirect comparison meta-analytic methods to study drugs.

Design: Systematic literature synthesis.

Data sources: Cochrane Database of Systematic Reviews, EMBASE, MEDLINE, Scopus and Web of Science.

Study selection: English language papers that used indirect comparison meta-analytic methods to study the efficacy or safety of three or more interventions, where at least one was a drug.

Data extraction: The number of publications and authors was plotted by year and type: methodological contribution, review or empirical application. Author and methodological details were summarised for empirical applications, and animated coauthorship networks were created to visualise contributors by country and affiliation type (academia, industry, government or other) over time.

Results: We identified 477 papers (74 methodological contributions, 42 reviews and 361 empirical applications) by 1689 distinct authors from 1997 to 2013. Prior to 2002, only three applications were published, with contributions from the USA (n=2) and Canada (n=1). The number of applications gradually increased annually with rapid uptake between 2011 and 2013 (n=254, 71%). Early diffusion occurred primarily in Europe with the first application credited to the UK in 2003. Application spread to other European countries in 2005, and may have been supported by regulatory requirements for drug approval. By the end of 2013, contributions included 49% credited to Europe (22% UK, 27% other), 37% credited to North America (11% Canada, 26% USA) and 14% from other regions.

Conclusion: Indirect comparison meta-analytic methods are an important innovation for health research. Although Canada and the USA were the first to apply these methods, Europe led their diffusion. The increase in uptake of these methods may have been facilitated by acceptance by regulatory agencies, which are calling for more comparative drug effect data to assist in drug accessibility and reimbursement decisions.

Strategies for effectively modelling promoter-driven gene expression using transfer learning

Reddy

Herschl

Kolli

et al. 2023

Preprint

Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters are known. Efficiently designing compact promoter sequences with a high density of regulatory information by leveraging machine learning models would therefore be broadly impactful for fundamental research and direct therapeutic applications. However, models of expression from such compact promoter sequences are lacking, despite the recent success of deep learning in modelling expression from endogenous regulatory sequences. Despite the lack of large datasets measuring promoter-driven expression in many cell types, data from a few well-studied cell types or from endogenous gene expression may provide relevant information for transfer learning, which has not yet been explored in this setting. Here, we evaluate a variety of pretraining tasks and transfer strategies for modelling cell type-specific expression from compact promoters and demonstrate the effectiveness of pretraining on existing promoter-driven expression datasets from other cell types. Our approach is broadly applicable for modelling promoter-driven expression in any data-limited cell type of interest, and will enable the use of model-based optimization techniques for promoter design for gene delivery applications. Our code and data are available at https://github.com/anikethjr/promoter_models.

COOS-7 (Cells Out Of Sample 7-Class)

Lu¹,

Lu²,

Schormann³

et al. 2019


Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets
