Johannes Hingerl scite author profile

Johannes Hingerl

5 Publications

48 Citation Statements Received

136 Citation Statements Given

Affiliations

Technical University of Munich

Publications

The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain

Friedrich, Adel, Tomazic et al. 2020

This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network-based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.

Species-aware DNA language models capture regulatory elements and their evolution

Gankin, Karollus, Grosshauser et al. 2023

Preprint

Motivation: Predicting gene expression from DNA is an open field of research. As in many areas, labeled data is dwarfed by unlabeled data, i.e. species with a sequenced genome but no gene expression assay data. Pretraining on unlabeled data using masked language modeling has proven highly successful in overcoming data constraints in natural language and proteomics. However, in genomics, this approach has so far been applied only to single genomes, neither leveraging conservation of regulatory sequences across species nor the vast amount of available genomes. Results: Here we train a masked language model on more than 800 species spanning over 500 million years of evolution. We show that explicitly modeling species is instrumental in capturing conserved yet evolving regulatory elements and in controlling for oligomer biases. We extract embeddings for 3' untranslated regions of Saccharomyces cerevisiae and Schizosaccharomyces pombe and use them to achieve prediction of mRNA half-life that is better than or on par with the state of the art, demonstrating the utility of the approach for regulatory genomics. Moreover, we show that the per-base reconstruction probability of our model significantly predicts RNA-binding protein bound sites directly. Altogether, our work establishes a self-supervised framework to leverage large genome collections of evolutionarily distant species for regulatory genomics and contributes to alignment-free comparative genomics. Availability and implementation: The source code and trained models are available at: https://github.com/DennisGankin/species-aware-DNA-LM.

Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing

Klaproth-Andrade, Hingerl, Smith et al. 2023

Preprint

Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a new de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a new convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a new peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.

The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain

Friedrich, Adel, Tomazic et al. 2020

Preprint

Explainable Abusive Language Classification Leveraging User and Network Data

Wich, Mosca, Gorniak et al. 2021

