Most languages have no established writing system and minimal written records. However, textual data is essential for natural language processing, and particularly important for training language models to support speech recognition. Even in cases where text data is missing, there are some languages for which bilingual lexicons are available, since creating lexicons is a fundamental task of documentary linguistics. We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences. The method involves learning cross-lingual word embeddings as a preliminary step in training monolingual language models. Results across a number of languages show that language models are improved by this pre-training. Application to Yongning Na, a threatened language, highlights challenges in deploying the approach in real low-resource environments.
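To make the pre-training step concrete, the sketch below shows one plausible way to warm-start a monolingual language model with cross-lingual word embeddings: the embedding table of a small LSTM language model is initialised from externally learned vectors before training on the limited monolingual corpus. The module names, dimensions, and the use of random stand-in vectors are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: initialising a monolingual LSTM language model with
# pre-trained cross-lingual word embeddings (assumed setup, not the paper's code).
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, pretrained=None):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        if pretrained is not None:
            # Warm-start the embedding table with cross-lingual vectors
            # (in practice these would be learned with the help of a
            # bilingual lexicon), then fine-tune on the small corpus.
            self.embedding.weight.data.copy_(pretrained)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embedding(tokens))
        return self.out(hidden)  # next-token logits

# Stand-in for embeddings produced by a cross-lingual training step.
vocab_size, embed_dim = 5000, 128
crosslingual_vectors = torch.randn(vocab_size, embed_dim)

model = LSTMLanguageModel(vocab_size, embed_dim, hidden_dim=256,
                          pretrained=crosslingual_vectors)
logits = model(torch.randint(0, vocab_size, (2, 10)))  # (batch, seq, vocab)
```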
Termination of RNA polymerase II (Pol II) transcription is an important step in the transcription cycle, involving dislodgement of the polymerase from DNA and release of a functional transcript. Recent studies have identified the key players required for this process and showed that a common feature of these proteins is a conserved domain that interacts with the phosphorylated C-terminus of Pol II (CTD-interacting domain, CID). However, the mechanism by which transcription termination is achieved is not understood. Using genome-wide methods, here we show that the fission yeast CID protein Seb1 is essential for termination of protein-coding and non-coding genes through interaction with S2-phosphorylated Pol II and nascent RNA. Furthermore, we present the crystal structures of the Seb1 CTD- and RNA-binding modules. Unexpectedly, the latter reveals an intertwined two-domain arrangement of a canonical RRM and a second domain. These results provide important insights into the mechanism underlying eukaryotic transcription termination.
We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additional pretraining objectives in encouraging language-independent encoder representations: a context-independent phoneme objective paired with a language-adversarial classification objective.
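As a rough illustration of how these two auxiliary objectives could be combined during pretraining, the sketch below pairs a per-frame phoneme objective (realised here as CTC) with a language classifier trained through a gradient-reversal layer, a common way to implement an adversarial objective. The architecture, layer sizes, and equal loss weighting are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the two auxiliary pretraining objectives: a
# context-independent phoneme objective (here CTC) and a language-adversarial
# classifier trained via gradient reversal. All names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        # Reverse gradients so the encoder is pushed towards
        # language-independent representations.
        return -ctx.lam * grad, None

class MultiTaskEncoder(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, n_phones=100, n_langs=100):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.phone_head = nn.Linear(hidden, n_phones + 1)  # +1 for CTC blank
        self.lang_head = nn.Linear(hidden, n_langs)

    def forward(self, feats, lam=1.0):
        enc, _ = self.encoder(feats)                        # (B, T, H)
        phone_logits = self.phone_head(enc)                 # per-frame phoneme scores
        lang_logits = self.lang_head(GradReverse.apply(enc.mean(dim=1), lam))
        return phone_logits, lang_logits

B, T = 2, 50
feats = torch.randn(B, T, 80)                               # dummy acoustic features
phone_logits, lang_logits = MultiTaskEncoder()(feats)
log_probs = phone_logits.log_softmax(-1).transpose(0, 1)    # (T, B, C) for CTC
phone_loss = F.ctc_loss(log_probs,
                        torch.randint(1, 100, (B, 8)),      # dummy phoneme targets
                        torch.full((B,), T, dtype=torch.long),
                        torch.full((B,), 8, dtype=torch.long))
lang_loss = F.cross_entropy(lang_logits, torch.randint(0, 100, (B,)))
loss = phone_loss + lang_loss  # the ASR objective itself would be added here
```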
Highlights: a cryo-EM structure of M. tuberculosis MmpL3 determined at 3.0 Å resolution; an LMNG molecule within the periplasmic cavity suggests the TMM export pathway; comprehensive structural mapping of resistance-conferring MmpL3 variants; genome-mined MmpL3 mutations indicate minimal preexisting resistance.
Proliferating smartphones and mobile software offer linguists a scalable, networked recording device. This paper describes Aikuma, a mobile app designed to put the key language documentation tasks of recording, respeaking, and translating in the hands of a speech community. After motivating the approach, we describe the system and briefly report on its use in field tests.