Tolga Can scite author profile

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/

RRW: repeated random walks on genome-scale protein networks for local cluster discovery

Macropol

2009

Efficient molecular surface generation using level-set methods

Can, Chen, Wang

2006

Journal of Molecular Graphics and Modelling

Estrogen-induced upregulation and 3′-UTR shortening of CDC6

Akman, Can, Erson-Bensan

2012

3′-Untranslated region (UTR) shortening of mRNAs via alternative polyadenylation (APA) has important ramifications for gene expression. By using proximal APA sites and switching to shorter 3′-UTRs, proliferating cells avoid miRNA-mediated repression. Such APA and 3′-UTR shortening events may explain the basis of some of the proto-oncogene activation cases observed in cancer cells. In this study, we investigated whether 17β-estradiol (E2), a potent proliferation signal, induces APA and 3′-UTR shortening to activate proto-oncogenes in estrogen receptor positive (ER+) breast cancers. Our initial probe-based screen of independent expression arrays suggested upregulation and 3′-UTR shortening of an essential regulator of DNA replication, CDC6 (cell division cycle 6), upon E2 treatment. We further confirmed the E2- and ER-dependent upregulation and 3′-UTR shortening of CDC6, which led to increased CDC6 protein levels and higher BrdU incorporation. Consequently, miRNA binding predictions and dual luciferase assays suggested that 3′-UTR shortening of CDC6 was a mechanism to avoid 3′-UTR-dependent negative regulation. Hence, we demonstrated CDC6 APA induction by the proliferative effect of E2 in ER+ cells and provided new insights into the complex regulation of APA. E2-induced APA is likely to be an important but previously overlooked mechanism of E2-responsive gene expression.

3′UTR shortening and EGF signaling: implications for breast cancer

Akman, Oyken, Tuncer et al.

2015

Hum. Mol. Genet.

Alternative polyadenylation (APA) plays a role in gene expression regulation, generally by shortening 3'UTRs (untranslated regions) upon proliferative signals and relieving microRNA-mediated repression. Owing to the high proliferative indices of triple negative breast cancers (TNBCs), we hypothesized that APA causes 3'UTR length changes in this aggressive subgroup of breast cancers. Our probe-based meta-analysis approach identified 3'UTR length alterations, the significant majority of which were shortening events (∼70%, 113 of 165) of mostly proliferation-related transcripts, in 520 TNBC patients compared with controls. Representative shortening events were further investigated for their microRNA binding potentials by computational predictions and dual-luciferase assay. In silico-predicted 3'UTR shortening events were experimentally confirmed in patient and cell line samples. To begin addressing the underlying mechanisms, we found that CSTF2 (cleavage stimulation factor 2), a major regulator of 3'UTR shortening, was up-regulated in response to epidermal growth factor (EGF). EGF treatment also resulted in further shortening of the 3'UTRs. To investigate the contribution of CSTF2 and 3'UTR length alterations to the proliferative phenotype, we showed that pharmacological inhibition of the EGF pathway led to a reduction in CSTF2 levels. Accordingly, RNAi-induced silencing of CSTF2 decreased the proliferative rate of cancer cells. Therefore, our computational and experimental approach revealed a pattern of 3'UTR length changes in TNBC patients and a potential link between APA and EGF signaling. Overall, detection of 3'UTR length alterations of various genes may help the discovery of new cancer-related genes, which may have been overlooked in conventional microarray gene expression analyses.

Tolga Can

The CHEMDNER corpus of chemicals and drugs and its annotation principles