Manuel Valle Torre scite author profile

Named Entity Recognition and Typing (NER/NET) is a challenging task, especially with long-tail entities such as the ones found in scientific publications. These entities (e.g. "WebKB", "StatSnowball") are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models, trained on expensive type-labeled data laboriously produced by human annotators. A common workaround is the generation of labeled training data from knowledge bases (KBs); this approach is not suitable for long-tail entity types that are, by definition, scarcely represented in KBs. This paper presents an iterative approach for training NER and NET classifiers in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result entity filtering. We evaluate our approach on scientific publications, focusing on the long-tail entity types Datasets and Methods in computer science publications, and Proteins in biomedical publications.

Quantum of Choice: How learners’ feedback monitoring decisions, goals and self-regulated learning skills are related

Jivet

Wong

Scheffel

et al. 2021

Learning analytics dashboards (LADs) are designed as feedback tools for learners, but until recently, learners have rarely had a say in how LADs are designed and what information they receive through LADs. To overcome this shortcoming, we have developed a customisable LAD for Coursera MOOCs on which learners can set goals and choose indicators to monitor. Following a mixed-methods approach, we analyse 401 learners' indicator selection behaviour in order to understand the decisions they make on the LAD and whether learner goals and self-regulated learning skills influence these decisions. We found that learners overwhelmingly chose indicators about completed activities. Goals are not associated with indicator selection behaviour, while help-seeking skills predict learners' choice of monitoring their engagement in discussions and time management skills predict learners' interest in procrastination indicators. The findings have implications for our understanding of learners' use of LADs and their design.

Note the Highlight

Roy

Torre

Gadiraju

et al. 2021

Active reading strategies, such as content annotations (through the use of highlighting and note-taking, for example), have been shown to yield improvements to a learner's knowledge and understanding of the topic being explored. This has been especially notable in long and complex learning endeavours. With web search [...]

This research has been supported by DDS (Delft Data Science) and NWO projects SearchX (639.022.722) and Aspasia (015.013.027).

Training Data Augmentation for Detecting Adverse Drug Reactions in User-Generated Content

Mesbah

Yang

Sips

et al. 2019

Social media provides a timely yet challenging data source for adverse drug reaction (ADR) detection. Existing dictionary-based, semi-supervised learning approaches are intrinsically limited by the coverage and maintainability of laymen health vocabularies. In this paper, we introduce a data augmentation approach that leverages variational autoencoders to learn high-quality data distributions from a large unlabeled dataset and, subsequently, to automatically generate a large labeled training set from a small set of labeled samples. This allows for efficient social-media ADR detection with low training and retraining costs, adapting to changes in, and the emergence of, informal medical laymen terms. An extensive evaluation performed on Twitter and Reddit data shows that our approach matches the performance of fully supervised approaches while requiring only 25% of the training data.

edX log data analysis made easy

Torre

Tan

Hauff

2020
