Named Entity Recognition and Typing (NER/NET) is a challenging task, especially with long-tail entities such as the ones found in scientific publications. These entities (e.g. "WebKB","StatSnowball") are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models, trained on expensive typelabeled data laboriously produced by human annotators. A common workaround is the generation of labeled training data from knowledge bases; this approach is not suitable for long-tail entity types that are, by definition, scarcely represented in KBs. This paper presents an iterative approach for training NER and NET classifiers in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result entity filtering. We evaluate our approach on scientific publications, focusing on the long-tail entities types Datasets, Methods in computer science publications, and Proteins in biomedical publications.1 https://scholar.google.de/scholar?q=publications+using++social+media+datasets +for+food+recipes+recommendation.
Learning analytics dashboards (LADs) are designed as feedback tools for learners, but until recently, learners rarely have had a say in how LADs are designed and what information they receive through LADs. To overcome this shortcoming, we have developed a customisable LAD for Coursera MOOCs on which learners can set goals and choose indicators to monitor. Following a mixedmethods approach, we analyse 401 learners' indicator selection behaviour in order to understand the decisions they make on the LAD and whether learner goals and self-regulated learning skills influence these decisions. We found that learners overwhelmingly chose indicators about completed activities. Goals are not associated with indicator selection behaviour, while help-seeking skills predict learners' choice of monitoring their engagement in discussions and time management skills predict learners' interest in procrastination indicators. The findings have implications for our understanding of learners' use of LADs and their design.
Active reading strategies-such as content annotations (through the use of highlighting and note-taking, for example)-have been shown to yield improvements to a learner's knowledge and understanding of the topic being explored. This has been especially notable in long and complex learning endeavours. With web search This research has been supported by DDS (Delft Data Science) and NWO projects SearchX (639.022.722) and Aspasia (015.013.027).
Social media provides a timely yet challenging data source for adverse drug reaction (ADR) detection. Existing dictionary-based, semisupervised learning approaches are intrinsically limited by the coverage and maintainability of laymen health vocabularies. In this paper, we introduce a data augmentation approach that leverages variational autoencoders to learn high-quality data distributions from a large unlabeled dataset, and subsequently, to automatically generate a large labeled training set from a small set of labeled samples. This allows for efficient social-media ADR detection with low training and retraining costs to adapt to the changes and emergence of informal medical laymen terms. An extensive evaluation performed on Twitter and Reddit data shows that our approach matches the performance of fully-supervised approaches while requiring only 25% of training data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.