Conditioned behavior of isolated and grouped cockroaches on a simple maze.

We present a comprehensive study of evaluation methods for unsupervised embedding techniques that obtain meaningful representations of words from text. Different evaluations result in different orderings of embedding methods, calling into question the common assumption that there is one single optimal vector representation. We present new evaluation techniques that directly compare embeddings with respect to specific queries. These methods reduce bias, provide greater insight, and allow us to solicit data-driven relevance judgments rapidly and accurately through crowdsourcing.
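
As a rough illustration of comparing embeddings on a specific query, the sketch below retrieves each embedding's nearest neighbor for one query word using cosine similarity. The toy vectors, the names `embedding_a` and `embedding_b`, and the helper functions are assumptions made for illustration; they are not taken from the paper, whose actual procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(embedding, query, k=1):
    """Return the k words closest to `query` under cosine similarity."""
    q = embedding[query]
    scored = [
        (cosine(q, vec), word)
        for word, vec in embedding.items()
        if word != query
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


def compare_on_query(embeddings, query, k=1):
    """Collect each embedding's top-k neighbors for the same query word."""
    return {name: nearest_neighbors(emb, query, k) for name, emb in embeddings.items()}


# Hypothetical 2-d toy embeddings over the same three-word vocabulary.
embedding_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
embedding_b = {"cat": [1.0, 0.0], "dog": [0.1, 0.9], "car": [0.8, 0.2]}

result = compare_on_query({"A": embedding_a, "B": embedding_b}, "cat")
print(result)
```

In a setup like this, the differing neighbor lists are the items one could show side by side to human judges, so that relevance is decided per query rather than by a single aggregate score.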

Deep Questions without Deep Understanding

Labutov

Basu

Vanderwende

2015

117

We develop an approach for generating deep (i.e., high-level) comprehension questions from novel text that bypasses the myriad challenges of creating a full semantic representation. We do this by decomposing the task into an ontology-crowd-relevance workflow, consisting of first representing the original text in a low-dimensional ontology, then crowdsourcing candidate question templates aligned with that space, and finally ranking potentially relevant templates for a novel region of text. If ontological labels are not available, we infer them from the text. We demonstrate the effectiveness of this method on a corpus of articles from Wikipedia alongside human judgments, and find that we can generate relevant deep questions with a precision of over 85% while maintaining a recall of 70%.
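
The reported precision and recall can be combined into a single figure with the standard F1 score (the harmonic mean of the two). The abstract does not report F1; the calculation below is only a worked illustration that takes 0.85 and 0.70 as inputs, and the function name `f1_score` is a local helper, not part of any library.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Inputs taken from the abstract's figures (precision is stated as "over 85%").
f1 = f1_score(0.85, 0.70)
print(round(f1, 3))  # 0.768
```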

Joint Concept Learning and Semantic Parsing from Natural Language Explanations

Srivastava

Labutov

Mitchell

2017

Natural language constitutes a predominant medium for much of human learning and pedagogy. We consider the problem of concept learning from natural language explanations, and a small number of labeled examples of the concept. For example, in learning the concept of a phishing email, one might say 'this is a phishing email because it asks for your bank account number'. Solving this problem involves both learning to interpret open-ended natural language statements, as well as learning the concept itself. We present a joint model for (1) language interpretation (semantic parsing) and (2) concept learning (classification) that does not require labeling statements with logical forms. Instead, the model prefers discriminative interpretations of statements in context of observable features of the data as a weak signal for parsing. On a dataset of email-related concepts, this approach yields across-the-board improvements in classification performance, with a 30% relative improvement in F1 score over competitive classification methods in the low data regime.

Zero-shot Learning of Classifiers from Natural Language Quantification

Srivastava

Labutov

Mitchell

2018

Humans can efficiently learn new concepts using language. We present a framework through which a set of explanations of a concept can be used to learn a classifier without access to any labeled examples. We use semantic parsing to map explanations to probabilistic assertions grounded in latent class labels and observed attributes of unlabeled data, and leverage the differential semantics of linguistic quantifiers (e.g., 'usually' vs 'always') to drive model training. Experiments on three domains show that the learned classifiers outperform previous approaches for learning with limited data, and are comparable with fully supervised classifiers trained from a small number of labeled examples.
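
To make the role of quantifiers concrete, the sketch below treats a parsed explanation such as "emails that mention a bank account are usually phishing" as a target probability and measures how far a classifier's predictions on unlabeled data fall from it. The numeric values in `QUANTIFIER_PROB`, the function `constraint_gap`, and the toy data are all assumptions for illustration; the abstract does not specify them, and the paper's actual training objective may differ.

```python
# Hypothetical mapping from quantifier words to target probabilities.
QUANTIFIER_PROB = {
    "always": 1.0,
    "usually": 0.8,
    "sometimes": 0.5,
    "rarely": 0.2,
    "never": 0.0,
}


def constraint_gap(quantifier, has_attribute, predicted_prob):
    """Target probability minus the mean predicted class probability
    over the unlabeled examples that exhibit the attribute."""
    selected = [p for a, p in zip(has_attribute, predicted_prob) if a]
    if not selected:
        return 0.0  # the assertion says nothing about this data
    observed = sum(selected) / len(selected)
    return QUANTIFIER_PROB[quantifier] - observed


# Toy unlabeled data: does each email mention a bank account, and what
# probability does the current classifier assign to "phishing"?
has_attribute = [True, True, False, True]
predicted_prob = [0.9, 0.7, 0.1, 0.5]

gap_usually = constraint_gap("usually", has_attribute, predicted_prob)
gap_always = constraint_gap("always", has_attribute, predicted_prob)
print(round(gap_usually, 3), round(gap_always, 3))
```

Under these assumed values, the same predictions sit closer to the "usually" target than to the "always" target, which is the kind of difference a training signal could exploit.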

APPINITE: A Multi-Modal Interface for Specifying Data Descriptions in Programming by Demonstration Using Natural Language Instructions

Labutov

Li

et al. 2018

Learning Student and Content Embeddings for Personalized Lesson Sequence Recommendation

Reddy

Labutov

Joachims

2016

Students in online courses generate large amounts of data that can be used to personalize the learning process and improve the quality of education. In this paper, we present the Latent Skill Embedding (LSE), a probabilistic model of students and educational content that can be used to recommend personalized sequences of lessons with the goal of helping students prepare for specific assessments. Akin to collaborative filtering for recommender systems, the algorithm does not require students or content to be described by features; instead, it learns a representation from access traces. We formulate this problem as a regularized maximum-likelihood embedding of students, lessons, and assessments from historical student-content interactions. An empirical evaluation on large-scale data from Knewton, an adaptive learning technology company, shows that this approach predicts assessment results competitively with benchmark models and is able to discriminate between lesson sequences that lead to mastery and failure.

Generating Code-switched Text for Lexical Learning

Labutov

Lipson

2014

A vast majority of L1 vocabulary acquisition occurs through incidental learning during reading (Nation, 2001; Schmitt et al., 2001). We propose a probabilistic approach to generating code-mixed text as an L2 technique for increasing retention in adult lexical learning through reading. Our model takes as input a bilingual dictionary and an English text, and generates a code-switched text that optimizes a defined "learnability" metric by constructing a factor graph over lexical mentions. Using an artificial language vocabulary, we evaluate a set of algorithms for generating code-switched text automatically by presenting it to Mechanical Turk subjects and measuring recall in a sentence completion task.

Automatic Concept Extraction for Domain and Student Modeling in Adaptive Textbooks

Chau

Labutov

Thaker

et al. 2020

Int J Artif Intell Educ


Igor Labutov

Evaluation methods for unsupervised word embeddings
