Michael Heilman scite author profile

Question Generation via Overgenerating Transformations and Ranking

Heilman¹,

Smith²

2009

116

104

We describe an extensible approach to generating questions for the purpose of reading comprehension assessment and practice. Our framework for question generation composes general-purpose rules to transform declarative sentences into questions, is modular in that existing NLP tools can be leveraged, and includes a statistical component for scoring questions based on features of the input, output, and transformations performed. In an evaluation in which humans rated questions according to several criteria, we found that our implementation achieves 43.3% precision-at-10 and generates approximately 6.8 acceptable questions per 250 words of source text.
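The precision-at-10 figure above can be illustrated with a minimal sketch. It assumes candidate questions have already been scored by some ranker and judged acceptable or not by human raters. The function name, the scores, and the judgments are hypothetical and are not taken from the paper.

```python
def precision_at_k(scored_judgments, k=10):
    """Fraction of the k highest-scored items that were judged acceptable.

    scored_judgments: list of (score, acceptable) pairs, in any order.
    If fewer than k items exist, the missing slots count as unacceptable.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank candidates from highest to lowest score, as a ranker would.
    ranked = sorted(scored_judgments, key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, acceptable in ranked[:k] if acceptable)
    return hits / k


# Hypothetical ranker scores and human judgments for 12 candidate questions.
candidates = [
    (0.95, True), (0.90, False), (0.85, True), (0.80, True),
    (0.75, False), (0.70, False), (0.65, True), (0.60, False),
    (0.55, False), (0.50, True), (0.45, True), (0.40, False),
]
print(precision_at_k(candidates, k=10))  # 5 of the top 10 are acceptable -> 0.5
```

Dividing by k rather than by the number of items returned is one common convention; other definitions exist when fewer than k candidates are available.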

Predicting Grammaticality on an Ordinal Scale

Heilman¹,

Cahill²,

Madnani³

et al. 2014

Automated methods for identifying whether sentences are grammatical have various potential applications (e.g., machine translation, automated essay scoring, computer-assisted language learning). In this work, we construct a statistical model of grammaticality using various linguistic features (e.g., misspelling counts, parser outputs, n-gram language model scores). We also present a new publicly available dataset of learner sentences judged for grammaticality on an ordinal scale. In evaluations, we compare our system to the one from Post (2011) and find that our approach yields state-of-the-art performance.

Validation of automated scoring of science assessments

et al. 2016

Constructed-response items can both measure the coherence of student ideas and serve as reflective experiences to strengthen instruction. We report on new automated scoring technologies that can reduce the cost and complexity of scoring constructed-response items. This study explored the accuracy of c-rater-ML, an automated scoring engine developed by Educational Testing Service, for scoring eight science inquiry items that require students to use evidence to explain complex phenomena. Automated scoring showed satisfactory agreement with human scoring for all test takers as well as specific subgroups. These findings suggest that c-rater-ML offers a promising solution to scoring constructed-response science items and has the potential to increase the use of these items in both instruction and assessment. © 2015 Wiley Periodicals, Inc. J Res Sci Teach 53: 215-233, 2016

An analysis of statistical models and features for reading difficulty prediction

Heilman

Collins-Thompson

Eskénazi

2008

A reading difficulty measure can be described as a function or model that maps a text to a numerical value corresponding to a difficulty or grade level. We describe a measure of readability that uses a combination of lexical features and grammatical features that are derived from subtrees of syntactic parses. We also tested statistical models for nominal, ordinal, and interval scales of measurement. The results indicate that a model for ordinal regression, such as the proportional odds model, using a combination of grammatical and lexical features is most effective at predicting reading difficulty.
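As a rough illustration of the proportional odds model mentioned above, the sketch below turns a linear score into a probability for each ordered grade level. It uses the standard cumulative-logit form, P(level <= j | x) = sigmoid(threshold_j - w·x). The feature values, weights, and thresholds are made up for illustration and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(features, weights, thresholds):
    """Per-level probabilities under a proportional odds (cumulative logit) model.

    thresholds must be strictly increasing; K - 1 thresholds give K levels.
    A single weight vector is shared across all levels, which is the
    "proportional odds" assumption.
    """
    score = sum(w * x for w, x in zip(weights, features))
    # Cumulative probabilities P(level <= j); the last level always reaches 1.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical text with two features (e.g. a lexical and a grammatical one).
example_probs = proportional_odds_probs(
    features=[0.8, 0.3],
    weights=[1.5, 2.0],          # illustrative weights, not fitted values
    thresholds=[0.5, 1.5, 2.5],  # three thresholds -> four ordered levels
)
predicted_level = max(range(len(example_probs)), key=example_probs.__getitem__) + 1
print(example_probs, predicted_level)
```

Because the levels share one weight vector and differ only in their thresholds, the model respects the ordering of grade levels, which a nominal classifier would ignore.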

Applying Argumentation Schemes for Essay Scoring

Yi¹,

Heilman²,

Klebanov³

et al. 2014

Under the framework of the argumentation scheme theory (Walton, 1996), we developed annotation protocols for an argumentative writing task to support identification and classification of the arguments being made in essays. Each annotation protocol defined argumentation schemes (i.e., reasoning patterns) in a given writing prompt and listed questions to help evaluate an argument based on these schemes, to make the argument structure in a text explicit and classifiable. We report findings based on an annotation of 600 essays. Most annotation categories were applied reliably by human annotators, and some categories significantly contributed to essay score. An NLP system to identify sentences containing scheme-relevant critical questions was developed based on the human annotations.

Different Texts, Same Metaphors: Unigrams and Beyond

Klebanov

Leong

Heilman

et al. 2014

Current approaches to supervised learning of metaphor tend to use sophisticated features and restrict their attention to constructions and contexts where these features apply. In this paper, we describe the development of a supervised learning system to classify all content words in a running text as either being used metaphorically or not. We start by examining the performance of a simple unigram baseline that achieves surprisingly good results for some of the datasets. We then show how the recall of the system can be improved over this strong baseline.
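To give a feel for what a unigram baseline can look like, here is a deliberately simplified sketch: it labels a word as metaphorical if most of its occurrences in the training data were annotated that way. This frequency lookup is an assumption made for illustration, not the classifier from the paper, and the training pairs are invented.

```python
from collections import defaultdict


def train_unigram_baseline(tagged_tokens):
    """Count, per lowercased word, how often it was annotated as metaphorical.

    tagged_tokens: iterable of (word, is_metaphor) pairs.
    Returns a dict mapping word -> [metaphorical uses, total uses].
    """
    counts = defaultdict(lambda: [0, 0])
    for word, is_metaphor in tagged_tokens:
        entry = counts[word.lower()]
        entry[0] += int(is_metaphor)
        entry[1] += 1
    return dict(counts)


def predict_metaphor(counts, word, threshold=0.5):
    """Predict metaphorical use from the word identity alone.

    Unseen words default to "not metaphorical".
    """
    metaphorical, total = counts.get(word.lower(), (0, 0))
    if total == 0:
        return False
    return metaphorical / total > threshold


# Invented training annotations for illustration only.
training = [
    ("attack", True), ("attack", True), ("attack", False),
    ("table", False), ("bright", True),
]
model = train_unigram_baseline(training)
print(predict_metaphor(model, "attack"))  # 2 of 3 uses were metaphorical
print(predict_metaphor(model, "table"))
```

A baseline of this kind ignores context entirely, which is why context-sensitive features are the natural place to look for recall improvements.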

Lasagne: First Release.

Dieleman¹,

Heilman²,

Kelly³

et al. 2015

111


Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
