2005
DOI: 10.1162/089120105774321109

Evaluating Discourse and Dialogue Coding Schemes

Abstract: Agreement statistics play an important role in the evaluation of coding schemes for discourse and dialogue. Unfortunately there is a lack of understanding regarding appropriate agreement measures and how their results should be interpreted. In this article we describe the role of agreement measures and argue that only chance-corrected measures that assume a common distribution of labels for all coders are suitable for measuring agreement in reliability studies. We then provide recommendations for how reliability…
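To make the abstract's distinction concrete: a coefficient that assumes a common label distribution pools both coders' judgements when estimating chance agreement (in the style of Scott's pi; the article's own recommended coefficient is not reproduced here). The sketch below is illustrative only, with invented dialogue-act labels.

```python
from collections import Counter

def pooled_chance_corrected_agreement(coder_a, coder_b):
    """Chance-corrected agreement using a single, pooled label distribution
    for both coders (Scott's pi style), rather than per-coder marginals."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Pool both coders' labels into one common distribution.
    pooled = Counter(coder_a) + Counter(coder_b)
    total = sum(pooled.values())
    expected = sum((count / total) ** 2 for count in pooled.values())
    return (observed - expected) / (1 - expected)

# Invented annotations of the same 10 items by two coders.
a = ["stat", "stat", "dir", "stat", "dir", "stat", "stat", "dir", "stat", "stat"]
b = ["stat", "dir", "dir", "stat", "dir", "stat", "stat", "stat", "stat", "stat"]
print(round(pooled_chance_corrected_agreement(a, b), 3))
```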

Cited by 51 publications (47 citation statements); references 5 publications.
“…The correction for chance agreement in Cohen's kappa has been the subject of much controversy (Brennan and Prediger, 1981;Feinstein and Cicchetti, 1990;Uebersax, 1987;Byrt et al, 1993;Gwet, 2002;Di Eugenio and Glass, 2004;Sim and Wright, 2005;Craggs and Wood, 2005;Powers, 2012). Firstly, it assumes that when assessors are unsure of a score, they guess at random according to a fixed prior distribution of scores.…”
Section: Discussion
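By contrast with the pooled-distribution version sketched after the abstract, Cohen's kappa estimates chance agreement from each coder's own marginal distribution, which is the "guess at random according to a fixed prior" assumption this excerpt describes. A minimal sketch, not taken from any of the cited papers:

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: expected agreement is computed from each coder's own
    marginal label distribution, i.e. the coders 'guess' from separate priors."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    marg_a, marg_b = Counter(coder_a), Counter(coder_b)
    labels = set(marg_a) | set(marg_b)
    expected = sum((marg_a[k] / n) * (marg_b[k] / n) for k in labels)
    return (observed - expected) / (1 - expected)
```

On the same annotations, this version and the pooled version diverge exactly when the two coders' label distributions differ, which is the behaviour the article argues makes such per-coder corrections unsuitable for reliability studies.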
“…In corpus research there is much work with annotations that need subjective judgements of a more subjective nature from an annotator about the behavior being annotated. This holds for Human Computer Interaction topics such as affective computing or the development of Embodied Conversational Agents with a personality, but also for work in computational linguistics on topics such as emotion (Craggs and McGee Wood, 2005), subjectivity (Wiebe et al, 1999;Wilson, 2008) and agreement and disagreement (Galley et al, 2004). If we want to interpret the results of classifiers in terms of the patterns of (dis)agreement found between annotators, we need to subject the classifiers with respect to each other and to the 'ground truth data' to the same analyses used to evaluate and compare annotators to each other.…”
Section: Related Work
“…More data will yield more signal and the learner will ignore the noise. However, as Craggs and McGee Wood (2005) suggest, this also makes systematic disagreement dangerous, because it provides an unwanted pattern for the learner to detect. We demonstrate that machine learning can tolerate data with a low reliability measurement as long as the disagreement looks like random noise, and that when it does not, data can have a reliability measure commonly held to be acceptable but produce misleading results.…”
Section: Introduction
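One way to picture this point (a rough, invented simulation, not the authors' experiment): two coders can disagree with a gold standard on a similar share of items, yet random disagreement scatters across the confusion matrix while systematic disagreement piles up in a single cell, handing a learner a consistent but spurious pattern.

```python
import random
from collections import Counter

random.seed(0)
LABELS = ["statement", "question", "backchannel"]
gold = [random.choice(LABELS) for _ in range(3000)]

# Coder 1: random disagreement -- about a third of items get some other label.
random_coder = [
    random.choice([l for l in LABELS if l != g]) if random.random() < 1 / 3 else g
    for g in gold
]

# Coder 2: systematic disagreement -- every "backchannel" is called a "statement",
# which also affects roughly a third of the (uniformly sampled) items.
systematic_coder = ["statement" if g == "backchannel" else g for g in gold]

def observed_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

print("random coder agreement:    ", round(observed_agreement(gold, random_coder), 3))
print("systematic coder agreement:", round(observed_agreement(gold, systematic_coder), 3))

# The disagreement rates are comparable, but their structure is not: random errors
# spread over all off-diagonal cells, while systematic errors occupy a single
# (backchannel -> statement) cell -- a pattern a learner can pick up.
print(Counter((g, r) for g, r in zip(gold, random_coder) if g != r))
print(Counter((g, s) for g, s in zip(gold, systematic_coder) if g != s))
```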