Daniel Jurafsky scite author profile

Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACEstyle algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision. For each pair of entities that appears in some Freebase relation, we find all sentences containing those entities in a large unlabeled corpus and extract textual features to train a relation classifier. Our algorithm combines the advantages of supervised IE (combining 400,000 noisy pattern features in a probabilistic classifier) and unsupervised IE (extracting large numbers of relations from large corpora of any domain). Our model is able to extract 10,000 instances of 102 relations at a precision of 67.6%. We also analyze feature performance, showing that syntactic parse features are particularly helpful for relations that are ambiguous or lexically distant in their expression.

show abstract

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Stolcke

Coccaro

Bates

et al. 2000

Computational Linguistics

814

756

View full text Add to dashboard Cite

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, DIS-AGREEMENT, and APOLOGY. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.

show abstract

Automatic Labeling of Semantic Roles

Gildea

Jurafsky

2002

Computational Linguistics

1,124

692

View full text Add to dashboard Cite

We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Given an input sentence and a target word and frame, the system labels constituents with either abstract semantic roles, such as Agent or Patient, or more domain-specific semantic roles, such as Speaker, Message, and Topic. The system is based on statistical classifiers trained on roughly 50,000 sentences that were hand-annotated with semantic roles by the FrameNet semantic labeling project. We then parsed each training sentence into a syntactic tree and extracted various lexical and syntactic features, including the phrase type of each constituent, its grammatical function, and its position in the sentence. These features were combined with knowledge of the predicate verb, noun, or adjective, as well as information such as the prior probabilities of various combinations of semantic roles. We used various lexical clustering algorithms to generalize across possible fillers of roles. Test sentences were parsed, were annotated with these features, and were then passed through the classifiers. Our system achieves 82% accuracy in identifying the semantic role of presegmented constituents. At the more difficult task of simultaneously segmenting constituents and identifying their semantic role, the system achieved 65% precision and 61% recall. Our study also allowed us to compare the usefulness of different features and feature combination methods in the semantic role labeling task. We also explore the integration of role labeling with statistical syntactic parsing and attempt to generalize to predicates unseen in the training data.

show abstract

Predictability effects on durations of content and function words in conversational English

Bell

Brenier²,

Gregory

et al. 2009

Journal of Memory and Language

499

613

View full text Add to dashboard Cite

In a regression study of conversational speech, we show that frequency, contextual predictability and repetition have separate contributions to word duration, despite their substantial correlations. Moreover, content-and function-word durations are affected differently by their frequency and predictability. Content words are shorter when more frequent, and shorter when repeated, while function words are not so affected. Function words have shorter pronunciations, after controlling for frequency and predictability. While both content and function words are strongly affected by predictability from the word following them, sensitivity to predictability from the preceding word is largely limited to very frequent function words. The results support the view that content and function words are accessed differently in production. We suggest a lexical-access-based model of our results, in which frequency or repetition lead to shorter or longer word durations by causing faster or slower lexical access, mediated by a general mechanism that coordinates the pace of higher-level planning and the execution of the articulatory plan.

show abstract

Automatic labeling of semantic roles

Gildea¹,

Jurafsky²

2000

398

539

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Daniel Jurafsky

Distant supervision for relation extraction without labeled data

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Automatic Labeling of Semantic Roles

Predictability effects on durations of content and function words in conversational English

Automatic labeling of semantic roles

Contact Info

Product

Resources

About