Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Verma, Saurabh; Hruschka, Estevam R.

doi:10.1007/978-3-642-33486-3_20

Cited by 10 publications

(13 citation statements)

References 9 publications

“…CBS algorithm by Verma and Hruschka [11] also addresses the problem of extracting noun phrases to populate category instances. Likewise, it adapts a semi-supervised approach and it uses the co-occurrence statistics between nouns and contexts.…”

Section: Nell Cpl Cbs and Beyond (mentioning)

confidence: 99%

“…Those candidates are then ranked and the uppermost ones are promoted to be the trusted category instances, which will then be used as seeds in the following iterations. An in-depth discussion on Bayesian Sets and Coupled Bayesian Sets is available in [11].…”

Section: Nell Cpl Cbs and Beyond (mentioning)

confidence: 99%

“…One major challenge posed by semi-supervised learning is that using a small number of labeled examples along with many unlabeled ones are often unreliable as they frequently produce an internally consistent, but nevertheless, incorrect set of extractions [11]. Even though semi-supervised approaches are promising, they might exhibit low accuracy, owing to the fact that initial labeled examples are limited in number and tend to be insufficient to properly constrain the learning act.…”

Section: Introduction (mentioning)

confidence: 99%

“…Even though semi-supervised approaches are promising, they might exhibit low accuracy, owing to the fact that initial labeled examples are limited in number and tend to be insufficient to properly constrain the learning act. This phenomenon is called "semantic (concept) drift" [12] and it is addressed by Verma and Hruschka [11] through learning independent classifiers simultaneously, in a new approach named the Coupled Bayesian Sets (CBS) algorithm. CBS outperforms CPL after 10 iterations, nominating itself as a good alternative for the free text extractor of Nell.…”

Section: Introduction (mentioning)

confidence: 99%

Learning relational facts from the web: A tolerance rough set approach

Sengoz; Ramanna. Pattern Recognition Letters, 2015.

“…Systems for learning categories and relations of entities on the web, like the Never-Ending Language Learner (NELL) system (Carlson et al 2010a,b; Verma and Hruschka 2012), or KnowItAll (Etzioni et al 2005) can be used to construct lists but require extensive preprocessing. We do not preprocess, instead we perform information extraction online, deterministically, and virtually instantaneously given access to a search engine.…”

Section: Related Work (mentioning)

confidence: 99%

Growing a list

Letham; Rudin; Heller. Data Min Knowl Disc, 2013.

It is easy to find expert knowledge on the Internet on almost any topic, but obtaining a complete overview of a given topic is not always easy: information can be scattered across many sources and must be aggregated to be useful. We introduce a method for intelligently growing a list of relevant items, starting from a small seed of examples. Our algorithm takes advantage of the wisdom of the crowd, in the sense that there are many experts who post lists of things on the Internet. We use a collection of simple machine learning components to find these experts and aggregate their lists to produce a single complete and meaningful list. We use experiments with gold standards and open-ended experiments without gold standards to show that our method significantly outperforms the state of the art. Our method uses the ranking algorithm Bayesian Sets even when its underlying independence assumption is violated, and we provide a theoretical generalization bound to motivate its use.
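The abstract above names Bayesian Sets as its ranking algorithm. As a rough illustration only, the sketch below computes the standard Bayesian Sets log score for binary features under a Beta-Bernoulli model, as commonly described for that algorithm. It is not the code of either cited paper, and it does not implement the coupling used in CBS. The function name, the prior strength `c = 2.0`, the choice of priors from feature means, and the toy noun-phrase/context data are all assumptions made for the example.

```python
import math


def bayesian_sets_scores(items, seed_ids, c=2.0):
    """Return log[p(x | seeds) / p(x)] for every item (binary features).

    items: dict mapping item name -> list of 0/1 features (equal lengths).
    seed_ids: names of the seed (query) items.
    c: prior strength; priors are scaled feature means (an assumed choice).
    """
    names = list(items)
    n_feat = len(items[names[0]])
    n_seed = len(seed_ids)

    # Empirical feature means, clipped so the Beta priors stay positive.
    means = [
        min(max(sum(items[n][j] for n in names) / len(names), 1e-6), 1 - 1e-6)
        for j in range(n_feat)
    ]
    alpha = [c * m for m in means]
    beta = [c * (1 - m) for m in means]

    # Posterior hyperparameters after observing the seed items.
    seed_sum = [sum(items[s][j] for s in seed_ids) for j in range(n_feat)]
    alpha_t = [alpha[j] + seed_sum[j] for j in range(n_feat)]
    beta_t = [beta[j] + n_seed - seed_sum[j] for j in range(n_feat)]

    # The log score is linear in the features: const + sum_j q[j] * x[j].
    const = sum(
        math.log(alpha[j] + beta[j]) - math.log(alpha[j] + beta[j] + n_seed)
        + math.log(beta_t[j]) - math.log(beta[j])
        for j in range(n_feat)
    )
    q = [
        math.log(alpha_t[j]) - math.log(alpha[j])
        - math.log(beta_t[j]) + math.log(beta[j])
        for j in range(n_feat)
    ]
    return {
        n: const + sum(q[j] * items[n][j] for j in range(n_feat)) for n in names
    }


# Hypothetical toy data: rows are noun phrases, columns are contexts
# ("city of _", "_ is the capital", "eat a _").
toy_items = {
    "paris": [1, 1, 0],
    "london": [1, 1, 0],
    "berlin": [1, 0, 0],
    "banana": [0, 0, 1],
    "apple": [0, 0, 1],
}
scores = bayesian_sets_scores(toy_items, seed_ids=["paris", "london"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setup, a phrase that shares contexts with the seeds ("berlin") scores higher than one that shares none ("banana"). In a bootstrapping system of the kind described in the citation statements above, the top-ranked candidates would then be promoted to seeds for the next iteration.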
