First-order learning for Web mining

Craven, Mark; Slattery, Seán; Nigam, Kamal

doi:10.1007/bfb0026695

Cited by 22 publications (3 citation statements)

References 2 publications (4 reference statements)

“…Existing ILP systems include FOIL (Weber, 1996), which is a system for learning intensional concept definitions from relational tuples. It has recently been applied to web mining (Craven et al, 1998). GOLEM (Weber, 1996) is a 'classic' among empirical ILP systems.…”

Section: Inductive Logic Theory (mentioning)

Parallel and Sequential Algorithms for Data Mining Using Inductive Logic

Skillicorn, Wang (2001), Knowledge and Information Systems

Inductive logic is a research area at the intersection of machine learning and logic programming, and it has been increasingly applied to data mining. Inductive logic studies learning from examples within the framework provided by clausal logic. It provides a uniform and expressive means of representation: examples, background knowledge, and induced theories are all expressed in first-order logic. Such an expressive representation is computationally expensive, so it is natural to consider improving the performance of inductive logic data mining using parallelism. We present a parallelization technique for inductive logic and implement a parallel version of a core inductive logic programming system, Progol. The technique partitions computation and data access perfectly, and its communication requirements are small, so almost linear speedup is readily achieved. We also show why the information flow of the technique permits superlinear speedup over the standard sequential algorithm. Performance results on several datasets and platforms are reported. The results have wider implications for the design of parallel and sequential data-mining algorithms.
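The partitioning idea in this abstract can be illustrated with a minimal sketch of data-parallel coverage counting. It assumes, hypothetically, that a candidate clause is represented as a plain Python predicate over a feature dictionary and that each worker holds a disjoint partition of the labeled examples. Each worker returns only two integers, which is why communication stays small. This is an illustration of the general technique, not the authors' parallel Progol implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# An example is (features, is_positive); a clause is a hypothetical predicate.
Example = Tuple[dict, bool]
Clause = Callable[[dict], bool]


def local_coverage(clause: Clause, partition: List[Example]) -> Tuple[int, int]:
    """Count the positive and negative examples in one partition that the clause covers."""
    pos = sum(1 for x, label in partition if label and clause(x))
    neg = sum(1 for x, label in partition if not label and clause(x))
    return pos, neg


def parallel_coverage(clause: Clause, partitions: List[List[Example]]) -> Tuple[int, int]:
    """Evaluate the clause on every partition concurrently and sum the counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = list(pool.map(lambda part: local_coverage(clause, part), partitions))
    # Only the per-partition (pos, neg) pairs are combined; no examples are moved.
    return sum(p for p, _ in results), sum(n for _, n in results)


if __name__ == "__main__":
    # Toy data: ten examples, positive when the hypothetical "links" count is even.
    examples = [({"links": k}, k % 2 == 0) for k in range(10)]
    partitions = [examples[0:4], examples[4:7], examples[7:10]]
    clause = lambda x: x["links"] >= 4  # hypothetical clause body
    print(parallel_coverage(clause, partitions))
```

Because addition is associative, the totals do not depend on how the examples are split, so a process pool or a message-passing layer could replace the thread pool without changing the result.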

“…The graph structure of the Web makes it an interesting domain for relational learning [2]. Moreover, Craven, Slattery, and Nigam demonstrated that for several Web-based learning tasks, a relational learning algorithm can learn more accurate classifiers than a common statistical approach [3]. Therefore, many researchers have applied relational learning algorithms to web page classification.…”

Section: Introduction (mentioning)

Web Page Classification Using Relational Learning Algorithm and Unlabeled Data

Li, Guo (2011), JCP

This paper investigates applying relational tri-training (R-tri-training for short) to web page classification. As a new relational semi-supervised learning algorithm, R-tri-training is well suited to web page classification. Its semi-supervised component allows it to exploit unlabeled web pages to improve learning performance effectively, and its relational component describes how neighboring web pages are related to each other by hyperlinks. Experiments on the WebKB dataset show that: 1) R-tri-training can use a large amount of unlabeled web pages to improve the performance of the learned hypothesis; 2) R-tri-training performs better than the other algorithms it was compared with.
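Representing each web page as a node and each hyperlink as an edge lets a learner use features of neighboring pages, not only the words on the page itself. The sketch below encodes a tiny site as `has_word` and `link_to` relations and evaluates one hand-written first-order rule over them. The page names, words, and the rule are all hypothetical, invented for illustration; they are not a rule reported by Craven et al. or by Li and Guo.

```python
from typing import Dict, Set, Tuple

# Hypothetical site: the words on each page (node labels) and the hyperlinks (directed edges).
has_word: Dict[str, Set[str]] = {
    "home": {"department", "courses"},
    "cs101": {"syllabus", "homework"},
    "alice": {"research", "publications"},
}
link_to: Set[Tuple[str, str]] = {("home", "cs101"), ("home", "alice")}


def linked_from(page: str) -> Set[str]:
    """Return every page B such that link_to(B, page) holds."""
    return {src for src, dst in link_to if dst == page}


def is_course_page(page: str) -> bool:
    """Evaluate the hypothetical rule
    course_page(A) :- has_word(A, homework), link_to(B, A), has_word(B, courses).
    """
    return "homework" in has_word.get(page, set()) and any(
        "courses" in has_word.get(b, set()) for b in linked_from(page)
    )


if __name__ == "__main__":
    for page in sorted(has_word):
        print(page, is_course_page(page))
```

A bag-of-words classifier sees only the words on `cs101`; the rule also consults a neighboring page through `link_to`, which is the kind of relational feature the quoted passages associate with the accuracy gains.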

“…It took a user-defined feature set together with a set of hand-tagged training documents and learned rules for extraction. Craven et al. [4] reported that greater accuracy could be achieved by representing each web page as a node in a graph and each hyperlink as an edge. Cardie [5] provided a list of learning-based IE problems, including the difficulty of obtaining enough training data and the lack of corpora annotated with the appropriate semantic and domain-specific supervisory information.…”

Section: Introduction (mentioning)

Web Structure Analysis for Information Mining

Vijjappu, Tan, Tan (2003), Series in Machine Perception and Artificial Intelligence

Our approach to extracting information from the web analyzes the structural content of web pages by exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created, consisting of the salient fields to be extracted and the corresponding extraction rules, which are based on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks, which we illustrate with two sample domains.
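A single-slot extraction rule keyed on HTML tags can be sketched with Python's standard `html.parser` module. The target tag, the `class` value, and the sample markup are hypothetical, and the matching is deliberately simple (exact `class` match only); the authors' object models and their library of HTML parsing functions are not reproduced here.

```python
from html.parser import HTMLParser
from typing import List, Optional


class SlotExtractor(HTMLParser):
    """Collect the text inside every <tag class="cls"> element (one slot)."""

    def __init__(self, tag: str, cls: Optional[str] = None) -> None:
        super().__init__()
        self.tag, self.cls = tag, cls
        self.depth = 0  # > 0 while inside a matching element
        self.values: List[str] = []
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements with the same tag so the right end tag closes the slot.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and (self.cls is None or dict(attrs).get("class") == self.cls):
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.values.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def extract_slot(html: str, tag: str, cls: Optional[str] = None) -> List[str]:
    """Apply one tag-based extraction rule to a page and return every filled slot."""
    parser = SlotExtractor(tag, cls)
    parser.feed(html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    page = ('<html><body><h1>Acme Widget</h1>'
            '<span class="price">$19.99</span><span>other</span></body></html>')
    print(extract_slot(page, "h1"), extract_slot(page, "span", "price"))
```

A multiple-slot task could apply one such rule per field and pair up the results, which assumes the fields appear in the same order and number on the page.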
