2017
DOI: 10.1016/j.ipm.2017.04.007

DERIN: A data extraction method based on rendering information and n-gram

Cited by 16 publications
(4 citation statements)
References 11 publications
“…In the literature, there are many proposals to extract data from HTML documents in general, not specifically tables (Ferrara, de Meo, Fiumara, & Baumgartner, 2014; Sleiman & Corchuelo, 2013a). They rely on text alignment (Sleiman & Corchuelo, 2013b), neural networks (Sleiman & Corchuelo, 2014), learning first-order rules (Jiménez & Corchuelo, 2016a), inferring propositio-relational rules (Jiménez & Corchuelo, 2016b), learning decision trees (Uzun, Agun, & Yerlikaya, 2013), embedding graphs (Jiménez, Roldán, Gallego, & Corchuelo, 2020), or using n-grams and rendering information (Figueiredo, Assis, & Ferreira, 2017), to mention a few. Unfortunately, they do not seem to be appropriate to extract the underlying relationships between the cells in HTML tables (Cafarella et al., 2018), which motivated much work on table-understanding (Roldán et al., 2020; Zhang & Balog, 2020).…”
Section: Context and Motivation
Citation type: mentioning
confidence: 99%
“…Data mining principles can be independent of a particular domain for knowledge extraction [11] since their methods are able to learn how to extract the data, perform a given analysis domain independently and detect different record structures and their attributes based on rendering information [18]. It is increased the importance of understanding correlations between data, and data mining methods are interesting to find some patterns and association rules for various analyses and decision aids such as product category recommendations and determination of possible behavioral changes [31].…”
Section: Data Mining and Meteorology
Citation type: mentioning
confidence: 99%
“…n-Gram Models help determine the probability of a sequence of words in a sentence or in a text. Their application varies from identifying patterns in text [42] to data extraction [43], automatic speech recognition, machine translation, and spell checking [44,45]. Neural Network Language models offer an improved version [46], both having the potential to be integrated into computer-assisted tools for supporting text reviewers.…”
Section: Natural Language Processing Approaches
Citation type: mentioning
confidence: 99%
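
The last statement says that n-gram models help determine the probability of a sequence of words. As a rough illustration of that idea only, the sketch below estimates sentence probabilities with an unsmoothed bigram model. It is not the DERIN method or the method of any citing paper; the function names, the `<s>`/`</s>` boundary tokens, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left-hand contexts in whitespace-tokenized text."""
    contexts = Counter()  # how often each token appears as the first word of a bigram
    bigrams = Counter()   # how often each (previous, current) pair appears
    for sentence in sentences:
        # <s> and </s> are hypothetical sentence-boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams


def sentence_probability(sentence, contexts, bigrams):
    """Chain-rule probability under maximum-likelihood bigram estimates (no smoothing)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for previous, current in zip(tokens, tokens[1:]):
        if contexts[previous] == 0:
            return 0.0  # unseen context: the unsmoothed model assigns zero
        probability *= bigrams[(previous, current)] / contexts[previous]
    return probability


# Toy corpus, invented for illustration.
corpus = ["the cat sat", "the dog sat"]
contexts, bigrams = train_bigram(corpus)
seen = sentence_probability("the cat sat", contexts, bigrams)
unseen = sentence_probability("the sat", contexts, bigrams)
```

Because the model is unsmoothed, any sentence containing a bigram absent from the training text gets probability zero; practical systems usually add smoothing or back-off to avoid this.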