Proceedings of the 20th ACM International Conference on Information and Knowledge Management 2011
DOI: 10.1145/2063576.2063763

Enabling information extraction by inference of regular expressions from sample entities

Cited by 48 publications (53 citation statements)
References 13 publications

“…Each transducer consists of a set of pattern-action rules, the actions being new annotations over the matched text (see Figure 1). The rule-based methods described in Section 2.2 are in principle capable of generating some valid rules for this standard, but they usually employ only character features to generate regular expressions (Li et al 2008; Brauer et al 2011), or only token features, predefined (Soderland 1999; Thompson et al 1999) or not (Ciravegna and Wilks 2003; Nagesh and Chiticariu 2012). There are some approaches like (Wu and Pottenger 2005) that use both types of features, but they cannot be customized.…”
Section: Representation Of Patterns (mentioning)
confidence: 99%
“…We assess our proposal on several datasets representative of possible applications of our similarity learning method (the name of each dataset describes the nature of the data and the type of the entities to be extracted): HTML-href [14,13,11], Log-MAC+IP [14,13,11], Email-Phone [14,13,11,8,7], Bills-Date [14,12], Web-URL [14,13,11,7], Twitter-URL [14,13,11]. Each dataset consists of a text annotated with all and only the snippets that should be extracted.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
“…Devising a similarity function capable of capturing syntactic patterns is an important problem as it may enable significant improvements in methods for constructing syntax-based entity extractors from examples automatically [4][5][6][7][8][9][10][11][12][13][14]. We are not aware of any similarity definition capable of (approximately) separating strings which adhere to a common syntactic pattern (e.g., telephone numbers, or email addresses) from strings which do not.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
“…GATE 6 provides the JAPE language that recognizes regular expressions over annotations. Other systems focus on reducing manual effort for developing extractors (Brauer et al, 2011;Li et al, 2011). In contrast, our tool focuses on visualizing and comparing diagnostic information associated with pattern learning systems.…”
Section: Related Work (mentioning)
confidence: 99%