2007
DOI: 10.1007/s10115-007-0108-0

S2S: structural-to-syntactic matching similar documents

Cited by 13 publications (5 citation statements)
References 14 publications
“…, $\alpha_m$)). If the $j$-th token in the entire token space appears in an entity $e_i$, then $\alpha_j$ is the TF/IDF weight value of the $j$-th token. Otherwise, $\alpha_j = 0$.…”
Section: Graph Formation
Citation type: mentioning
confidence: 99%
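
The vector described in this statement can be illustrated with a minimal sketch. The quoted text does not say which TF/IDF variant the citing paper uses, so the weighting below (relative term frequency times `log(N / df)`) is an assumption, and the function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(entities):
    """Build one weight vector per entity over the whole token space.

    entities: list of token lists, one list per entity e_i.
    Returns (vocab, vectors), where vectors[i][j] is alpha_j for entity i.
    Assumed weighting: (count / entity length) * log(N / document frequency).
    """
    vocab = sorted({tok for ent in entities for tok in ent})
    n = len(entities)
    # Document frequency: number of entities that contain each token.
    df = Counter(tok for ent in entities for tok in set(ent))
    vectors = []
    for ent in entities:
        counts = Counter(ent)
        vec = [
            (counts[tok] / len(ent)) * math.log(n / df[tok])
            if tok in counts
            else 0.0  # alpha_j = 0 when token j does not appear in the entity
            for tok in vocab
        ]
        vectors.append(vec)
    return vocab, vectors


vocab, vectors = tfidf_vectors([["data", "mining", "data"], ["web", "mining"]])
```

With this weighting, a token that occurs in every entity (here `mining`) also receives weight 0, because its inverse document frequency is `log(1)`.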
“…Our problem is associated with the entity resolution (ER) problem, which has been known under various names: record linkage [12,15], citation matching [25,32], identity uncertainty [32], merge/purge [19], object matching [1,8,24,39], duplicate detection [41], group linkage [28,29], and so on. Recently, as one of the specialized ER problems, [23] introduced the mixed entity resolution problem in which instances of different entities…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Syntactic and semantic similarity metrics used by SASMINT are listed below: (a) Syntactic similarity: Among different metrics from NLP available for comparing two strings syntactically, we have selected for SASMINT a number of well-known ones that are suitable for different types of strings. Such metrics are also used in the Information Retrieval (IR) domain for determining the document similarities, as addressed in Aygün [6], Wan [63]. Combining the results of syntactic similarity metrics using a weighted summation makes SASMINT applicable for different domains, which typically consist of schema elements in varying forms.…”
Section: Linguistic Matching
Citation type: mentioning
confidence: 99%
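
The weighted summation mentioned in this statement can be sketched as follows. The quoted text does not name the metrics SASMINT combines, so the two metrics here (character-bigram Jaccard and `difflib.SequenceMatcher.ratio`) are stand-ins chosen for illustration, and the function names and weights are hypothetical.

```python
from difflib import SequenceMatcher


def bigram_jaccard(a, b):
    """Jaccard overlap of the character bigram sets of two strings."""
    set_a = {a[i:i + 2] for i in range(len(a) - 1)}
    set_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not set_a and not set_b:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def sequence_ratio(a, b):
    """Similarity in [0, 1] based on matching subsequences (stdlib difflib)."""
    return SequenceMatcher(None, a, b).ratio()


def combined_similarity(a, b, weighted_metrics):
    """Weighted sum of metric scores, normalised by the total weight."""
    total = sum(weight for _, weight in weighted_metrics)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weight * metric(a, b) for metric, weight in weighted_metrics) / total


# Illustrative weights only; a real system would tune them per domain.
metrics = [(bigram_jaccard, 0.5), (sequence_ratio, 0.5)]
score = combined_similarity("customer_name", "cust_name", metrics)
```

Normalising by the total weight keeps the combined score in the same [0, 1] range as the individual metrics, so changing the weights only shifts the balance between them.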
“…This additional link enables rapid transitions when there is a failure in the pattern matching process, by which the automaton can move to another trie branch that has a similar prefix without the need for backtracking. The Aho-Corasick algorithm has been applied to solve numerous problems such as signature-based anti-virus applications [2], set matching in bioinformatics [4], structural-to-syntactic matching for identical documents [6], searching of text strings in digital forensics [7] and text mining [8].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
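
The "additional link" in this statement is the failure link of the Aho-Corasick automaton. A compact textbook-style sketch follows; it is not taken from the citing paper, it assumes non-empty patterns, and the names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links and output lists for the patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        # Follow failure links instead of re-reading earlier characters.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


hits = find_all("ushers", ["he", "she", "his", "hers"])
```

The text is scanned once from left to right. On a mismatch the automaton follows failure links to the longest proper suffix of the current match that is also a prefix in the trie, which is why no backtracking over the input is needed.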