Proceedings 26th Annual International Computer Software and Applications
DOI: 10.1109/cmpsac.2002.1045051

An approach to identify duplicated web pages

Cited by 73 publications (52 citation statements)
References 18 publications
“…Levenshtein is a common technique used in [25,26,30] to detect cloned web pages. In [26] a web page is considered to be a set of predefined features (e.g., sequence of tags or ASP features). The work in [27] applies similarity comparison using Levenshtein distance in content and scripting code levels.…”
Section: Related Work (mentioning)
confidence: 99%
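
The Levenshtein-based comparison described in the statement above can be illustrated with a minimal sketch. It assumes that the "predefined features" of a page are simply its opening tag names in document order, and that two pages count as clone candidates when the edit distance between their tag sequences does not exceed a threshold. The names `tag_sequence`, `levenshtein`, `are_clone_candidates`, and the `threshold` parameter are hypothetical and chosen for illustration; this is not the implementation used in the cited works.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects opening-tag names in document order (an assumed feature choice)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening-tag names found in an HTML string."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def levenshtein(a, b):
    """Dynamic-programming edit distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def are_clone_candidates(html_a, html_b, threshold=0):
    """Hypothetical rule: pages are candidates if their tag sequences are close."""
    return levenshtein(tag_sequence(html_a), tag_sequence(html_b)) <= threshold
```

With `threshold=0`, only pages whose tag sequences are identical are reported, regardless of their text content; a larger threshold would also admit near-duplicates.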
“…To this aim, Di Lucca et al [9] proposed an approach and a tool, named WARE, to recover WA's documentation represented by UML diagrams (see Section 3.1). In particular, Di Lucca et al [6] [7] applied the tool to abstract use case diagrams, sequence diagrams and business object models from WAs. The proposed approach relies on static information, which may not suffice for an effective and complete abstraction of UML diagrams, due to the dynamic nature of some WA components.…”
Section: Related Work (mentioning)
confidence: 99%
“…A client page, and thus a BCP, can be considered as composed by two main components [7]: a control component, i.e., the set of items - such as the HTML code and scripts - determining the page layout, business rule processing, and event management; and a data component, i.e., the set of items - such as text, images, multimedia objects - determining the information to be read/displayed from/to a user.…”
Section: Identifying Groups Of Equivalent Built Client Pages (mentioning)
confidence: 99%
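
The control/data split described in the statement above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the control component is approximated by tag names plus script bodies, and the data component by visible text plus `img` source values. The class `ComponentSplitter` and the function `split_components` are hypothetical names, not part of any cited tool.

```python
from html.parser import HTMLParser


class ComponentSplitter(HTMLParser):
    """Separates a page into assumed 'control' and 'data' item lists."""

    def __init__(self):
        super().__init__()
        self.control = []   # tag names and script bodies (assumption)
        self.data = []      # visible text and image sources (assumption)
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.control.append(tag)
        if tag == "script":
            self._in_script = True
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.data.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, text):
        text = text.strip()
        if not text:
            return
        # Script bodies count as control; everything else as data.
        (self.control if self._in_script else self.data).append(text)


def split_components(html):
    """Return (control_items, data_items) for an HTML string."""
    splitter = ComponentSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.control, splitter.data
```

Under this sketch, two pages with equal control lists but different data lists would differ only in the information they display, not in layout or behavior.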
“…There exist an abundance of clone detection algorithms ranging from lexical (token-based) [34,48,54], through AST-based [15,61,97] to metric-based [60,65,66,72] approaches. These methods act on one particular version of the software and then a detailed list of copied code segments is provided that may eventually contain several thousand items in the case of a real-size software package.…”
Section: Clone Detection (mentioning)
confidence: 99%