Proceedings of the 2013 ACM Symposium on Document Engineering
DOI: 10.1145/2494266.2494319

Recognising document components in XML-based academic articles

Abstract: Recognising textual structures (paragraphs, sections, etc.) provides abstract and more general mechanisms for describing documents independent of the particular semantics of specific markup schemas, tools and presentation stylesheets. In this paper we propose an algorithm that allows us to identify the structural role of each element in a set of homogeneous scientific articles stored as XML files.
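The abstract does not spell out the algorithm. As a rough, hypothetical sketch, assume each element name is classified from three yes/no properties gathered across the whole collection: whether it contains text, whether it contains child elements, and whether it appears inside an element that itself contains text. The eight labels (inline, block, atom, field, popup, container, milestone, meta) follow the structural-pattern vocabulary used in related work by Di Iorio et al. The exact rules, the function names and the "one instance is enough" aggregation are assumptions made for illustration, not the paper's method.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed mapping: (contains text, contains elements, in text-bearing parent) -> pattern.
PATTERNS = {
    (True, True, True): "inline",
    (True, True, False): "block",
    (True, False, True): "atom",
    (True, False, False): "field",
    (False, True, True): "popup",
    (False, True, False): "container",
    (False, False, True): "milestone",
    (False, False, False): "meta",
}

def _nonblank(s):
    return bool(s and s.strip())

def _contains_text(elem):
    # Text directly inside elem: its leading text or the tail after any child.
    return _nonblank(elem.text) or any(_nonblank(child.tail) for child in elem)

def collect_features(roots):
    """Per element name, OR together the three properties over every instance."""
    feats = defaultdict(lambda: [False, False, False])

    def visit(elem, parent_has_text):
        here = _contains_text(elem)
        f = feats[elem.tag]
        f[0] |= here
        f[1] |= len(elem) > 0
        f[2] |= parent_has_text
        for child in elem:
            visit(child, here)

    for root in roots:
        visit(root, False)  # a root element has no text-bearing parent
    return feats

def assign_patterns(xml_strings):
    """Hypothetical helper: map each element name in a homogeneous collection to a pattern."""
    roots = [ET.fromstring(s) for s in xml_strings]
    return {tag: PATTERNS[tuple(f)] for tag, f in collect_features(roots).items()}
```

For example, on `<article><title>On XML</title><sec><p>See <em>this</em> text.<br/></p></sec></article>` this sketch labels `article` and `sec` as containers, `title` as a field, `p` as a block, `em` as an atom and `br` as a milestone. Because a single text-bearing instance marks a name as text-bearing, one irregular instance can change a label; a fuller approach would also check that all instances agree.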

Cited by 13 publications (16 citation statements); references 9 publications.

“…Also, thanks to the regularity they provide, it is possible to perform easily complex operations on pattern-based documents even when knowing very little about their vocabulary (automatic visualisation of document, inferences on the document structure, etc.). In this way, designers can implement more reliable and efficient tools, can make a hypothesis regarding the meanings of the document fragments, can identify singularities and can study the global properties of a set of documents, as described in Di Iorio et al (2012) and Di Iorio et al (2013).…”
Section: Theoretical Foundations: Structural Patterns (mentioning)
“…The creation of DoCO was undertaken by studying different corpora of documents (mainly scientific literature and web documents on different topics) and publishers' guidelines, from two perspectives – the structural and the rhetorical – as was also done by past works on document patterns [13][14][15]. We also undertook some informal interviews with researchers in different fields and with academic publishers, in order to gather as much information as possible about document components and their use.…”
Section: Document Components (mentioning)
“…The regularity of pattern-based documents (defined by means of markup languages such as DocBook or LaTeX) then makes it possible to perform complex operations easily, even when knowing very little about the documents' markup vocabulary. This in turn enables designers to implement more reliable and efficient tools [14], make hypotheses regarding the meanings of document fragments [15], identify special cases, and study global properties of sets of documents [13].…”
Section: Structural Foundation: Structural Patterns (mentioning)
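The quoted point about identifying special cases can be illustrated, hypothetically, by flagging element names whose instances do not all share the same shape. This sketch assumes the same three properties (contains text, contains child elements, appears inside a text-bearing parent); the helper names are invented for illustration, and this is not the method of any of the cited works.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def _nonblank(s):
    return bool(s and s.strip())

def _contains_text(elem):
    # Text directly inside elem: its leading text or the tail after any child.
    return _nonblank(elem.text) or any(_nonblank(c.tail) for c in elem)

def instance_shapes(root):
    """Map each element name to the set of (has_text, has_elements, in_text_parent) shapes seen."""
    shapes = defaultdict(set)

    def visit(elem, parent_has_text):
        here = _contains_text(elem)
        shapes[elem.tag].add((here, len(elem) > 0, parent_has_text))
        for child in elem:
            visit(child, here)

    visit(root, False)
    return shapes

def inconsistent_elements(xml_strings):
    """Hypothetical check: names used with more than one shape across the collection."""
    merged = defaultdict(set)
    for s in xml_strings:
        for tag, seen in instance_shapes(ET.fromstring(s)).items():
            merged[tag] |= seen
    return sorted(tag for tag, seen in merged.items() if len(seen) > 1)
```

For instance, in `<doc><p>Text <b>bold</b></p><p><b>x</b></p></doc>` both `p` and `b` appear once in a text-bearing form and once without surrounding text, so both would be flagged as special cases.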