Office applications such as OpenOffice and Microsoft Office are widely used to edit the majority of today's business documents: office documents. Usually, version control systems treat office documents as binary objects, which severely hinders collaborative work. Since XML has become a de facto standard for office applications, we focus on versioning office documents with structured XML version control approaches. This enables state-of-the-art version control for office documents. A basic prerequisite for XML version control is a diff algorithm that detects structural changes between XML documents. In this paper, we evaluate state-of-the-art XML diff algorithms with respect to their suitability for OpenOffice XML documents and the future OASIS office document standard. It turns out that, due to the specific XML office format, a careful examination of the diff algorithm characteristics is necessary. Therefore, we identify important features that XML diff approaches need in order to handle office documents. We have implemented a first OpenOffice versioning API that can be used in version control systems as a replacement for the line-based or binary diffs currently in use.
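To illustrate what detecting structural changes means in contrast to a line-based or binary diff, the following is a minimal sketch that compares two XML trees position by position. It is not one of the algorithms evaluated in the paper; the function name `tree_diff` and the change tuples are assumptions made for this example, and real XML diff algorithms additionally detect moved or reordered subtrees.

```python
import xml.etree.ElementTree as ET

def tree_diff(old, new, path=""):
    """Naively compare two elements position by position and list changes.

    Hypothetical helper for illustration only: children are paired by
    index, so an insertion in the middle is reported as a run of updates.
    """
    here = f"{path}/{old.tag}"
    if old.tag != new.tag:
        return [("replace", here, new.tag)]
    changes = []
    if (old.text or "").strip() != (new.text or "").strip():
        changes.append(("update-text", here, (new.text or "").strip()))
    if old.attrib != new.attrib:
        changes.append(("update-attrib", here, dict(new.attrib)))
    old_kids, new_kids = list(old), list(new)
    # Recurse into children that exist at the same index in both versions.
    for i, (o, n) in enumerate(zip(old_kids, new_kids)):
        changes.extend(tree_diff(o, n, f"{here}[{i}]"))
    # Children present only in the old version were deleted.
    for i in range(len(new_kids), len(old_kids)):
        changes.append(("delete", f"{here}[{i}]/{old_kids[i].tag}", None))
    # Children present only in the new version were inserted.
    for i in range(len(old_kids), len(new_kids)):
        changes.append(("insert", f"{here}[{i}]/{new_kids[i].tag}", None))
    return changes

old_doc = ET.fromstring("<doc><p>Hello</p><p>World</p></doc>")
new_doc = ET.fromstring("<doc><p>Hello</p><p>Word</p><p>New</p></doc>")
for change in tree_diff(old_doc, new_doc):
    print(change)
```

Because the comparison works on elements rather than bytes or lines, the reported changes refer to positions in the document structure, which is the kind of information a version control system needs to show or merge edits to an office document.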
XML-based documents play a major role in modern information architectures and their corresponding workflows. In this context, the ability to identify and represent differences between two versions of a document is essential, as is the merging of document versions resulting from parallel editing processes. Different approaches try to meet these challenges using operational transformation or document annotation. In both approaches, the changes are tracked during editing, which requires corresponding editing applications. In the context of software development, however, a state-based approach is common. Here, versions are compared and merged using external tools, called diff and patch. This allows users to edit documents without being tied to particular editing tools. Approaches exist that are able to compare XML documents, but they lack a corresponding merge capability. In this article, we present a comprehensive framework that allows XML documents to be compared and merged using a state-based approach. Its design is based on an analysis of XML documents and their modification patterns. The framework is built on top of a context-oriented delta model. We present a diff algorithm that appears to be highly efficient in terms of speed and delta quality. The patch algorithm is able to merge document versions efficiently and reliably. The efficiency and the reliability of our approach are verified using a competitive test scenario.
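The state-based idea can be sketched in a few lines: the delta is derived purely from two document states, and a separate patch step replays it, so no editing application has to track changes. The sketch below assumes a document is flattened into a list of node strings and uses Python's `difflib.SequenceMatcher`; the names `diff` and `patch` and the delta layout are assumptions for this example and do not reproduce the context-oriented delta model of the article.

```python
from difflib import SequenceMatcher

def diff(old, new):
    """Derive a delta from two states alone; no edit tracking is needed."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    # Keep only the changed ranges, together with their replacement items.
    return [(tag, i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def patch(old, delta):
    """Rebuild the new state by applying the delta back to front.

    Going from the end keeps the indices of earlier operations valid.
    """
    result = list(old)
    for _tag, i1, i2, items in reversed(delta):
        result[i1:i2] = items
    return result

# Hypothetical flattened documents: one string per node.
v1 = ["h1:Title", "p:Intro", "p:Body", "p:End"]
v2 = ["h1:Title", "p:Intro (revised)", "p:Body", "p:Appendix", "p:End"]

delta = diff(v1, v2)
print(delta)
print(patch(v1, delta) == v2)
```

Note that this delta addresses nodes by absolute index, so it can only be applied safely to the exact version it was computed from. Merging versions from parallel editing needs a delta that can still locate its targets after the document has changed.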
Different dialects of XML have emerged as ubiquitous document exchange formats. For effective collaboration based on such documents, the capability to propagate edit operations performed on a document is indispensable. To avoid transmitting whole documents, deltas are used to describe these edit operations, allowing a new version of a document to be constructed. However, patching a document with a delta that was not generated for it is error-prone, because any insert or delete operations performed on the document are likely to affect all subsequent paths within that document. In this paper, we present a delta format for XML documents that uses context-aware fingerprints to identify edit operations. This allows our XML patch procedure to find the correct position of an edit operation, even if the document was updated in the meantime. Possible conflicts are detected. Experimental results show the reliability of the presented fingerprinting technique and the high quality of the resulting patched documents.
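The following minimal sketch shows why a context fingerprint helps when paths have shifted. It assumes a document flattened into a list of node strings and an operation that stores short hashes of the target node and its neighbours; the names `fingerprint`, `make_update`, and `apply_update`, the window radius, and the matching threshold are all assumptions for illustration, not the delta format described in the paper.

```python
import hashlib

def fingerprint(nodes, i, radius=2):
    """Hash the anchor node plus up to `radius` neighbours on each side."""
    window = [nodes[j] if 0 <= j < len(nodes) else ""
              for j in range(i - radius, i + radius + 1)]
    return [hashlib.sha1(n.encode()).hexdigest()[:8] for n in window]

def make_update(nodes, i, new_value, radius=2):
    """Record an update together with the context it was created in."""
    return {"fingerprint": fingerprint(nodes, i, radius),
            "new": new_value, "radius": radius}

def apply_update(nodes, op):
    """Apply the update where the surrounding context matches best."""
    r = op["radius"]
    scores = [sum(a == b for a, b in zip(fingerprint(nodes, i, r), op["fingerprint"]))
              for i in range(len(nodes))]
    best = max(scores, default=0)
    candidates = [i for i, s in enumerate(scores) if s == best]
    # Report a conflict if the context match is weak or ambiguous.
    if best <= r or len(candidates) != 1:
        raise ValueError("conflict: no unique position matches the context")
    result = list(nodes)
    result[candidates[0]] = op["new"]
    return result

original = ["h1:Title", "p:Intro", "p:Body", "p:End"]
op = make_update(original, 2, "p:Body v2")

# Someone else inserted a node at the front, shifting every later index.
updated = ["p:Preface"] + original
print(apply_update(updated, op))
```

An index-based delta would overwrite `p:Intro` here, whereas the fingerprint still locates `p:Body` at its new position. If the surrounding context has changed too much, the sketch raises an error instead of guessing, which corresponds to the conflict detection mentioned above.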