Abstract: DescribeX is a visual, interactive tool for exploring the underlying structure of an XML collection. DescribeX implements a framework for creating XML summaries described using axis path regular expressions (abbreviated AxPRE). AxPREs capture all the bisimilarity-based proposals in the summary literature, and they can be used to define new and more expressive summaries. This demonstration shows how DescribeX helps to analyze diverse XML collections in one particular scenario: the analysis of protein-protein interaction XML data from multiple providers that conform to the PSI-MI schema.
I. OVERVIEW
XML has been adopted as the standard format for numerous applications in data exchange, web-based feeds (blogs, news feeds, podcasts), hypertext collections, and web services. XML schemas are used across different application domains for validating domain-specific XML instances. Schema validation provides a strong basis from which to structure, author, and interpret XML data. However, even when two XML collections validate against a common schema, the actual structure of the XML instances may be quite different in each of the two collections. This situation may occur because the common schema is extended to allow different user communities to combine schemas freely (e.g., RSS extensions like Yahoo! Media), or because document designers restrict themselves to just a subset of a larger schema (e.g., best practice guidelines of industry standards like those for IXRetail¹). In these scenarios, schemas do not provide sufficient information for understanding the structural commonalities of a given collection.
DescribeX is a visual, interactive tool for exploring the underlying structure of an XML collection, capable of handling gigabyte-size datasets. DescribeX is based on a framework (presented in [1] and [2]) for creating XML summaries based on axis path regular expressions (AxPRE, for short). DescribeX summaries are specified by a partition created using the novel notion of bisimilarity applied to subgraphs described by an AxPRE. The elements in the extent of a given partition (represented by a node in the summary) can be computed by an XPath query that is constructed by DescribeX.
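To make the idea of a summary partition and its extents concrete, the following minimal sketch groups the elements of a small document by their root-to-node label path, one of the simplest structural partitions. It is an illustration only and not the DescribeX implementation: the function name `path_summary` and the sample document (whose element names merely resemble PSI-MI) are assumptions, and AxPRE-defined partitions are more general than this one.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_summary(root):
    """Partition elements by their root-to-node label path.

    Returns a dict mapping each label path (usable as an absolute
    XPath expression for namespace-free documents) to its extent,
    i.e. the list of elements that share that path.
    """
    extents = defaultdict(list)

    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        extents[path].append(elem)
        for child in elem:
            visit(child, path)

    visit(root, "")
    return extents


# Hypothetical sample document; element names are illustrative only.
doc = ET.fromstring(
    "<entrySet>"
    "<entry><interactor><names/></interactor><interactor/></entry>"
    "<entry><interactor><names/></interactor></entry>"
    "</entrySet>"
)

summary = path_summary(doc)
for path, extent in summary.items():
    # Each key plays the role of a summary node; its value is the extent.
    print(path, len(extent))
```

Each key of `summary` corresponds to one summary node, and evaluating the key as an XPath expression against the document would return the same elements that the sketch collected in the extent.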
By employing different AxPREs to define the summary partition, DescribeX can capture all the bisimilarity-based proposals in the existing literature, and it can also define new and more expressive summaries.
¹ http://www.nrf-arts.org/
The graph-based visualization employed by DescribeX makes it straightforward to see the different path structures that are present in the collection. The application of local node refinements (i.e., changing the AxPRE at a given summary node to a different, more detailed AxPRE) can reveal detailed substructure variations. This functionality helps a user quickly understand which parts of the schema are used in practice. Further analysis to find the most common structures and substructures can then be performed in DescribeX through the application of coverage. This provides a strong indicati...