Finding all the occurrences of a twig pattern specified by a selection predicate on multiple elements in an XML document is a core operation for efficient evaluation of XML queries. Holistic twig join algorithms were proposed recently as an optimal solution when the twig pattern only involves ancestordescendant relationships. In this paper, we address the problem of efficient processing of holistic twig joins on all/partly indexed XML documents. In particular, we propose an algorithm that utilizes available indices on element sets. While it can be shown analytically that the proposed algorithm is as efficient as the existing state-of-the-art algorithms in terms of worst case I/O and CPU cost, experimental results on various datasets indicate that the proposed index-based algorithm performs significantly better than the existing ones, especially when binary structural joins in the twig pattern have varying join selectivities.
Most of the previous studies on mining association rules are on mining intratransaction associations, i.e., the associations among items within the same transaction where the notion of the transaction could be the items bought by the same customer, the events happened on the same day, etc. In this study, we break the barrier of transactions and extend the scope of mining association rules from traditional single-dimensional, intratransaction associations to multidimensional, intertransaction associations. An intertransaction association describes the association relationships among different transactions. In a database of stock price information, an example of such an association is "if (company) A's stock goes up on day one, B's stock will go down on day two but go up on day four." In this case, no matter whether we treat company or day as the unit of transaction, the associated items belong to different transactions. Moreover, such an intertransaction association can be extended to associate multiple properties in the same rule, so that multidimensional intertransaction associations can also be defined and discovered. Mining intertransaction associations pose more challenges on efficient processing than mining intratransaction associations because the number of potential association rules becomes extremely large after the boundary of transactions is broken. In this study, we introduce the notion of intertransaction association rule, define its measurements: support and confidence, and develop an efficient algorithm, FITI (an acronym for "First Intra Then Inter"), for mining intertransaction associations, which adopts two major ideas: 1) an intertransaction frequent itemset contains only the frequent itemsets of its corresponding intratransaction counterpart; and 2) a special data structure is built among intratransaction frequent itemsets for efficient mining of intertransaction frequent itemsets. We compare FITI with EH-Apriori, the best algorithm in our previous proposal, and demonstrate a substantial performance gain of FITI over EH-Apriori. Further extensions of the method and its implications are also discussed in the paper.
Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, web log mining for access prediction, and user click stream mining for personalization. Among various statistics, computing quantile summary is probably most challenging because of its complexity. In this paper, we study the problem of continuously maintaining quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of ǫN . We developed a space efficient algorithm for pre-defined N that requires only one scan of the input data stream and O( log(ǫ 2 N ) ǫ + 1 ǫ 2 ) space in the worst cases. We also developed an algorithm that maintains quantile summaries for most recent N elements so that quantile queries on any most recent n elements (n ≤ N ) can be answered with a guaranteed precision of ǫn. The worst case space requirement for this algorithm is only O( log 2 (ǫN ) ǫ 2). Our performance study indicated that not only the actual quantile estimation error is far below the guaranteed precision but the space requirement is also much less than the given theoretical bound.
In this paper, we extend the scope of mining association rules from traditional single-dimensional intratransaction associations, to multidimensional intertransaction associations. Intratransaction associations are the associations among items with the same transaction , where the notion of the transaction could be the items bought by the same customer , the events happened on the same day , and so on. However, an intertransaction association describes the association relationships among different transactions , such as “if(company) A's stock goes up on day 1, B's stock will go down on day 2, but go up on day 4.” In this case, whether we treat company or day as the unit of transaction, the associated items belong to different transactions. Moreover, such an intertransaction association can be extended to associate multiple contextual properties in the same rule, so that multidimensional intertransaction associations can be defined and discovered. A two-dimensional intertransaction association rule example is “After McDonald and Burger King open branches, KFC will open a branch two months later and one mile away,” which involves two dimensions: time and space . Mining intertransaction associations poses more challenges on efficient processing than mining intratransaction associations. Interestingly, intratransaction association can be treated as a special case of intertransaction association from both a conceptual and algorithmic point of view. In this study, we introduce the notion of multidimensional intertransaction association rules, study their measurements— support and confidence—and develop algorithms for mining intertransaction associations by extension of Apriori. We overview our experience using the algorithms on both real-life and synthetic data sets. Further extensions of multidimensional intertransaction association rules and potential applications are also discussed.
An XML twig query, represented as a labeled tree, is essentially a complex selection predicate on both structure and content of an XML document. Twig query matching has been identified as a core operation in querying treestructured XML data. A number of algorithms have been proposed recently to process a twig query holistically. Those algorithms, however, only deal with twig queries without OR-predicates. A straightforward approach that first decomposes a twig query with OR-predicates into multiple twig queries without OR-predicates and then combines their results is obviously not optimal in most cases. In this paper, we study novel holistic-processing algorithms for twig queries with OR-predicates without decomposition. In particular, we present a merge-based algorithm for sorted XML data and an index-based algorithm for indexed XML data. We show that holistic processing is much more efficient than the decomposition approach. Furthermore, we show that using indexes can significantly improve the performance for matching twig queries with OR-predicates, especially when the queries have large inputs but relatively small outputs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.