Natalia Vanetik scite author profile

The concept of support is central to data mining. While the definition of support in transaction databases is intuitive and simple, that is not the case in graph datasets and databases. Most mining algorithms require the support of a pattern to be no greater than that of its subpatterns, a property called anti-monotonicity, or admissibility. This paper examines the requirements for admissibility of a support measure. Support measures for mining graphs are usually based on the notion of an instance graph-a graph representing all the instances of the pattern in a database and their intersection properties. Necessary and sufficient conditions for support measure admissibility, based on operations on instance graphs, are developed and proved. The sufficient conditions are used to prove admissibility of one support measurethe size of the independent set in the instance graph. Conversely, the necessary conditions are used to quickly show that some other support measures, such as weighted count of instances, are not admissible.

show abstract

Discovering Frequent Graph Patterns Using Disjoint Paths

Gudes

Shimony

Vanetik

2006

IEEE Trans. Knowl. Data Eng.

View full text Add to dashboard Cite

Whereas data mining in structured data focuses on frequent data values, in semistructured and graph data mining, the issue is frequent labels and common specific topologies. Here, the structure of the data is just as important as its content. We study the problem of discovering typical patterns of graph data, a task made difficult because of the complexity of required subtasks, especially subgraph isomorphism. In this paper, we propose a new Apriori-based algorithm for mining graph data, where the basic building blocks are relatively large, disjoint paths. The algorithm is proven to be sound and complete. Empirical evidence shows practical advantages of our approach for certain categories of graphs.

show abstract

Query-based summarization using MDL principle

Litvak¹,

Vanetik²

2017

View full text Add to dashboard Cite

Query-based text summarization is aimed at extracting essential information that answers the query from original text. The answer is presented in a minimal, often predefined, number of words. In this paper we introduce a new unsupervised approach for query-based extractive summarization, based on the minimum description length (MDL) principle that employs Krimp compression algorithm (Vreeken et al., 2011). The key idea of our approach is to select frequent word sets related to a given query that compress document sentences better and therefore describe the document better. A summary is extracted by selecting sentences that best cover query-related frequent word sets. The approach is evaluated based on the DUC 2005 and DUC 2006 datasets which are specifically designed for query-based summarization (DUC, 2005(DUC, 2006. It competes with the best results.

show abstract

MUSEEC: A Multilingual Text Summarization Tool

Litvak¹,

Vanetik²,

Churkin³

2016

View full text Add to dashboard Cite

The MUSEEC (MUltilingual SEntence Extraction and Compression) summarization tool implements several extractive summarization techniques-at the level of complete and compressed sentences-that can be applied, with some minor adaptations, to documents in multiple languages. The current version of MUSEEC provides the following summarization methods: (1) MUSE-a supervised summarizer, based on a genetic algorithm (GA), that ranks document sentences and extracts top-ranking sentences into a summary, (2) POLY-an unsupervised summarizer, based on linear programming (LP), that selects the best extract of document sentences, and (3) WECOM-an unsupervised extension of POLY that compiles a document summary from compressed sentences. In this paper, we provide an overview of MUSEEC methods and its architecture in general.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Natalia Vanetik

Computing frequent graph patterns from semistructured data

Support measures for graph data*

Discovering Frequent Graph Patterns Using Disjoint Paths

Query-based summarization using MDL principle

MUSEEC: A Multilingual Text Summarization Tool

Contact Info

Product

Resources

About