Graph classification is an increasingly important task in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, yet only a few of them possess enough discriminative power to make them useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that greedy feature selection yields a near-optimal solution. Second, our submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, helping to prune the search space for discriminative frequent subgraphs already during frequent subgraph mining.
The goal of frequent subgraph mining is to detect subgraphs that occur frequently in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that greedy feature selection yields a near-optimal solution. Second, our submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, helping to prune the search space for discriminative frequent subgraphs already during frequent subgraph mining.
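The greedy selection under a submodular quality criterion mentioned above can be sketched as follows. This is a simplified, hypothetical stand-in rather than the actual CORK implementation: the toy quality function counts "correspondences", i.e. pairs of opposite-class graphs whose selected-feature values are identical (fewer is better), and all data and names are invented for illustration.

```python
from itertools import combinations

def correspondences(X, y, selected):
    """Count pairs of opposite-class examples whose values on the
    selected features are identical ('correspondences'); fewer is better."""
    count = 0
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j] and all(X[i][f] == X[j][f] for f in selected):
            count += 1
    return count

def greedy_select(X, y, k):
    """Greedily pick k features, each step minimizing the remaining
    correspondences; submodularity of the gain is what gives greedy
    selection its (1 - 1/e) near-optimality guarantee."""
    selected, remaining = [], set(range(len(X[0])))
    for _ in range(k):
        best = min(remaining, key=lambda f: correspondences(X, y, selected + [f]))
        selected.append(best)
        remaining.discard(best)
    return selected

# toy binary subgraph-indicator matrix: rows = graphs, cols = subgraphs
X = [[1, 0, 1],
     [1, 1, 0],
     [0, 0, 1],
     [0, 1, 0]]
y = [0, 0, 1, 1]
print(greedy_select(X, y, 2))
```

On this toy data the greedy pass picks feature 0 first, since it alone already separates the two classes (zero correspondences remain).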
Many applications require determining the k-nearest neighbors for multiple query points simultaneously. This task is known as the all-k-nearest-neighbor (AkNN) query. In this paper, we propose a new method for efficient AkNN query processing based on spherical approximations for indexing and query-set representation. In this setting, we introduce trigonometric pruning, which significantly reduces the remaining search space for a query. Employing this new pruning method, we considerably speed up AkNN queries.
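For reference, the AkNN task itself can be stated in a few lines. The sketch below is only a brute-force baseline with made-up sample points; it implements neither the spherical approximations nor the trigonometric pruning proposed in the paper, and all names are illustrative.

```python
import heapq
import math

def aknn(queries, data, k):
    """Brute-force all-k-nearest-neighbor: for every query point,
    return the indices of its k closest data points (Euclidean)."""
    result = []
    for q in queries:
        dists = [(math.dist(q, p), i) for i, p in enumerate(data)]
        result.append([i for _, i in heapq.nsmallest(k, dists)])
    return result

# toy 2-D points; an indexed method would prune most of these comparisons
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
queries = [(0.1, 0.1), (4.0, 4.0)]
print(aknn(queries, data, 2))  # → [[0, 1], [3, 1]]
```

The baseline costs O(|Q| · |D|) distance computations per query set, which is exactly the work that index-based pruning methods aim to avoid.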
Automatically determining the relative position of a single CT slice within a full body scan provides several useful functionalities. For example, it is possible to validate DICOM meta-data information. Furthermore, knowing the relative position in a scan allows the efficient retrieval of similar slices from the same body region in other volume scans. Finally, the relative position is often important information for a non-expert user who has access to only a single CT slice of a scan. In this paper, we determine the relative position of single CT slices via instance-based regression without using any meta data. Each slice of a volume set is represented by several types of feature information computed from a sequence of image conversions and edge-detection routines on rectangular subregions of the slices. Our new method is independent of the settings of the CT scanner and achieves an average localization error of less than 4.5 cm using leave-one-out validation on a dataset of 34 annotated volume scans. Thus, we demonstrate that instance-based regression is a suitable tool for mapping single slices to a standardized coordinate system and that our algorithm is competitive with other volume-based approaches with respect to runtime and prediction quality, even though it requires only a fraction of the input information.
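The instance-based regression idea can be illustrated with a minimal k-nearest-neighbor regressor. The feature vectors, positions, and helper names below are invented for illustration and do not reflect the paper's actual feature extraction or distance function.

```python
import math

def knn_regress(query, examples, k=3):
    """Instance-based regression: predict the axial position of a slice
    as the mean position of its k nearest training feature vectors."""
    neighbors = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    return sum(pos for _, pos in neighbors) / k

# toy (feature_vector, position_in_cm) training pairs
examples = [((0.1, 0.2), 10.0), ((0.2, 0.1), 12.0),
            ((0.8, 0.9), 95.0), ((0.9, 0.8), 97.0)]
print(knn_regress((0.15, 0.15), examples, k=2))  # → 11.0
```

Because the prediction is just an aggregate over stored instances, adding new annotated scans requires no retraining, which fits the retrieval-style setting described above.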
The current diagnostic process at hospitals is mainly based on reviewing and comparing images from multiple time points and modalities in order to monitor disease progression over time. For ambiguous cases, however, the radiologist relies heavily on reference literature or a second opinion. Although there is a vast amount of acquired images stored in PACS systems that could be reused for decision support, these data sets suffer from weak search capabilities. Thus, we present a search methodology that enables the physician to carry out intelligent search scenarios on medical image databases, combining ontology-based semantic search with appearance-based similarity search. In our evaluation, it eliminated 12% of the top-ten hits that would arise without taking the semantic context into account.
Similarity queries are an important query type in multimedia databases. To implement these types of queries, database systems often use spatial index structures like the R*-tree. However, the majority of performance evaluations for spatial index structures rely on a storage layer based on conventional hard drives. Since newer devices like solid-state drives (SSDs) have completely different performance characteristics, it is an interesting question to what extent existing index structures benefit from these modern storage devices. In this paper, we therefore examine the performance behaviour of the R*-tree on an SSD compared to a conventional hard drive. Testing various influencing factors such as system load, dimensionality, and page size of the index, our evaluation leads to interesting insights into the performance of spatial index structures on modern storage layers.
Similarity search and data mining often rely on distance or similarity functions in order to provide meaningful results and semantically meaningful patterns. However, standard distance measures like Lp-norms are often not capable of accurately mirroring the expected similarity between two objects. To bridge the so-called semantic gap between feature representation and object similarity, the distance function has to be adjusted to the current application context or user. In this paper, we propose a new probabilistic framework for estimating a similarity value in a Bayesian setting. In our framework, distance comparisons are modeled via distribution functions on the difference vectors. To combine these functions, a similarity score is computed by an ensemble of weak Bayesian learners, one for each dimension of the feature space. To find independent dimensions of maximum meaning, we apply a space transformation based on eigenvalue decomposition. In our experiments, we demonstrate that our new method shows promising results compared to related Mahalanobis learners on several test data sets w.r.t. nearest-neighbor classification and precision-recall graphs.
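A per-dimension ensemble of weak Bayesian learners can be sketched as follows. The Gaussian difference models, the equal-weight log-odds combination, and all parameters are assumptions made for illustration; they are not the paper's actual estimators or learned distributions.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def similarity_score(diff, similar_params, dissimilar_params):
    """One weak Bayesian learner per dimension: each compares the
    likelihood of the coordinate difference under a 'similar' vs. a
    'dissimilar' Gaussian; log-odds are summed across dimensions."""
    score = 0.0
    for d, x in enumerate(diff):
        p_sim = gaussian_pdf(x, *similar_params[d])
        p_dis = gaussian_pdf(x, *dissimilar_params[d])
        score += math.log(p_sim / p_dis)
    return score

# assumed per-dimension (mean, stddev) of difference values:
# similar pairs cluster tightly around zero, dissimilar pairs spread out
sim = [(0.0, 0.5), (0.0, 0.5)]
dis = [(0.0, 2.0), (0.0, 2.0)]
print(similarity_score((0.1, -0.2), sim, dis) > 0)  # small differences → similar
```

A positive score means the difference vector is better explained by the "similar" model; in practice the per-dimension terms would be weighted by each learner's reliability, and the dimensions decorrelated beforehand, e.g. via the eigenvalue decomposition mentioned above.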