Vispedia: Interactive Visual Exploration of Wikipedia Data via Search-Based Integration

Chan, Bryan; Wu, Leslie; Talbot, Justin; Cammarano, Mike; Hanrahan, Pat

doi:10.1109/tvcg.2008.178

Cited by 33 publications

(16 citation statements)

References 15 publications


“…Furthermore, many extraction problems are by nature ambiguous and require user input, yet mostly-automatic systems like Sifter [10] offer the user little help when the extractors fail. Karma [4] and Mashmaker [7] can learn from positive but not negative examples. Mashmaker users must drop into a lower-level pattern editor to make a pattern more selective.…”

Section: Programmer Leverages Structural Heuristics (mentioning)

confidence: 99%


Attaching UI enhancements to websites with end users

Toomim, Drucker, Dontcheva, et al. 2009

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems


We present reform, a system that envisions roles for both programmers and end users in enhancing existing websites to support new goals. First, programmers author a traditional mashup or browser extension, but they do not write a web scraper. Instead they use reform, which allows novice end users to attach the enhancement to their favorite sites with a scraping by-example interface. reform makes enhancements easier to program while also carrying the benefit that end users can apply the enhancements to any number of new websites. We present reform's architecture, user interface, interactive by-example extraction algorithm for novices, and evaluation, along with five example reform enabled enhancements. This is a step toward write-once, apply-anywhere user interface enhancements.



“…For instance, Vispedia [4] allows visualization of Wikipedia articles by leveraging the RDF predefined for each topic as part of the DBpedia project. d.mix [12] allows experts to define a library of patterns that end users employ.…”

Section: Programmer Leverages Predefined Webpage Semantics (mentioning)

confidence: 99%

Attaching UI enhancements to websites with end users

Toomim, Drucker, Dontcheva, et al. 2009

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems


“…For example, Sifter [16] extracts search items on a web page using the HTML structure and scrapes subsequent web pages by examining hyperlinks (such as "Next page") and URL parameters. Vispedia [17] extracts Wikipedia infoboxes using the table structure and uses the hyperlinks in an infobox to retrieve related topics. There are also commercial web scrapers, such as Scraper [18], a Chrome plugin for scraping similar items in web pages, and ScraperWiki [19], a commercial product that specifically targets scraping Twitter and tabular data.…”

Section: Related Work (mentioning)

confidence: 99%
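
The infobox extraction described in the statement above can be illustrated with a minimal sketch. It is not Vispedia's actual implementation: the `InfoboxParser` class, the `extract_infobox` helper, and the sample HTML are hypothetical, and the sketch assumes that an infobox is a table whose `class` attribute contains `infobox`, with one label (`th`) and one value (`td`) per row. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect (label, value, links) rows from a table with class 'infobox'.

    Hypothetical helper for illustration; not taken from Vispedia.
    """

    def __init__(self):
        super().__init__()
        self.in_infobox = False
        self.depth = 0        # nesting depth of tables inside the infobox
        self.cell = None      # 'th' or 'td' while inside a cell
        self.label, self.value, self.links = [], [], []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            if self.in_infobox:
                self.depth += 1
            elif "infobox" in (attrs.get("class") or "").split():
                self.in_infobox = True
                self.depth = 1
        elif not self.in_infobox:
            return
        elif tag == "tr":
            # Start a fresh row.
            self.label, self.value, self.links = [], [], []
        elif tag in ("th", "td"):
            self.cell = tag
        elif tag == "a" and self.cell == "td" and attrs.get("href"):
            # Hyperlinks in a value cell point to related topics.
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if not self.in_infobox:
            return
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:
                self.in_infobox = False
        elif tag in ("th", "td"):
            self.cell = None
        elif tag == "tr":
            label = "".join(self.label).strip()
            value = "".join(self.value).strip()
            if label and value:
                self.rows.append((label, value, list(self.links)))

    def handle_data(self, data):
        if self.cell == "th":
            self.label.append(data)
        elif self.cell == "td":
            self.value.append(data)


def extract_infobox(html):
    """Return a list of (label, value, links) tuples for the infobox rows."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup, loosely modeled on a Wikipedia infobox.
SAMPLE = """
<table class="infobox vcard">
  <tr><th>Capital</th><td><a href="/wiki/Paris">Paris</a></td></tr>
  <tr><th>Currency</th><td><a href="/wiki/Euro">Euro</a></td></tr>
</table>
"""

rows = extract_infobox(SAMPLE)
for label, value, links in rows:
    print(label, value, links)
```

Each returned row pairs a field label with its text value and the hyperlinks found in the value cell; those links are the natural starting points for retrieving related topics.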

A spreadsheet model for using web service data

Chang, Myers 2014

2014 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)


Web services offer a more reliable and efficient way to access online data than scraping web pages. However, web service data are often in complex hierarchical structures that make it difficult for people to extract the desired parts or to perform any further data manipulation without writing a significant amount of surprisingly intricate code. In this paper, we present Gneiss, a tool that extends the familiar spreadsheet metaphor to support working with data returned from web services. Gneiss allows users to extract the desired fields in web service data using drag-and-drop, and refine the results through spreadsheet formulas, along with sorting and filtering the data. Hierarchical data are stored as nested tables in the spreadsheet and can be flattened for future operations. Data flow is two-way between the spreadsheet and the web services, enabling people to easily make a new request by modifying spreadsheet cells. In addition, using the dependency between spreadsheet cells, our tool is able to create parallel-running data extractions based on the user's sequential demonstration. We use a set of examples to demonstrate our tool's ability to create fast and reusable data extraction and manipulation programs that work with complex web service data.


“…The first allows the traversal of an RDF graph while the latter enables the analysis of aggregated data that match a query. The Vispedia project [6] is an approach to interactively visualize Wikipedia infoboxes. It allows a user to select an infobox and define a keyword query, which the system evaluates on the semantic graph of Wikipedia to extract supplemental information.…”

Section: Related Work (mentioning)

confidence: 99%
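
The keyword query over a link graph mentioned in the statement above can be sketched as a bounded breadth-first search. This is a generic illustration rather than the algorithm Vispedia uses: the `find_paths` function, the `max_depth` parameter, and the toy `GRAPH` are all assumptions made for the example.

```python
from collections import deque


def find_paths(graph, start, keyword, max_depth=2):
    """Return link paths from `start` to nodes whose title contains `keyword`.

    Hypothetical helper: a plain breadth-first search bounded by `max_depth`
    hops, matching the keyword case-insensitively against node titles.
    """
    keyword = keyword.lower()
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != start and keyword in node.lower():
            paths.append(path)
            continue  # do not expand past a match
        if len(path) - 1 >= max_depth:
            continue  # hop limit reached
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # avoid cycles
                queue.append(path + [neighbor])
    return paths


# Toy adjacency list standing in for a tiny slice of a link graph.
GRAPH = {
    "France": ["Paris", "Euro"],
    "Paris": ["Population of Paris", "France"],
    "Euro": ["European Central Bank"],
}

result = find_paths(GRAPH, "France", "population")
print(result)
```

Because the search proceeds level by level, shorter paths are returned first, and the hop limit keeps the exploration small on a densely linked graph.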

“…As for data profiling in general, there is a wealth of approaches that can be used to grasp a given dataset, e.g., functional dependency discovery [6], and join path exploration [11]. The Bellman project [12] integrates a set of techniques to address poorly structured and dirty data.…”

Section: Related Work (mentioning)

confidence: 99%

Profiling linked open data with ProLOD

Böhm, Naumann, Abedjan, et al. 2010

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)


Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often enough in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of the source for the problem at hand (e.g., some integration task), focus on and explore the relevant subset.

I. PROFILING LINKED OPEN DATA

Data profiling comprises a well established set of basic operations, which analyze a (relational) dataset and create metadata that is useful to understand the data and to detect irregularities. Profiling is mostly performed in a column-by-column manner, for instance to detect frequent value patterns or the uniqueness of column values. Common profiling methods and tools have the underlying assumption of a well-defined semantics of the column and mostly regular data.

These assumptions do not hold for linked open data (LOD) published on the web. Such data emerge from different sources, such as open source communities (e.g., Wikipedia) or projects dedicated to a specific topic (e.g., DrugBank [1]). These diverse origins cause a diversity of how information is expressed as data values and how these values are structured. Nevertheless, these datasets interlink each other. The overall LOD vision is to enable the generation of new knowledge based on a wealth of widely available interlinked data. However, leveraging the variety of such open data requires (i) an initial understanding of each single dataset and (ii) an overview of the available data as a whole. Only then, data analysts can focus on the required subset of LOD for the problem at hand. Classical profiling techniques are, to the best of our knowledge, not appropriate to deal with these new massive sets of open (and thus heterogeneous) data. We propose a new iterative and interactive methodology for profiling LOD. We envision a process that allows a user to divide data into groups, review simple statistics or sophisticated mining results on a group-level, and then rethink grouping decisions in order to revise them for refining the profiling result. In this paper, we report on ProLOD, an initial prototype we developed to step towards this vision. As a proof-of-concept, we concentrate on the infobox (without ontol...
