Predictive analytics encompasses an extensive range of techniques, including statistical modeling, machine learning, and data mining, and is applied in business intelligence, public health, disaster management and response, and many other fields. To date, visualization has been broadly used to support tasks in the predictive analytics pipeline, primarily data cleaning, exploratory analysis, and diagnostics. For example, scatterplots and bar charts are used to illustrate class distributions and responses. More recently, extensive visual analytics systems for feature selection, incremental learning, and various prediction tasks have been proposed to support the growing use of complex models, agent-specific optimization, and comprehensive model comparison and result exploration. Such work is being driven by advances in interactive machine learning and by the desire of end-users to understand and engage with the modeling process. In this state-of-the-art report, we catalogue recent advances in the visualization community for supporting predictive analytics. First, we define the scope of predictive analytics discussed in this article and describe how visual analytics can support predictive analytics tasks in a predictive visual analytics (PVA) pipeline. We then survey the literature and categorize the research with respect to the proposed PVA pipeline. Systems and techniques are evaluated in terms of their supported interactions, and interactions specific to predictive analytics are discussed. We end this report with a discussion of challenges and opportunities for future research in predictive visual analytics.
In 2015, the top 10 largest amusement park corporations saw a combined annual attendance of over 400 million visitors, and some of the most popular theme parks in the world average 44,000 visitors per day. These visitors ride attractions, shop for souvenirs, and dine at local establishments; however, a critical component of their visit is the overall park experience. This experience depends on ride wait times, crowd flow through the park, and various other factors linked to crowd dynamics and human behavior. Better insight into visitor behavior can therefore help theme parks devise competitive strategies for improving the customer experience. The use of attractions, facilities, and exhibits can be studied, and as behavior profiles emerge, park operators can also identify anomalous visitor behaviors, which can help improve safety and operations. In this article, we present a visual analytics framework for analyzing crowd dynamics in theme parks. Our proposed framework is designed to support behavioral analysis by summarizing patterns and detecting anomalies. We provide methodologies to link visitor movement data, communication data, and park infrastructure data. This combination of data sources enables a semantic analysis of who, what, when, and where, allowing analysts to explore visitor-visitor interactions and visitor-infrastructure interactions. Analysts can identify behaviors at the macro level through semantic trajectory clustering views for group behavior dynamics, as well as at the micro level using trajectory traces and a novel visitor network analysis view. We demonstrate the efficacy of our framework through two case studies of simulated theme park visitors.
In modern machine learning, model training is an iterative, experimental process that can consume enormous computational resources and developer time. To aid in that process, experienced model developers log and visualize program variables during training runs. Exhaustive logging of all variables is infeasible, so developers must choose between slowing down training via extensive conservative logging and letting training run fast via minimalist optimistic logging that may omit key information. As a compromise, optimistic logging can be accompanied by program checkpoints; this allows developers to add log statements post hoc and "replay" them from a checkpoint, a process we refer to as hindsight logging. Unfortunately, hindsight logging raises tricky problems in data management and software engineering. Done poorly, hindsight logging can waste resources and generate technical debt embodied in multiple variants of training code. In this paper, we present methodologies for efficient and effective logging practices for model training, with a focus on techniques for hindsight logging. Our goal is for experienced model developers to learn and adopt these practices. To make this easier, we provide an open-source suite of tools for Fast Low-Overhead Recovery (flor) that embodies our design across three tasks: (i) efficient background logging in Python, (ii) adaptive periodic checkpointing, and (iii) an instrumentation library that codifies hindsight logging for efficient and automatic record-replay of model training. Model developers can use each flor tool separately as they see fit, or they can use flor in hands-free mode, entrusting it to instrument their code end-to-end for efficient record-replay. Our solutions leverage techniques from physiological transaction logs and recovery in database systems. Evaluations on modern ML benchmarks demonstrate that flor can produce fast checkpointing with small user-specifiable overheads (e.g., 7%) and still provide hindsight log replay times orders of magnitude faster than restarting training from scratch.
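The record-replay idea behind hindsight logging can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not flor's API or implementation: `checkpoint_period`, `train_epoch`, `record`, and `replay` are hypothetical names, the per-epoch and per-checkpoint costs are made-up constants (a real system would measure them at runtime), and training is assumed to be deterministic so that re-executing from a checkpoint reproduces the original state. The checkpoint period is chosen as the smallest number of epochs that keeps checkpoint cost within an overhead budget, and a log statement added after the fact is replayed from the nearest earlier checkpoint rather than from epoch 0.

```python
import copy
import math
from typing import Callable, Dict, Optional

# Hypothetical sketch of hindsight logging; none of these names are flor's API.

Hook = Callable[[int, float], None]


def checkpoint_period(epoch_cost: float, ckpt_cost: float, budget: float) -> int:
    """Smallest period k (in epochs) with ckpt_cost / (k * epoch_cost) <= budget."""
    return max(1, math.ceil(ckpt_cost / (budget * epoch_cost)))


def train_epoch(state: Dict[str, float], steps: int = 5, lr: float = 0.1,
                hook: Optional[Hook] = None) -> None:
    """Toy 'epoch': deterministic gradient descent on the loss (w - 3)^2."""
    for step in range(steps):
        grad = 2.0 * (state["w"] - 3.0)
        state["w"] -= lr * grad
        if hook is not None:  # a log statement injected at replay time
            hook(step, state["w"])
    state["epoch"] += 1


def record(num_epochs: int, period: int) -> Dict[int, Dict[str, float]]:
    """Record phase: train with minimal logging, checkpointing every `period` epochs.

    Checkpoint k holds the state at the start of epoch k.
    """
    state = {"w": 0.0, "epoch": 0}
    checkpoints = {0: copy.deepcopy(state)}
    for _ in range(num_epochs):
        train_epoch(state)
        if state["epoch"] % period == 0:
            checkpoints[state["epoch"]] = copy.deepcopy(state)
    return checkpoints


def replay(checkpoints: Dict[int, Dict[str, float]], target_epoch: int,
           hook: Hook) -> int:
    """Replay phase: emit a post-hoc log statement for `target_epoch`.

    Restores the latest checkpoint at or before `target_epoch`, fast-forwards
    without logging, then runs the target epoch with `hook` attached.
    Returns the number of epochs re-executed.
    """
    start = max(k for k in checkpoints if k <= target_epoch)
    state = copy.deepcopy(checkpoints[start])
    while state["epoch"] < target_epoch:
        train_epoch(state)
    train_epoch(state, hook=hook)
    return target_epoch - start + 1


if __name__ == "__main__":
    period = checkpoint_period(epoch_cost=10.0, ckpt_cost=2.0, budget=0.07)
    ckpts = record(num_epochs=10, period=period)
    logged = []
    rerun = replay(ckpts, target_epoch=7,
                   hook=lambda step, w: logged.append((step, w)))
    print(period, sorted(ckpts), rerun, len(logged))  # prints: 3 [0, 3, 6, 9] 2 5
```

With a 7% overhead budget and these illustrative costs, checkpoints land every third epoch (2 / 30 ≈ 6.7% overhead), so obtaining per-step logs for epoch 7 re-executes 2 epochs instead of the 8 that a restart from scratch would need.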