Advances in Exploratory Data Analysis, Visualisation and Quality for Data Centric AI Systems

Patel, Hima; Guttula, Shanmukha; Mittal, Ruhi Sharma; Manwani, Naresh; Berti‐Équille, Laure; Manatkar, Abhijit

doi:10.1145/3534678.3542604

Cited by 9 publications

(6 citation statements)

References 14 publications

“…The use of two case studies based on an EDA approach to data quality motivates a collection of research questions for statistics that cover theory, methodology, and software tools. Data visualization is a crucial EDA approach that uses visual elements like charts and graphs to make analysis simple and efficient [24]. When it comes to data quality profiling, visual EDA is very pertinent.…”

Section: Execution and Analysis (mentioning)

confidence: 99%

Opportunities and Challenges in Data-Centric AI

Kumar, Datta, Singh et al. 2024. IEEE Access

Artificial intelligence (AI) systems are trained to solve complex problems and learn to perform specific tasks by using large volumes of data, such as prediction, classification, recognition, decision-making, etc. In the past three decades, AI research has focused mostly on the model-centric approach compared to the data-centric approach. In the model-centric approach, the focus is to improve the code or model architecture to enhance performance, whereas in data-centric AI, the focus is to improve the dataset to enhance performance. Data is food for AI. As a result, there has been a recent push in the AI community toward data-centric AI from model-centric AI. This paper provides a comprehensive and critical analysis of the current state of research in data-centric AI, presenting insights into the latest developments in this rapidly evolving field. By emphasizing the importance of data in AI, the paper identifies the key challenges and opportunities that must be addressed to improve the effectiveness of AI systems. Finally, this paper gives some recommendations for research opportunities in data-centric AI.

“…The lack of digital data in Greece presents a golden opportunity to start from scratch and potentially create data-centric AI systems that prioritise data quality over quantity based on a set of data that is scalable, adaptable, and governable (Patel et al, 2022). Technologically advanced and developed economies, such as the United States, Germany, Canada, and the United Kingdom, have achieved the digitalization of health and the collection of RWD, but are now facing significant challenges in transforming their systems and making them time-efficient and accessible to AI systems.…”

Section: Golden Opportunity To Start From Scratch (mentioning)

confidence: 99%

Greece 2.0, Health Economics and Outcome Research and the Rise of Artificial Intelligence: Another Missed Opportunity or it's Time for Brilliance?

Fylatos, Efthymiou, Sidiropoulos et al. 2022. jpentai

The EU National Recovery and Resilience Plan "Greece 2.0" includes, among other priorities, a framework to promote and reform the health system, with a focus on digitalization of health and the use of information technology applications. Greece 2.0 may offer a chance to address the current scarcity of high-quality, reliable data sources, which is limiting the spread and impact of health economics and outcomes research (HEOR). We also suspect that the use of artificial intelligence (AI) in HEOR will play an important role in Greece's health-care reform and that it will be critical for making real-world data-driven decisions, reducing policy uncertainty. Greece has a once-in-a-lifetime chance to start from scratch and potentially build data-centric AI systems that prioritise data quality over quantity and are built on scalable, flexible, and governable data collection. This commentary explains and critically considers the significance of developing and funding an innovative plan for using AI in HEOR as part of the Greece 2.0 framework. It also discusses ethical issues and the larger role of HEOR in health-care reform.

“…These algorithms are typically evaluated on the same task for which the dataset was collected, and the learned policy can be pessimistic in out-of-distribution states and actions, leading to poor generalization in unseen downstream tasks. Recently, data-centric approaches have become emerging, emphasizing the importance of training data quality over algorithmic advances (Motamedi, Sakharnykh, and Kaldewey 2021;Patel et al 2022). To improve training data quality, researchers have explored selecting the most critical samples or re-weighting (Wu et al 2021) all samples in the offline RL algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning

Sun, Qian, Miao 2024. AAAI

Offline reinforcement learning (RL) aims to learn an effective policy from a pre-collected dataset. Most existing works are to develop sophisticated learning algorithms, with less emphasis on improving the data collection process. Moreover, it is even challenging to extend the single-task setting and collect a task-agnostic dataset that allows an agent to perform multiple downstream tasks. In this paper, we propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection and ultimately improve learning efficiency and capabilities for multi-task offline RL. To achieve this, CUDC estimates the probability of the k-step future states being reachable from the current states, and adapts how many steps into the future that the dynamics model should predict. With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can navigate itself to collect higher-quality data with curiosity. Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
