2022
DOI: 10.48550/arxiv.2202.13028
Preprint

Healthsheet: Development of a Transparency Artifact for Health Datasets

Abstract: Machine learning (ML) approaches have demonstrated promising results in a wide range of healthcare applications. Data plays a crucial role in developing ML-based healthcare systems that directly affect people's lives. Many of the ethical issues surrounding the use of ML in healthcare stem from structural inequalities underlying the way we collect, use, and handle data. Developing guidelines to improve documentation practices regarding the creation, use, and maintenance of ML healthcare datasets is therefore of…

Cited by 4 publications (6 citation statements)
References 18 publications
“…We outline opportunities for future research into frameworks for the systematic identification and mitigation of downstream harms and impacts of LLMs in healthcare contexts. Key principles include the use of participatory methods to design contextualized evaluations that reflect the values of patients that may benefit or be harmed, grounding the evaluation in one or more specific downstream clinical use cases 39,40, and the use of dataset and model documentation frameworks for transparent reporting of choices and assumptions made during data collection and curation, model development and evaluation [41][42][43]. Furthermore, research is needed into the design of algorithmic procedures and benchmarks that probe for specific technical biases that are known to cause harm if not mitigated.…”
Section: Fairness and Equity Considerations (mentioning)
confidence: 99%
“…Second, the reviewed publications enable the documentation of information on the data selection, data version and data collection of the training data. Guidelines and checklists from different fields empower researchers to track data collection and selection rationale (Artrith et al, 2021; Bender and Friedman, 2018; Hutchinson et al, 2021; Isdahl and Gundersen, 2019; Rostamzadeh et al, 2022; Rule et al, 2019; Vasey et al, 2022; Walsh et al, 2021). Documentation guidelines also offer methods for recording how and when data was collected (Artrith et al, 2021; Gebru et al, 2021; Hutchinson et al, 2021; Norgeot et al, 2020; Srinivasan et al, 2021).…”
Section: Figure 1 Structure Of the Results (mentioning)
confidence: 99%
“…First, literature describes tools and methods for documenting data set size and composition. Guidelines help researchers with providing this information via summary statistics and visualizations (Gebru et al, 2021; Holland et al, 2018; Isdahl and Gundersen, 2019; Mitchell et al, 2019; Mora-Cantallops et al, 2021; Rostamzadeh et al, 2022; Schelter et al, 2017). In addition,…”
Section: Documenting the Training Data (mentioning)
confidence: 99%
“…Additionally, a single de-identified file containing all annotated entities for ophthalmic medications was included. To promote transparency in our data collection methods and intended uses for this data, we have provided a HealthSheet, a structured datasheet specific to healthcare datasets as recommended by Rostamzadeh et al 31 based on the original datasheet by Gebru et al 32 which was developed for open-datasets for all use cases in AI. This datasheet is provided in the Supplementary Materials.…”
Section: Results (mentioning)
confidence: 99%