2020
DOI: 10.1093/phe/phaa006

Scraping the Web for Public Health Gains: Ethical Considerations from a ‘Big Data’ Research Project on HIV and Incarceration

Abstract: Web scraping involves using computer programs for automated extraction and organization of data from the Web for the purpose of further data analysis and use. It is frequently used by commercial companies, but also has become a valuable tool in epidemiological research and public health planning. In this paper, we explore ethical issues in a project that “scrapes” public websites of U.S. county jails as part of an effort to develop a comprehensive database (including individual-level jail incarcerations, court…

Cited by 23 publications (11 citation statements)
References 18 publications
“…In parallel with these changes in research practice, high profile cases of data misuse have emerged, exposing research participants to privacy breaches and risk of harm (Fuller, 2019). In response, debate has increased about the role and effectiveness of the Research Ethics Committee (REC) as the chief ethical research oversight mechanism in research, given the specific challenges presented by research with big data (Ferretti et al, 2020; Rennie et al, 2020). RECs, also known as Institutional Review Boards (IRBs) and Research Ethics Boards (REBs), were created in the 20th century to protect the safety and interests of human participants in research (Friesen et al, 2019).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The health ministry press releases are collected on the Wikipedia page for Covid-Testing (see https://en.wikipedia.org/wiki/COVID-19_testing). Webscraping press releases or metadata of public institutions to gain lacking data on public health issues has been applied to link incarceration and HIV rates [22] and to detect community autism spectrum disorder rates [23]. Further, Wikipedia has been deemed helpful for public health needs [24] and especially for modelling trends of the Coronavirus outbreak (Doğaner, 2020) [25].…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…However, it is worthwhile to point out that an expressed concern in the field of web scraping due to the fact that scrapers can obtain personal information and publish it to an open database [21] [23]. This becomes even more sensitive when medical records are retrieved by the scraper.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%