2020
DOI: 10.1371/journal.pone.0228154

Is it time to stop sweeping data cleaning under the carpet? A novel algorithm for outlier management in growth data

Abstract: All data are prone to error and require data cleaning prior to analysis. An important example is longitudinal growth data, for which there are no universally agreed standard methods for identifying and removing implausible values and many existing methods have limitations that restrict their usage across different domains. A decision-making algorithm that modified or deleted growth measurements based on a combination of pre-defined cutoffs and logic rules was designed. Five data cleaning methods for growth wer…

Cited by 19 publications (19 citation statements)
References 44 publications
“…Such simulation studies should mimic the types of errors that occur in datasets, otherwise results may have little generalisability when applied to datasets that have different error structures (49). Merely simulating normally distributed errors without age (date) errors, keystroke errors (29), duplications, internally inconsistent values etc is unrealistic, and such studies are unlikely to be a useful test of the performance of the method.…”
Section: Discussion
mentioning
confidence: 99%
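The point in the statement above, that simulated errors should include more than normally distributed noise, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the cited paper or the citing study: the function names `transpose_digits` and `inject_errors`, the record layout `(age_days, height_cm)`, the 2 cm noise SD, the 30-day age shift, and the equal weighting of error types.

```python
import random


def transpose_digits(value: float) -> float:
    """Swap the first two digits of the integer part (a keystroke-style error).

    Example: 72.4 becomes 27.4. Values below 10 are returned unchanged.
    """
    whole, _, frac = f"{value:.1f}".partition(".")
    if len(whole) < 2:
        return value
    swapped = whole[1] + whole[0] + whole[2:]
    return float(f"{swapped}.{frac}")


def inject_errors(records, rng, rate=0.1):
    """Return a copy of records with a mix of hypothetical error types.

    records: list of (age_days, height_cm) tuples.
    rng:     a random.Random instance, so runs can be made reproducible.
    rate:    probability that any one record is corrupted.
    """
    out = []
    for age, height in records:
        if rng.random() < rate:
            kind = rng.choice(["noise", "transpose", "duplicate", "age_shift"])
            if kind == "noise":
                # Plain measurement noise (assumed SD of 2 cm).
                height = round(height + rng.gauss(0, 2.0), 1)
            elif kind == "transpose":
                height = transpose_digits(height)
            elif kind == "age_shift":
                # Date error: the visit is recorded 30 days early or late.
                age = age + rng.choice([-30, 30])
            else:
                # Duplication: the same record is entered twice.
                out.append((age, height))
        out.append((age, height))
    return out


clean = [(0, 50.0), (90, 61.2), (180, 67.5), (365, 75.8)]
dirty = inject_errors(clean, random.Random(1), rate=0.5)
```

A cleaning method could then be scored on how many of the injected errors it recovers, separately for each error type.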
“…Logic checks based on data entry error mechanisms are another screening/diagnostic method and can be cross-sectional eg; digit errors (29), or longitudinal eg; measures carried forward (9). These checks are rarely reported, which might reflect their status as a routine part of quality control, but it may also reflect a lack of systematic rigor and transparency in data handling processes.…”
Section: Methods
mentioning
confidence: 99%
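The two kinds of logic check named in the statement above, a cross-sectional digit check and a longitudinal carried-forward check, might look like the following Python sketch. It is a hedged illustration only: the function names, the 2 cm tolerance, and the use of an externally supplied expected value are assumptions, not the rules used in the cited algorithm.

```python
def flag_carried_forward(heights):
    """Longitudinal check: indices whose value exactly repeats the previous one."""
    return [i for i in range(1, len(heights)) if heights[i] == heights[i - 1]]


def swap_leading_digits(value: float) -> float:
    """Swap the first two digits of the integer part, e.g. 27.4 becomes 72.4."""
    whole, _, frac = f"{value:.1f}".partition(".")
    if len(whole) < 2:
        return value
    return float(f"{whole[1]}{whole[0]}{whole[2:]}.{frac}")


def looks_like_transposition(value, expected, tolerance=2.0):
    """Cross-sectional check for a possible digit transposition.

    True when the value is far from the expected value but swapping its
    first two digits brings it within the (assumed) tolerance.
    """
    if abs(value - expected) <= tolerance:
        return False  # already plausible, nothing to flag
    return abs(swap_leading_digits(value) - expected) <= tolerance


heights = [50.0, 54.2, 54.2, 60.1]
carried = flag_carried_forward(heights)
suspect = looks_like_transposition(27.4, expected=72.0)
```

Flagged values are only candidates for review. An exact repeat can be a genuine measurement, particularly in older children or adults, so whether to modify or delete it is a separate decision.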
“…There is little evidence to support any standardized guidance for identifying implausible values in adults. More research is available for pediatric data, and the advantages and limitations of existing methods for children could be presented in guidelines while further advancement of these methods continues (2, 4–9). Any guidelines regarding plausible outliers should reflect that the treatment of these values depends on the research question.…”
mentioning
confidence: 99%