2014
DOI: 10.1007/s10618-014-0378-6

Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality

Abstract: Knowledge discovery on biomedical data can be based on on-line data-stream analyses or on retrospective, timestamped, off-line datasets. In both cases, changes over time in the processes that generate data or in their quality features may hinder either the knowledge discovery process or the generalization of past knowledge. These problems can be seen as a lack of temporal stability in the data. This work establishes temporal stability as a data quality dimension and proposes new methods for its assessment…
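The core idea in the abstract, detecting change by comparing the probability distributions of a variable across periods of time, can be sketched briefly. A minimal Python sketch, assuming a numeric variable batched by month and using the Jensen-Shannon distance that the citing papers below attribute to these methods; the histogram binning and the flagging threshold are illustrative choices, not values from the paper:

    # A minimal sketch, assuming a numeric variable batched by month.
    # The bin count and the 0.2 flagging threshold are illustrative
    # choices, not values from the paper.
    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def temporal_jsd(values, batch_ids, bins=20):
        """Jensen-Shannon distance of each temporal batch vs. the pooled data."""
        edges = np.histogram_bin_edges(values, bins=bins)
        ref, _ = np.histogram(values, bins=edges)
        ref = ref / ref.sum()
        out = {}
        for b in np.unique(batch_ids):
            counts, _ = np.histogram(values[batch_ids == b], bins=edges)
            out[b] = jensenshannon(counts / counts.sum(), ref, base=2)
        return out

    # Synthetic data: the variable's distribution shifts in the last three months.
    rng = np.random.default_rng(0)
    months = np.repeat(np.arange(12), 500)
    x = rng.normal(loc=np.where(months < 9, 0.0, 0.8), scale=1.0)
    for month, d in sorted(temporal_jsd(x, months).items()):
        flag = "  <- possible change" if d > 0.2 else ""  # illustrative threshold
        print(f"month {month:2d}: JSD = {d:.3f}{flag}")

Each batch's empirical distribution is compared against the pooled reference; a sustained rise in the distance suggests a dataset shift worth inspecting.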

Cited by 24 publications (36 citation statements)
References 45 publications (46 reference statements)
“…The methods used in the present study fall into two groups, namely those for assessing multisource variability [18] and those for assessing temporal variability [19]. The methods are based on the comparison of probability distributions of the variables among different sources or over different periods of time. The comparisons are made by calculating the information-theoretic probabilistic distances between pairs of distributions; in concrete terms, we use the Jensen-Shannon distance (JSD), a symmetrized and smoothed version of the Kullback-Leibler divergence.…”
Section: Methods (mentioning)
confidence: 99%
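The Jensen-Shannon distance quoted here can be written out directly from its definition as a symmetrized and smoothed Kullback-Leibler divergence. A small sketch, where the two toy distributions are placeholders for the same variable observed at two sources or two periods, checked against SciPy's implementation:

    # The Jensen-Shannon distance, written out from its definition and
    # checked against SciPy. The toy distributions p and q stand in for the
    # same variable observed at two sources or two periods of time.
    import numpy as np
    from scipy.spatial.distance import jensenshannon
    from scipy.stats import entropy  # entropy(p, q) is the KL divergence

    def js_distance(p, q, base=2):
        """sqrt(0.5*KL(p||m) + 0.5*KL(q||m)) with m = (p + q) / 2."""
        p = np.asarray(p, dtype=float) / np.sum(p)
        q = np.asarray(q, dtype=float) / np.sum(q)
        m = 0.5 * (p + q)
        return np.sqrt(0.5 * entropy(p, m, base=base) + 0.5 * entropy(q, m, base=base))

    p = [0.1, 0.4, 0.5]  # distribution of a variable at source A
    q = [0.3, 0.3, 0.4]  # the same variable at source B
    print(js_distance(p, q))            # from the definition above
    print(jensenshannon(p, q, base=2))  # SciPy gives the same value

With base 2, the distance is bounded in [0, 1], which makes values comparable across variables.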
“…Multi-source or temporal variability, if unmanaged, may lead to inaccurate or irreproducible results [3,18,19] or even to invalid results [11]. The reuse of data in multi-site repositories for population studies, clinical trials, or data mining rests on the assumption that the data distributions are to some degree concordant irrespective of the source of the data or of the time over which the data have been collected, and therefore allow generalizable conclusions to be drawn from the data.…”
Section: Background and Significance (mentioning)
confidence: 99%
“…framework can easily be extended to allow for a more formal statistical process control (see Sáez et al. (2015) and [17,19] for guidance).…” (from a medRxiv preprint, not certified by peer review; version posted September 16, 2019; https://doi.org/10.1101/19006098)
Section: Limitations (mentioning)
confidence: 99%
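The extension this citing preprint points at, a more formal statistical process control over a monitored data-quality metric, can be sketched with conventional Shewhart limits. The baseline window and the 3-sigma rule are standard SPC choices, not specifics from either paper:

    # A minimal Shewhart-style sketch, assuming the monitored series is a
    # per-batch data-quality metric (for instance, the JSD values sketched
    # after the abstract). The baseline window and the 3-sigma rule are
    # conventional SPC choices, not specifics from either paper.
    import numpy as np

    def out_of_control(series, baseline_n):
        """Indices (after the baseline) of points outside mean +/- 3 sigma."""
        s = np.asarray(series, dtype=float)
        base = s[:baseline_n]
        mu, sigma = base.mean(), base.std(ddof=1)
        lo, hi = mu - 3 * sigma, mu + 3 * sigma
        return [i for i in range(baseline_n, len(s)) if not lo <= s[i] <= hi]

    metric = [0.05, 0.06, 0.04, 0.05, 0.07, 0.05, 0.06, 0.05, 0.21, 0.24]
    print(out_of_control(metric, baseline_n=8))  # -> [8, 9]

Points flagged this way would then be inspected for the source or period responsible for the shift.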
“…Larger search spaces, like those encountered for numerical data, (complex) multi-relational datasets (for example, in social networks), or spatiotemporal data, require efficient algorithms that can handle those different types of data, e.g., Refs . Also, combinations of such different data characteristics, for example temporal pattern mining for event detection or temporal subgroup analytics, provide further challenges, especially considering sophisticated exceptional model classes in that area.…”
Section: Future Directions and Challenges (mentioning)
confidence: 99%