2021
DOI: 10.1145/3465740

Stream Data Cleaning under Speed and Acceleration Constraints

Abstract: Stream data are often dirty, for example, owing to unreliable sensor reading or erroneous extraction of stock prices. Most stream data cleaning approaches employ a smoothing filter, which may seriously alter the data without preserving the original information. We argue that the cleaning should avoid changing those originally correct/clean data, a.k.a. the minimum modification rule in data cleaning. To capture the knowledge about what is clean, we consider the (…

Cited by 28 publications (51 citation statements)
References 31 publications
“…DEFINITION Note that, instead of setting ϵ_Y.min to 0 in [19,41], in this paper, we relax this limitation to let the ϵ_Y.min be any non-negative value less than ϵ_Y.max (i.e., 0 ≤ ϵ_Y.min < ϵ_Y.max), such that the CDD rule can have tighter intervals for distance constraints. CDD Rule Detection: We assume that a static data repository R is available, which can be collected/inferred by historical stream data [23,37,38,44]. Following the literature [19,41], to infer a CDD rule in the form X → A_j from R, we first obtain determinant attributes X from (d−1) attributes (other than A_j), where attributes X are correlated with A_j in R.…”
Section: Imputation Over Incomplete Data Stream
Citation type: mentioning
confidence: 99%
“…In this paper, we consider the missing at random (MAR) model [15] for incomplete data. Under the MAR model, we can classify the existing imputation methods of incomplete data into categories such as statistical-based [23], rule-based [12], constraint-based [38,44], and pattern-based [22] imputation methods. Due to textual property and sparseness of ER data sets, these works may fail to impute incomplete data, when there are only a few or even no samples for imputing missing attributes.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…As shown in Table II, the 4-Type constraints embody the dependence on attributes (columns) and entities (rows) for temporal data. T-3: SD, SC [11], T-4: Similarity Constraints T-3: Variance Constraints [12] points in sequence as the simple instance of Type-1 constraints, CFD for relational data and Physical Mechanism for industrial data are concluded as multi-sequence constraints. Constraints, such as SD, SC, and VC, formalizing the dependence of data points along the time in one sequence belongs to Type-3 constraints.…”
Section: A Constraint-based Anomaly Detection
Citation type: mentioning
confidence: 99%
“…Ihab F. Ilyas and Xu Chu give an overview of the end-to-end data cleaning process including error detection and repair methods in [10]. Both statistical-based [27], [28] and constraints-based [11], [29] cleaning are widely applied in temporal date quality improvement. [29] extends the idea of constraints from dependencies defined on relational database (e.g., FD, CFD in [30]), and proposes sequential dependencies (SD) to describe the semantics of temporal data.…”
Section: Related Workmentioning
confidence: 99%