2015
DOI: 10.9756/bijdm.8001

Data Integration in Big Data Environment

Abstract: Data Integration is the process of transferring data in a source format into a destination format. Many …

Cited by 13 publications (9 citation statements)
References 18 publications (12 reference statements)

“…In order to implement a Big Data ecosystem, the first steps are data extraction and data loading. Here the ETL stages come to mind; Figure 2 describes the three tasks applied in a traditional data warehousing process (Arputhamary and Arockiam, 2015). In Hadoop, the ETL process becomes Extract, Load and Transform (ELT), targeting a reduction in processing time; in practice, however, the data loading step revealed useless and redundant fields, and the transformation process improves if the data is cleaned first. We therefore propose: Extract, Cleaning, Load and Transform (ECLT).…”
Section: ETL Process (mentioning)
confidence: 99%
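The E-C-L-T ordering this statement proposes can be illustrated with a short, hedged sketch. Everything below (the sample CSV, the field names, and the emptiness/redundancy heuristics) is hypothetical and only shows the idea of cleaning between extraction and loading; it is not the cited authors' implementation, which targets Hadoop rather than in-memory Python.

```python
import csv
import io

# Hypothetical raw extract: "msg_copy" duplicates "msg" and "unused" is empty,
# standing in for the "useless and redundant fields" the statement mentions.
RAW = "id,ts,msg,msg_copy,unused\n1,2015-01-01,login,login,\n2,2015-01-02,logout,logout,\n"

def extract(source: str) -> list[dict]:
    """Extract: read raw rows from the source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(source)))

def clean(rows: list[dict]) -> list[dict]:
    """Cleaning: drop columns that are empty everywhere or duplicate another column."""
    cols = list(rows[0])
    useless = {c for c in cols if all(not r[c] for r in rows)}
    redundant = {b for a in cols for b in cols
                 if a < b and all(r[a] == r[b] for r in rows)}
    keep = [c for c in cols if c not in useless | redundant]
    return [{c: r[c] for c in keep} for r in rows]

def load(rows: list[dict], store: list) -> None:
    """Load: persist the cleaned rows (a plain list stands in for HDFS/Hive)."""
    store.extend(rows)

def transform(store: list) -> dict:
    """Transform: a toy analysis step, counting events per message."""
    counts: dict[str, int] = {}
    for r in store:
        counts[r["msg"]] = counts.get(r["msg"], 0) + 1
    return counts

store: list = []
load(clean(extract(RAW)), store)  # E -> C -> L: fields are pruned before loading
print(transform(store))           # -> T: {'login': 1, 'logout': 1}
```

Placing the cleaning step before load means the redundant msg_copy and empty unused columns never reach the target store, which is the processing-time motivation the statement gives for moving beyond plain ELT.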
“…Comparing our results with other research, the main difference is probably the nature of the data: they propose discarding data at the row level (Aye, 2011), whereas we consider all of the security log rows important; for that reason, our data cleaning proposal operates only on the vertical dimension. This intuitive solution is easy to implement and gives good results; however, it does not fully satisfy some of the main characteristics of a suitable solution (Arputhamary and Arockiam, 2015): Reliability, Maintainability, Freshness, Recoverability, Scalability, Availability, Traceability, Auditability and Maintenance. For instance, Scalability suffers because the data cleaning code must be written in house according to the company's requirements, and automatic Recoverability is lacking because human intervention is needed to restore the data cleaning script.…”
Section: Intuitive Proposal for Data Cleaning (mentioning)
confidence: 99%
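A minimal sketch of the "vertical dimension only" cleaning idea follows. The log fields and the drop list are hypothetical; the point is that rows are never discarded (every security log row is kept, per the statement above), only hand-selected columns. It also illustrates the Scalability/Recoverability caveat: the drop list is in-house code that must be curated and restored manually.

```python
from typing import Iterable

# Hypothetical, hand-curated list of fields to remove; in the statement's
# terms this is the in-house code that limits Scalability and requires
# human intervention to restore (Recoverability).
DROP_FIELDS = {"debug_flag", "legacy_id"}

def clean_vertical(rows: Iterable[dict]) -> list[dict]:
    """Column-level cleaning: strip unwanted fields, never drop a row."""
    return [{k: v for k, v in row.items() if k not in DROP_FIELDS}
            for row in rows]

logs = [  # toy security-log rows; field names are illustrative only
    {"ts": "12:00", "user": "alice", "event": "login",  "debug_flag": "0", "legacy_id": "x1"},
    {"ts": "12:05", "user": "bob",   "event": "denied", "debug_flag": "1", "legacy_id": "x2"},
]
cleaned = clean_vertical(logs)
assert len(cleaned) == len(logs)  # the horizontal (row) dimension is untouched
print(cleaned[0])                 # {'ts': '12:00', 'user': 'alice', 'event': 'login'}
```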
“…Another element to take into consideration is the fact that the value of data increases exponentially when it is linked and fused with other data. Hence, addressing the data integration challenge is critical to realizing the promise of Big Data [5], and unfortunately existing data warehousing techniques are inefficient at handling such integration [6]. Indeed, traditional data warehouses integrate structured, transactional data that is contained within relational databases.…”
Section: Introduction (mentioning)
confidence: 99%
“…More recent studies have confirmed that maintenance is a major cost issue, with a ratio of maintenance costs to added value higher than 25% in some sectors (Sophie et al., 2014). In fact, data quality practices, including maintenance reports, have a considerable impact on maintenance tasks, risks and business performance, since poor data quality results in losses across a number of fronts (Arputhamary & Arockiam, 2015); reciprocally, high data quality fosters enhanced business activities and decision-making.…”
Section: Introduction (mentioning)
confidence: 99%