2013
DOI: 10.1007/s00778-013-0332-z

Towards zero-overhead static and adaptive indexing in Hadoop

Abstract: Several research works have focused on supporting index access in MapReduce systems. These works have allowed users to significantly speed up selective MapReduce jobs by orders of magnitude. However, all these proposals require users to create indexes upfront, which might be a difficult task in certain applications (such as in scientific and social applications) where workloads are evolving or hard to predict. To overcome this problem, we propose LIAH (Lazy Indexing and Adaptivity in Hadoop), a parallel, adapt…

Citations: cited by 54 publications (36 citation statements)
References: 28 publications
“…There are many optimisation techniques that allow increasing the performance of MapReduce based solutions (Jiang et al, 2010;Thusoo et al, 2010). For instance, one way to improve Hadoop's performance is to design an optimised storage model that utilises various indexing solutions (Richter et al, 2014). Although the use of column oriented storage formats (RCFile, 2016;ORC, 2016) dominates the design of modern Hadoop-based data warehousing solutions, we present a storage model that is both read optimised and row based.…”
Section: Related Work (mentioning)
confidence: 99%
“…Primarily, they have been exploited by MapReduce jobs, and in the case of HAIL required decorating map tasks with customised annotations, specifying the selection predicates and the list of projected fields (Richter et al, 2014). Unfortunately, no indexing enhancements to the MapReduce interface are visible to the applications that conform to the original interface such as Hive or Spark.…”
Section: 4 (mentioning)
confidence: 99%
“…Lazy Indexing (LIAH) is proposed by Richter, Quiané-Ruiz [18] as adaptive indexing using clustered approach. LIAH uses offer rate to minimize indexing I/O cost and creates as many indexes as suggested by incoming queries.…”
Section: Related Work (mentioning)
confidence: 99%
“…Nevertheless to completely index all data blocks, low offer rate will require more MapReduce jobs. Due to this fact, LIAH has to compromise either indexing overhead or number of MapReduce jobs which provides motivation towards dynamically adapting offer rate [19]. Although query workload prediction is not required and unlike static indexing there is no replication factor dependency to consider number of index attributes in both of these approaches yet performing full scan for each new query and replicating data block for each new index attribute are the performance bottlenecks of LIAH.…”
Section: Related Work (mentioning)
confidence: 99%
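
The trade-off in the statement above (a lower offer rate means less indexing work per job but more jobs before every data block is indexed) can be illustrated with a simplified model. This is only a sketch: it assumes the offer rate is the fraction of a dataset's blocks that one job may index as a side effect, and the function name `jobs_to_index_all` is hypothetical, not part of LIAH or Hadoop.

```python
import math


def jobs_to_index_all(total_blocks: int, offer_rate: float) -> int:
    """Estimate how many jobs are needed until all blocks are indexed.

    Assumption (illustrative only): each job indexes at most
    floor(offer_rate * total_blocks) previously unindexed blocks,
    and always at least one block.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not 0 < offer_rate <= 1:
        raise ValueError("offer_rate must be in (0, 1]")
    # Blocks that a single job is allowed to index under this model.
    per_job = max(1, math.floor(offer_rate * total_blocks))
    return math.ceil(total_blocks / per_job)


if __name__ == "__main__":
    # Lower offer rates spread the indexing cost over more jobs.
    for rate in (1.0, 0.5, 0.25, 0.1):
        print(rate, jobs_to_index_all(100, rate))
```

Under this model, halving the offer rate roughly doubles the number of jobs that run before the index is complete, which is the compromise between per-job indexing overhead and number of jobs that the citing authors describe.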