Run-time performance optimization of a BigData query language

Liu, Yanbin; Dube, Parijat; Gray, Scott C.

doi:10.1145/2568088.2576800

Cited by 2 publications

(2 citation statements)

References 10 publications


“…The immense amount of data, in scale as well as in complexity, scope, distribution, and/or heterogeneity, produced by such systems has come to be referred to as big data (CUZZOCREA; DAVIS, 2011; WANG et al., 2016; DUBE; GRAY, 2014; MOISE et al., 2013). It is therefore increasingly necessary to develop new techniques to handle, efficiently and effectively, complex data types stored at scales on the order of thousands of petabytes; one petabyte (PB) is 2^50 bytes, which is on the order of 10^15 bytes.…”

Section: Abbreviations and Acronyms (unclassified)

“…For example, existing work on other similarity comparison criteria (reverse k-nearest neighbors and diversity) provides search operators based on high-order polynomial or even NP-complete algorithms, which are therefore infeasible for big data. On the other hand, recent work on big data focuses mainly on efficiency and seeks to improve indexing and retrieval performance, generally using parallel architectures (BORKAR; CAREY; LI, 2012; PAPADIMITRIOU; SUN, 2008; MOISE et al., 2013; FEGARAS; GUPTA, 2012; SCHADT et al., 2010; VERNICA; LI, 2010; REED, 2012; HALL et al., 2013; DUBE; GRAY, 2014; WANG et al., 2016), but without addressing the root of the problem, which is the increasing density of the search space. In fact, despite the rapid growth in data volume, the similarity search model in use has remained the same; that is, the same fundamental operators are always used, which treat the data as lying in "sparse" spaces (SKOPAL et al., 2009).…”

Section: Motivation (unclassified)


Similaridade em big data

Santos¹


The data being collected and generated nowadays are increasing not only in volume but also in complexity, requiring new query operators. Health care centers collecting image exams and remote sensing from satellites and earth-based stations are examples of application domains where more powerful and flexible operators are required. Storing, retrieving, and analyzing data that are huge in volume, structure, complexity, and distribution is now referred to as big data. Representing and querying big data using only the traditional scalar data types is no longer enough. Similarity queries are the most pursued resource for retrieving complex data, but until recently they were not available in Database Management Systems. Now that they are starting to become available, their first uses in developing real systems make it clear that the basic similarity query operators are not enough to meet the requirements of the target applications. The main reason is that similarity is a concept formulated considering only small numbers of data elements. Researchers currently target big data mainly through parallel architectures, and only a few studies address the efficacy of the query answers. This Ph.D. work aims to develop variations of the basic similarity operators that are better suited to handling big data, presenting a holistic view of the database and increasing the effectiveness of the answers provided without affecting the efficiency of the search algorithms. To achieve this goal, four main contributions are presented. The first is a result diversification model that can be applied to any comparison criterion and similarity search operator. The second focuses on defining sampling and grouping techniques, with the proposed diversification model, aimed at speeding up the analysis of result sets. The third concentrates on evaluation methods for measuring the quality of diversified result sets. Finally, the last defines an approach to integrating the concepts of visual data mining and similarity with diversity searches in content-based retrieval systems, allowing a better understanding of how the diversity property is applied in the query process.



Query Languages in NoSQL Databases

Holanda

Souza

Advances in Data Mining and Database Management


This chapter investigates how NoSQL (Not Only SQL) databases provide query languages and data retrieval mechanisms. Users attest to many advantages of NoSQL databases for specific applications; however, they also report that querying and retrieving data easily remains a problem. NoSQL operations require that queries be designed during the project as code built into the application. The authors intend to contribute to the investigation of querying across different types of NoSQL databases.

