César Piñeiro scite author profile

Motivation: FastTree-2 is one of the most successful tools for inferring large phylogenies. Although speed is at the core of its design, important issues in the FastTree-2 implementation still harm its performance and scalability. To address these limitations, we introduce VeryFastTree, a highly tuned implementation of FastTree-2 that uses parallelization and vectorization strategies to boost performance.

Results: Using double-precision arithmetic on a standard server, VeryFastTree constructs a tree from an ultra-large 330k alignment in only 4.5 hours, which is 7.8× and 3.5× faster than the sequential and best parallel FastTree-2 times, respectively.

Availability: VeryFastTree is available at the GitHub repository: https://github.com/citiususc/veryfasttree

Supplementary information: Supplementary data are available at Bioinformatics online.

Ignis: An efficient and scalable multi-language Big Data framework

Piñeiro, Martínez-Castaño, Pichel. 2020. Future Generation Computer Systems.

Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing

Eiras-Franco, Martínez-Rego, Kanthan, et al. 2020. ACM Trans. Intell. Syst. Technol.

The k-nearest-neighbors (kNN) graph is a popular and powerful data structure used in various areas of Data Science, but the high computational cost of obtaining it hinders its use on large datasets. Approximate solutions have been described in the literature using diverse techniques, among which Locality-sensitive Hashing (LSH) is a promising alternative that still has unsolved problems. We present Variable Resolution Locality-sensitive Hashing, an algorithm that addresses these problems to obtain an approximate kNN graph at a significantly reduced computational cost. Its usability is greatly enhanced by its capacity to automatically find adequate hyperparameter values, the tuning of which is a common hindrance for LSH-based methods. Moreover, we provide an implementation in the distributed computing framework Apache Spark that takes advantage of the structure of the algorithm to distribute the computational load efficiently across multiple machines, enabling practitioners to apply this solution to very large datasets. Experimental results show that our method offers significant improvements over the state of the art in the field and scales very well as more machines are added to the computation.
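To illustrate the general idea of building an approximate kNN graph with LSH, here is a minimal single-machine sketch. It is not the Variable Resolution LSH algorithm from the paper and not its Apache Spark implementation: it assumes plain random-hyperplane hashing, and the function name `lsh_knn_graph` and the parameters `n_bits` and `n_tables` are hypothetical names chosen for this example. Points that share a bucket in any hash table become candidate neighbors, and exact distances are computed only among those candidates.

```python
import math
import random
from collections import defaultdict


def lsh_knn_graph(points, k, n_bits=4, n_tables=4, seed=0):
    """Approximate kNN graph: exact search restricted to LSH bucket mates.

    Illustrative assumption: random-hyperplane hashing, not the paper's method.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = defaultdict(set)

    for _ in range(n_tables):
        # One hash table is defined by n_bits random hyperplanes through the origin.
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            # The bucket key records on which side of each hyperplane the point lies.
            key = tuple(
                sum(w * x for w, x in zip(plane, p)) >= 0.0 for plane in planes
            )
            buckets[key].append(idx)
        # Every pair of points sharing a bucket becomes a candidate pair.
        for members in buckets.values():
            for i in members:
                candidates[i].update(j for j in members if j != i)

    graph = {}
    for i, p in enumerate(points):
        # Rank only the candidates by exact Euclidean distance and keep the k closest.
        ranked = sorted(candidates[i], key=lambda j: math.dist(p, points[j]))
        graph[i] = ranked[:k]
    return graph


if __name__ == "__main__":
    pts = [(1.0, 2.0), (1.0, 2.0), (-3.0, 0.5), (4.0, -1.0)]
    print(lsh_knn_graph(pts, k=2))
```

In this sketch, more bits per table produce smaller buckets (less work, lower recall), while more tables add candidates (more work, higher recall). Choosing such values by hand is the kind of hyperparameter tuning that the abstract says the proposed method automates.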

A unified framework to improve the interoperability between HPC and Big Data languages and programming models

Piñeiro, Pichel. 2022. Future Generation Computer Systems.


César Piñeiro

LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction

Very Fast Tree: speeding up the estimation of phylogenies for large alignments through parallelization and vectorization strategies
