Nicolás Poggi scite author profile

Nicolás Poggi

5 Publications

106 Citation Statements Received

75 Citation Statements Given

Affiliations

Vibrant Data (United States), Barcelona Supercomputing Center, Universitat Politècnica de Catalunya

Publications

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

Poggi, Carrera, Call, et al. 2014

Abstract: This article presents the ALOJA project, an initiative to produce mechanisms for the automated characterization of the cost-effectiveness of Hadoop deployments, and reports its initial results. ALOJA is the latest phase of a long-term collaborative engagement between BSC and Microsoft which, over the past 6 years, has explored a range of different aspects of computing systems, software technologies and performance profiling. While Hadoop has become the de facto platform for Big Data deployments during the last 5 years, little is still understood about how the different layers of software and hardware deployment options affect its performance. Early ALOJA results show that Hadoop's runtime performance, and therefore its price, are critically affected by relatively simple software and hardware configuration choices, e.g., number of mappers, compression, or volume configuration. Project ALOJA presents a vendor-neutral repository featuring over 5000 Hadoop runs, a test bed, and tools to evaluate the cost-effectiveness of different hardware, parameter tuning, and Cloud services for Hadoop. As few organizations have the time or performance profiling expertise, we expect our growing repository will help Hadoop customers meet their Big Data application needs. ALOJA seeks to provide both knowledge and an online service with which users can make better informed configuration choices for their Hadoop compute infrastructure, whether on-premise or cloud-based. The initial version of ALOJA's Web application and sources are available at http://hadoop.bsc.es

Web Customer Modeling for Automated Session Prioritization on High Traffic Sites

Poggi, Moreno, Berral, et al.

Abstract. In the Web environment, user identification is becoming a major challenge for admission control systems on high traffic sites. When a web server is overloaded there is a significant loss of throughput when we compare finished sessions and the number of responses per second; longer sessions are usually the ones ending in sales but also the most sensitive to load failures. Session-based admission control systems maintain a high QoS for a limited number of sessions, but do not maximize revenue because they treat all non-logged sessions the same. We present a novel method for learning to assign priorities to sessions according to the revenue they will generate. For this, we use traditional machine learning techniques and Markov-chain models. We are able to train a system to estimate the probability of a user's purchasing intentions according to their early navigation clicks and other static information. The predictions can be used by admission control systems to prioritize sessions or deny them if no resources are available, thus improving sales throughput per unit of time for a given infrastructure. We test our approach on access logs obtained from a high-traffic online travel agency, with promising results.
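The idea of estimating purchase intent from early clicks with Markov-chain models can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one first-order chain trained on buying sessions and one on non-buying sessions, combined with Bayes' rule. The page names, the toy sessions, the smoothing constant `alpha`, and the prior `prior_buy` are all hypothetical.

```python
import math
from collections import defaultdict

def train_chain(sessions, states, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    chain = {}
    for src in states:
        total = sum(counts[src].values()) + alpha * len(states)
        chain[src] = {dst: (counts[src][dst] + alpha) / total for dst in states}
    return chain

def log_likelihood(chain, clicks):
    """Log-probability of the observed transitions under one chain."""
    return sum(math.log(chain[a][b]) for a, b in zip(clicks, clicks[1:]))

def purchase_probability(clicks, buyer_chain, other_chain, prior_buy):
    """Posterior probability that a partial session belongs to a buyer."""
    log_b = math.log(prior_buy) + log_likelihood(buyer_chain, clicks)
    log_o = math.log(1.0 - prior_buy) + log_likelihood(other_chain, clicks)
    top = max(log_b, log_o)  # subtract the max for numerical stability
    p_b, p_o = math.exp(log_b - top), math.exp(log_o - top)
    return p_b / (p_b + p_o)

# Hypothetical page types and toy training sessions (not data from the paper).
STATES = ["home", "search", "details", "cart"]
BUYER_SESSIONS = [["home", "search", "details", "cart"],
                  ["search", "details", "cart"]]
OTHER_SESSIONS = [["home", "search", "home"],
                  ["home", "search", "search", "home"]]

buyer_chain = train_chain(BUYER_SESSIONS, STATES)
other_chain = train_chain(OTHER_SESSIONS, STATES)

score = purchase_probability(["search", "details"], buyer_chain, other_chain, 0.5)
print(f"estimated purchase probability: {score:.2f}")
```

An admission controller could, under these assumptions, sort active sessions by this score and shed the lowest-scoring ones first when the server is overloaded.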

A methodology for the evaluation of high response time on E-commerce users and sales

et al. 2012

Reducing wasted resources to help achieve green data centers

Torres, Carrera, Hogan, et al. 2008

In this paper we introduce a new approach to the consolidation strategy of a data center that allows an important reduction in the number of active nodes required to process a heterogeneous workload without degrading the offered service level. This article demonstrates that consolidation of dynamic workloads does not end with virtualization. If energy efficiency is pursued, the workloads can be consolidated even further using two techniques, memory compression and request discrimination, which were separately studied and validated in previous work and are now combined in a joint effort. We evaluate the approach using a representative workload scenario composed of numerical applications and a real workload obtained from a top national travel website. Our results indicate that an important improvement can be achieved, using 20% fewer servers to do the same work. We believe that this serves as an illustrative example of a new way of management: tailoring the resources to meet high-level energy efficiency goals.

Business Process Mining from E-Commerce Web Logs

Poggi, Muthusamy, Carrera, et al. 2013

Abstract. The dynamic nature of the Web and its increasing importance as an economic platform create the need for new methods and tools for business efficiency. Current Web analytic tools do not provide the necessary abstracted view of the underlying customer processes and critical paths of site visitor behavior. Such information can offer insights for businesses to react effectively and efficiently. We propose applying Business Process Management (BPM) methodologies to e-commerce Website logs, and present the challenges, results and potential benefits of such an approach. We use the Business Process Insight (BPI) platform, a collaborative process intelligence toolset that implements the discovery of loosely coupled processes, and includes novel process mining techniques suitable for the Web. Experiments are performed on custom click-stream logs from a large online travel and booking agency. We first compare Web clicks and BPM events, and then present a methodology to classify and transform URLs into events. We evaluate traditional and custom process mining algorithms to extract business models from real-life Web data. The resulting models present an abstracted view of the relation between pages, exit points, and critical paths taken by customers. Such models show important improvements and aid high-level decision making and optimization of e-commerce sites compared to current state-of-the-art Web analytics.
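The step of classifying and transforming URLs into events can be pictured with a small sketch. This is not the BPI platform's method or the authors' classifier: it assumes a simple ordered list of regular-expression rules over the URL path, and the URL patterns, event names, and the choice to collapse consecutive repeated events are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical rules mapping URL paths to business events; first match wins.
RULES = [
    (re.compile(r"^/$"), "Home"),
    (re.compile(r"^/search"), "Search"),
    (re.compile(r"^/(hotel|flight)s?/\d+"), "ProductDetails"),
    (re.compile(r"^/cart"), "Cart"),
    (re.compile(r"^/checkout"), "Checkout"),
]

def url_to_event(url):
    """Classify one URL into an event name, ignoring host and query string."""
    path = urlparse(url).path or "/"
    for pattern, event in RULES:
        if pattern.search(path):
            return event
    return "Other"

def clicks_to_trace(urls):
    """Turn one session's clicks into an event trace, merging consecutive repeats."""
    trace = []
    for url in urls:
        event = url_to_event(url)
        if not trace or trace[-1] != event:
            trace.append(event)
    return trace

# Hypothetical click-stream for a single session.
session = [
    "https://example.com/",
    "https://example.com/search?q=bcn",
    "https://example.com/search?q=bcn&page=2",
    "https://example.com/hotels/42",
    "https://example.com/checkout/pay",
]
print(clicks_to_trace(session))
```

Traces of this shape are the kind of input a process mining algorithm expects, since many raw URLs are reduced to a small vocabulary of events.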
