The quest for efficient data acquisition, processing and consumption methodologies has been a topic of critical interest for decades across enterprise and academic institutions alike. For organizations to remain agile, novel insights must be derived and decisions made in a relatively short amount of time, often in light of limited observations and sparse data. With the relatively low cost of computational power and data storage, and with a society more interconnected through powerful mobile devices, increased access to invaluable data is beginning to be realized. An unintended consequence of these developments, however, is the challenge of managing and processing massive amounts of data securely and efficiently. These challenges are computationally and storage intensive in nature and are further complicated by an increasing emphasis on fault tolerance, redundancy and scalability. As a popular example, Facebook must continuously deal with the demand of hosting a massive number of social interactions per day, resulting in 500 TB of data. For Facebook to suggest relevant content to its projected 1.11 billion members in a timely manner based on past activity, computational resources must be seamlessly incorporated into its infrastructure to address increasing usage demands and a growing user base.
Big Data Overview

The term big data has become a buzzword in the field of information technology. It epitomizes the challenges just described in the broadest of terms. In the past, solutions to such challenges have focused on massive dedicated mainframe computers, distributed computational grids and, more recently, so-called cloud services. The aim of these solutions has been to distribute a computationally intensive workload across a series of dedicated