With the growing adoption of Cloud Computing, we observe an increasing need for Cloud Benchmarks, in order to assess the performance of Cloud infrastructures and software stacks, to assist with provisioning decisions for Cloud users, and to compare Cloud offerings. We understand our paper as one of the first systematic approaches to the topic of Cloud Benchmarks. Our driving principle is that Cloud Benchmarks must consider end-to-end performance and pricing, taking into account that services are delivered over the Internet. This requirement yields new challenges for benchmarking and requires us to revisit existing benchmarking practices in order to adapt them to the Cloud.
The efficient, distributed factorization of large matrices on clusters of commodity machines is crucial to applying latent factor models in industrial-scale recommender systems. We propose an efficient, data-parallel low-rank matrix factorization with Alternating Least Squares which uses a series of broadcast-joins that can be efficiently executed with MapReduce. We empirically show that the performance of our solution is suitable for real-world use cases. We present experiments on two publicly available datasets and on a synthetic dataset termed Bigflix, generated from the Netflix dataset. Bigflix contains 25 million users and more than 5 billion ratings, mimicking data sizes recently reported as Netflix' production workload. We demonstrate that our approach is able to run an iteration of Alternating Least Squares in six minutes on this dataset. Our implementation has been contributed to the open source machine learning library Apache Mahout.
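The data-parallel idea in the abstract above can be illustrated with a minimal, single-process sketch. It is not the Mahout implementation: the function names, the plain `lam * I` regularization, the modulo partitioning, and the toy ratings are all assumptions made for illustration. The sketch assumes that the factor matrix being held fixed is small enough to hand to every worker (the "broadcast"), so that each partition of rating rows can be recomputed independently, as a map task would.

```python
import random

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def recompute_partition(partition, broadcast, rank, lam):
    """Stand-in for one map task: join local rating rows with the broadcast
    factors and solve one regularized least-squares problem per row."""
    out = {}
    for row_id, entries in partition.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col_id, rating in entries:
            q = broadcast[col_id]
            for i in range(rank):
                b[i] += rating * q[i]
                for j in range(rank):
                    a[i][j] += q[i] * q[j]
        out[row_id] = solve(a, b)
    return out

def half_iteration(partitions, broadcast, rank, lam):
    """Recompute one side's factors; partitions never exchange data."""
    result = {}
    for part in partitions:  # each loop body could run on a separate worker
        result.update(recompute_partition(part, broadcast, rank, lam))
    return result

def partition_rows(triples, key, other, num_parts):
    """Group (user, item, rating) triples by row id and spread rows over partitions."""
    parts = [{} for _ in range(num_parts)]
    for t in triples:
        parts[t[key] % num_parts].setdefault(t[key], []).append((t[other], t[2]))
    return parts

def loss(triples, users, items, lam):
    """Regularized squared error that each half-iteration minimizes."""
    err = sum((r - sum(p * q for p, q in zip(users[u], items[i]))) ** 2
              for u, i, r in triples)
    reg = sum(x * x for f in list(users.values()) + list(items.values()) for x in f)
    return err + lam * reg

def run_als(triples, rank=2, lam=0.1, iterations=5, num_parts=2, seed=0):
    rng = random.Random(seed)
    by_user = partition_rows(triples, 0, 1, num_parts)
    by_item = partition_rows(triples, 1, 0, num_parts)
    items = {i: [rng.random() for _ in range(rank)] for (_u, i, _r) in triples}
    users = half_iteration(by_user, items, rank, lam)
    history = [loss(triples, users, items, lam)]
    for _ in range(iterations):
        items = half_iteration(by_item, users, rank, lam)  # broadcast user factors
        users = half_iteration(by_user, items, rank, lam)  # broadcast item factors
        history.append(loss(triples, users, items, lam))
    return users, items, history

# Hypothetical toy data: (user id, item id, rating).
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]

if __name__ == "__main__":
    _, _, history = run_als(RATINGS)
    print(history)
```

Because each row's least-squares problem depends only on its own ratings and the broadcast factors, changing `num_parts` changes how the work is split but not the resulting factors. This is the property that lets such a step run as independent map tasks.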
The European Black Pine (Pinus nigra Arn.) has a long and complex history. Genetic distance and frequency analyses identified three differentiated genetic groups, which corresponded to three wide geographical areas: Western Mediterranean, Balkan Peninsula and Asia Minor. These groups shared common ancestors (14.75 and 10.72 Ma). The most recent splits occurred after the Messinian Salinity Crisis (4.37 Ma) and the Early–Middle Pleistocene Transition (0.93 Ma). The posterior ancestral population size (Na) is 260,000–265,000 individuals. Each pool is further fragmented, with evidence of a phylogeographic structure (Nst > Gst) typically observed in some natural populations from the Western Mediterranean region and the Balkan Peninsula. The laboratory analysis was performed by fragment analysis, i.e. electrophoretic sizing of polymerase chain reaction fragments, combined with the sequencing analysis of 33 % of all individuals as a control. Intense sampling of chloroplast DNA polymorphisms (3154 individuals and 13 markers: SNPs and SSRs) over the full area of the species’ natural distribution indicated moderate among-population variability (Gst(nc) ≤ 0.177) in various parts of its range. These results indicate that the natural populations have long migration histories that differ from one another and that they have been strongly phylogeographically affected by complex patterns of isolation, speciation and fragmentation. Long and varying climatic fluctuations in the region of the principal genetic group have been the probable cause of different forest community associations with different successional patterns resulting in interglacial refugia vs. macro long-term refugia.
Using nuclear simple sequence repeats (nuSSRs), we determined the genetic variability in the natural distribution range of maritime pine (Pinus pinaster) in the western Mediterranean region. We analysed the role of global and significant climatic fluctuations in driving the evolutionary diversification of this species. We attempted to determine the impact of the last glacial maximum (LGM) and human activity on genetic variation and to identify the effect of bottlenecks, admixing, migration, time to the most recent common ancestor (TMRCA), and recent splits. A total of 972 individuals were analysed. The sample represented 27 natural populations from the western Mediterranean region, which encompasses most of the natural range of P. pinaster. Using eight nuSSRs, we analysed genetic diversity indices for each population and group of populations. We also examined the interpopulation structure by the frequency and distance methods and investigated genetic barriers, signals of historical demographic fluctuations, phylogeographic structure, admixing, rate of mutation and migration, and tested the hypothesis of isolation by distance (IBD). Both cluster analyses showed similar population genetic structure with three genetic barriers that divided the samples into four large groups. Intensive migration was only detected during the period of the LGM, which permitted the mutation rate of the markers used to be calculated. The majority of the populations were found to exhibit signs of a recent bottleneck, and its timing showed a clear northeast-southwest geographic distribution. A clearly defined phylogeographic structure (Nst > Gst and Rst > Gst) under IBD was established, and showed the highest divergence between groups of populations separated by physical barriers, such as the Strait of Gibraltar, the Mediterranean Sea and the Pyrenees.
The high level of intergroup genetic differentiation (ΦIS = 20.26) was attributed to a long historical isolation (which occurred before the last 18 000 years) between the principal maritime pine population groups that occurred due to physical barriers that limited pollen and seed transfer, combined with a minimal effective radius of distribution. The low level of genetic diversity among the populations was combined with genetic drift and a recent bottleneck during the period of human activity. Significant migration across barriers was due to spontaneous phenomena during the LGM, which had no significant impact on the genetic structure owing to its relatively short duration and the fragmented species. The phylogeographic structure under the assumption of IBD was well established for P. pinaster in each of the principal population groups.
Large-scale data analysis applications require processing and analyzing terabytes or even petabytes of data, particularly in the areas of web analysis and scientific data management. This trend was discussed as "web-scale data management" in a panel at VLDB 2009. Formerly, parallel data processing was the domain of parallel database systems. Today, novel requirements such as scaling out to thousands of machines, improved fault tolerance, and schema-free processing have made a case for new approaches.