Tags have recently become popular as a means of annotating and organizing Web pages and blog entries. Advocates of tagging argue that the use of tags produces a 'folksonomy', a system in which the meaning of a tag is determined by its use among the community as a whole. We analyze the effectiveness of tags for classifying blog entries by gathering the top 350 tags from Technorati and measuring the similarity of all articles that share a tag. We find that tags are useful for grouping articles into broad categories, but less effective in indicating the particular content of an article. We then show that automatically extracting words deemed to be highly relevant can produce a more focused categorization of articles. We also show that clustering algorithms can be used to reconstruct a topical hierarchy among tags, and suggest that these approaches may be used to address some of the weaknesses in current tagging systems.
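The core measurement described above — scoring how topically cohesive the set of articles sharing a tag is — can be sketched as mean pairwise similarity over TF-IDF vectors. This is a minimal illustration, assuming simple whitespace-tokenized documents; the paper's exact weighting and similarity measure may differ.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tag_cohesion(docs):
    """Mean pairwise cosine similarity of all articles sharing a tag."""
    vecs = tfidf_vectors(docs)
    pairs = list(combinations(vecs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

A high cohesion score indicates the tag groups articles with shared vocabulary (a broad category); a low score suggests the tag says little about an article's particular content.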
Scientific applications, often expressed as workflows, are making use of large-scale national cyberinfrastructure to explore the behavior of systems, search for phenomena in large-scale data, and conduct many other scientific endeavors. As the complexity of the systems being studied grows and as data set sizes increase, the scale of the computational workflows increases as well. In some cases, workflows now have hundreds of thousands of individual tasks. Managing such scale is difficult from the point of view of workflow description, execution, and analysis. In this paper, we describe the challenges faced by workflow management and performance analysis systems when dealing with an earthquake science application, CyberShake, executing on the TeraGrid. The scientific goal of the SCEC CyberShake project is to calculate probabilistic seismic hazard curves for sites in Southern California. For each site of interest, the CyberShake platform includes two large-scale MPI calculations and approximately 840,000 embarrassingly parallel postprocessing jobs. In this paper, we show how we approach the scalability challenges in our workflow management and log mining systems.
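One standard scalability technique for workloads like the ~840,000 small postprocessing jobs mentioned above is task clustering: grouping many fine-grained tasks into fewer batch jobs to reduce scheduling overhead. The sketch below is illustrative only (a fixed batch size, not the CyberShake configuration); workflow systems such as Pegasus offer this kind of clustering natively.

```python
def cluster_tasks(tasks, batch_size):
    """Group fine-grained tasks into batches of at most batch_size.

    Each batch is submitted as a single job, so a workflow with
    hundreds of thousands of tiny tasks schedules far fewer jobs.
    """
    return [tasks[i:i + batch_size] for i in range((0), len(tasks), batch_size)]
```

For example, clustering 840,000 tasks with a batch size of 100 reduces the number of scheduled jobs to 8,400, at the cost of coarser failure recovery within each batch.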
This article describes the process, practice, and challenges of using predictive modelling in learning analytics (LA). Predictive modelling has become a core practice of LA researchers. In this chapter, we provide a general overview of considerations when using predictive modelling and the steps that an educational data scientist must consider when engaging in the process.
Abstract. A model of computation (MoC) is a formal abstraction of execution in a computer. There is a need for composing MoCs in e-science. Kepler, which is based on Ptolemy II, is a scientific workflow environment that allows for MoC composition. This paper explains how MoCs are combined in Kepler and Ptolemy II and analyzes which combinations of MoCs are currently possible and useful. It demonstrates the approach by combining MoCs involving dataflow and finite state machines. The resulting classification should be relevant to other workflow environments wishing to combine multiple MoCs.

Keywords: Model of computation, scientific workflow, Kepler, Ptolemy II.

The Need for Composing Models of Computation in E-Science

E-scientists design on-line (in silico) experiments by orchestrating components on the Web or Grid. On-line experiments are often orchestrated based on a scientific workflow environment. Scientific workflow environments typically offer support for the design, enactment, and provenance recording of computational experiments. Most workflow environments fix the model of computation (MoC, or the formal abstraction of computational execution) available to an e-scientist. They leave little flexibility to change MoC as the experiment evolves. Different experiments are modeled more cleanly with different MoCs because of their relative expressiveness and efficiency. Different uses of MoCs for scientific workflows include dataflow for pipeline compositions, e.g. gene annotation pipelines; continuous-time ordinary differential equation solvers, e.g. for Lattice-Boltzmann simulations in fluid dynamics; and finite state machines for modelling sequential control logic, e.g. in clinical protocols or instrument interaction.
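The idea of composing a finite state machine MoC with a dataflow MoC can be sketched in a few lines: an FSM whose states select different dataflow pipelines, mimicking the hierarchical composition described above. This is purely illustrative and does not use the Kepler or Ptolemy II APIs; the pipeline stages, events, and transition table are invented for the example.

```python
def double(x): return 2 * x
def square(x): return x * x

# Each FSM state selects a dataflow pipeline (a sequence of stages).
PIPELINES = {
    "warmup": [double],          # pipeline run while in state "warmup"
    "steady": [double, square],  # pipeline run while in state "steady"
}

# FSM transitions triggered by events derived from the data.
TRANSITIONS = {("warmup", "big"): "steady"}

def run(stream, state="warmup"):
    """Process a stream: dataflow inside each state, FSM between items."""
    out = []
    for item in stream:
        value = item
        for stage in PIPELINES[state]:   # dataflow: pipe value through stages
            value = stage(value)
        out.append(value)
        event = "big" if value > 10 else "small"
        state = TRANSITIONS.get((state, event), state)  # FSM step
    return out
```

Here the FSM is the outer MoC and dataflow is the inner one; Ptolemy II's actual composition is hierarchical in the same spirit, with a director governing each level.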
Scientific workflows are a common computational model for performing scientific simulations. They may include many jobs, many scientific codes, and many file dependencies. Since scientific workflow applications may include both high-performance computing (HPC) and high-throughput computing (HTC) jobs, meaningful performance metrics are difficult to define, as neither traditional HPC metrics nor HTC metrics fully capture the extent of the application. We describe and propose the use of alternative metrics to accurately capture the scale of scientific workflows and quantify their efficiency. In this paper, we present several specific practical scientific workflow performance metrics and discuss these metrics in the context of a large-scale scientific workflow application, the Southern California Earthquake Center CyberShake 1.0 Map calculation. Our metrics reflect both computational performance, such as floating-point operations and file access, and workflow performance, such as job and task scheduling and execution. We break down performance into three levels of granularity: the task, the workflow, and the application levels, presenting a complete view of application performance. We show how our proposed metrics can be used to compare multiple invocations of the same application, as well as executions of heterogeneous applications, quantifying the amount of work performed and the efficiency of the work. Finally, we analyze CyberShake using our proposed metrics to determine potential application optimizations.
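The three levels of granularity above — task, workflow, and application — suggest metrics like the following minimal sketch. The field names and the parallel-efficiency formula (busy time divided by makespan times available slots) are assumptions for illustration, not the paper's exact definitions.

```python
def task_metrics(task):
    """Task level: wall time and rate of useful work (e.g. FLOP/s)."""
    runtime = task["end"] - task["start"]
    return {"runtime": runtime, "flops_per_s": task["flops"] / runtime}

def workflow_efficiency(tasks, makespan, slots):
    """Workflow level: fraction of available compute time spent in tasks.

    busy / (makespan * slots) is 1.0 when every slot is busy for the
    whole makespan; scheduling gaps and idle slots lower it.
    """
    busy = sum(t["end"] - t["start"] for t in tasks)
    return busy / (makespan * slots)

def application_work(workflows):
    """Application level: total work across all workflow invocations."""
    return sum(t["flops"] for tasks in workflows for t in tasks)
```

Metrics of this shape make heterogeneous runs comparable: two invocations can be ranked by total work performed and by how efficiently the allocated slots were used.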
Markets for electronic goods provide the possibility of exploring new and more complex pricing schemes, due to the flexibility of information goods and negligible marginal cost. In this paper we compare dynamic performance across price schedules of varying complexity. We provide a monopolist producer with two machine learning methods which implement a strategy that balances exploitation to maximize current profits against exploration to improve future profits. We find that the complexity of the price schedule affects both the amount of exploration necessary and the aggregate profit received by a producer. In general, simpler price schedules are more robust and give up less profit during the learning periods even though the more complex schedules have higher long-run profits. These results hold for both learning methods, even though the relative performance of the methods is quite sensitive to differences in the smoothness of the profit landscape for different price schedules. Our results have implications for automated learning and strategic pricing in non-stationary environments, which arise when the consumer population changes, individuals change their preferences, or competing firms change their strategies.
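The exploration/exploitation trade-off described above can be illustrated with an epsilon-greedy learner over a discrete price grid. This is a simplified stand-in, not either of the paper's two learning methods; the profit function and price grid are assumptions for the example.

```python
import random

def epsilon_greedy_pricing(profit_fn, prices, rounds=1000, eps=0.1, seed=0):
    """Balance exploiting the best-known price against exploring others.

    With probability eps the seller tries a random price (exploration,
    sacrificing current profit to learn); otherwise it charges the price
    with the best observed average profit (exploitation).
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in prices}
    counts = {p: 0 for p in prices}
    earned = 0.0
    for _ in range(rounds):
        if rng.random() < eps:
            p = rng.choice(prices)  # explore
        else:                       # exploit best average so far
            p = max(prices,
                    key=lambda q: totals[q] / counts[q] if counts[q] else 0.0)
        r = profit_fn(p)
        totals[p] += r
        counts[p] += 1
        earned += r
    best = max(prices,
               key=lambda q: totals[q] / counts[q] if counts[q] else 0.0)
    return best, earned
```

A richer (more complex) price schedule corresponds to a larger search space, which is exactly why the abstract finds that complex schedules demand more exploration before their higher long-run profits pay off.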