Tags have recently become popular as a means of annotating and organizing Web pages and blog entries. Advocates of tagging argue that the use of tags produces a 'folksonomy', a system in which the meaning of a tag is determined by its use among the community as a whole. We analyze the effectiveness of tags for classifying blog entries by gathering the top 350 tags from Technorati and measuring the similarity of all articles that share a tag. We find that tags are useful for grouping articles into broad categories, but less effective in indicating the particular content of an article. We then show that automatically extracting words deemed to be highly relevant can produce a more focused categorization of articles. We also show that clustering algorithms can be used to reconstruct a topical hierarchy among tags, and suggest that these approaches may be used to address some of the weaknesses in current tagging systems.
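The core measurement described above — scoring how topically cohesive the set of articles sharing a tag is — can be sketched as mean pairwise similarity over TF-IDF vectors. This is a minimal illustration, assuming simple whitespace-tokenized documents; the paper's exact weighting and similarity measure may differ.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tag_cohesion(docs):
    """Mean pairwise cosine similarity of all articles sharing a tag."""
    vecs = tfidf_vectors(docs)
    pairs = list(combinations(vecs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

A high cohesion score indicates the tag groups articles with shared vocabulary (a broad category); a low score suggests the tag says little about an article's particular content.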
Scientific applications, often expressed as workflows, are making use of large-scale national cyberinfrastructure to explore the behavior of systems, search for phenomena in large-scale data, and conduct many other scientific endeavors. As the complexity of the systems being studied grows and as data set sizes increase, the scale of the computational workflows increases as well. In some cases, workflows now have hundreds of thousands of individual tasks. Managing such scale is difficult from the point of view of workflow description, execution, and analysis. In this paper, we describe the challenges faced by workflow management and performance analysis systems when dealing with an earthquake science application, CyberShake, executing on the TeraGrid. The scientific goal of the SCEC CyberShake project is to calculate probabilistic seismic hazard curves for sites in Southern California. For each site of interest, the CyberShake platform includes two large-scale MPI calculations and approximately 840,000 embarrassingly parallel postprocessing jobs. In this paper, we show how we approach the scalability challenges in our workflow management and log mining systems.
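One standard scalability technique for workloads like the ~840,000 small postprocessing jobs mentioned above is task clustering: grouping many fine-grained tasks into fewer batch jobs to reduce scheduling overhead. The sketch below is illustrative only (a fixed batch size, not the CyberShake configuration); workflow systems such as Pegasus offer this kind of clustering natively.

```python
def cluster_tasks(tasks, batch_size):
    """Group fine-grained tasks into batches of at most batch_size.

    Each batch is submitted as a single job, so a workflow with
    hundreds of thousands of tiny tasks schedules far fewer jobs.
    """
    return [tasks[i:i + batch_size] for i in range((0), len(tasks), batch_size)]
```

For example, clustering 840,000 tasks with a batch size of 100 reduces the number of scheduled jobs to 8,400, at the cost of coarser failure recovery within each batch.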
This article describes the process, practice, and challenges of using predictive modelling in learning analytics (LA). Predictive modelling has become a core practice of LA researchers. In this chapter, we provide a general overview of considerations when using predictive modelling and the steps that an educational data scientist must consider when engaging in the process.
Abstract. A model of computation (MoC) is a formal abstraction of execution in a computer. There is a need for composing MoCs in e-science. Kepler, which is based on Ptolemy II, is a scientific workflow environment that allows for MoC composition. This paper explains how MoCs are combined in Kepler and Ptolemy II and analyzes which combinations of MoCs are currently possible and useful. It demonstrates the approach by combining MoCs involving dataflow and finite state machines. The resulting classification should be relevant to other workflow environments wishing to combine multiple MoCs.

Keywords: Model of computation, scientific workflow, Kepler, Ptolemy II.

The Need for Composing Models of Computation in E-Science

E-scientists design on-line (in silico) experiments by orchestrating components on the Web or Grid. On-line experiments are often orchestrated based on a scientific workflow environment. Scientific workflow environments typically offer support for the design, enactment, and provenance recording of computational experiments. Most workflow environments fix the model of computation (MoC, or the formal abstraction of computational execution) available to an e-scientist. They leave little flexibility to change MoC as the experiment evolves. Different experiments are modeled more cleanly with different MoCs because of their relative expressiveness and efficiency. Different uses of MoCs for scientific workflows include dataflow for pipeline compositions, e.g. gene annotation pipelines; continuous-time ordinary differential equation solvers, e.g. for Lattice-Boltzmann simulations in fluid dynamics; and finite state machines for modelling sequential control logic, e.g. in clinical protocols or instrument interaction.
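The idea of composing a finite state machine MoC with a dataflow MoC can be sketched in a few lines: an FSM whose states select different dataflow pipelines, mimicking the hierarchical composition described above. This is purely illustrative and does not use the Kepler or Ptolemy II APIs; the pipeline stages, events, and transition table are invented for the example.

```python
def double(x): return 2 * x
def square(x): return x * x

# Each FSM state selects a dataflow pipeline (a sequence of stages).
PIPELINES = {
    "warmup": [double],          # pipeline run while in state "warmup"
    "steady": [double, square],  # pipeline run while in state "steady"
}

# FSM transitions triggered by events derived from the data.
TRANSITIONS = {("warmup", "big"): "steady"}

def run(stream, state="warmup"):
    """Process a stream: dataflow inside each state, FSM between items."""
    out = []
    for item in stream:
        value = item
        for stage in PIPELINES[state]:   # dataflow: pipe value through stages
            value = stage(value)
        out.append(value)
        event = "big" if value > 10 else "small"
        state = TRANSITIONS.get((state, event), state)  # FSM step
    return out
```

Here the FSM is the outer MoC and dataflow is the inner one; Ptolemy II's actual composition is hierarchical in the same spirit, with a director governing each level.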
Scientific workflows are a common computational model for performing scientific simulations. They may include many jobs, many scientific codes, and many file dependencies. Since scientific workflow applications may include both high-performance computing (HPC) and high-throughput computing (HTC) jobs, meaningful performance metrics are difficult to define, as neither traditional HPC metrics nor HTC metrics fully capture the extent of the application. We describe and propose the use of alternative metrics to accurately capture the scale of scientific workflows and quantify their efficiency. In this paper, we present several specific practical scientific workflow performance metrics and discuss these metrics in the context of a large-scale scientific workflow application, the Southern California Earthquake Center CyberShake 1.0 Map calculation. Our metrics reflect both computational performance, such as floating-point operations and file access, and workflow performance, such as job and task scheduling and execution. We break down performance into three levels of granularity: the task, the workflow, and the application levels, presenting a complete view of application performance. We show how our proposed metrics can be used to compare multiple invocations of the same application, as well as executions of heterogeneous applications, quantifying the amount of work performed and the efficiency of the work. Finally, we analyze CyberShake using our proposed metrics to determine potential application optimizations.
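The three levels of granularity above — task, workflow, and application — suggest metrics like the following minimal sketch. The field names and the parallel-efficiency formula (busy time divided by makespan times available slots) are assumptions for illustration, not the paper's exact definitions.

```python
def task_metrics(task):
    """Task level: wall time and rate of useful work (e.g. FLOP/s)."""
    runtime = task["end"] - task["start"]
    return {"runtime": runtime, "flops_per_s": task["flops"] / runtime}

def workflow_efficiency(tasks, makespan, slots):
    """Workflow level: fraction of available compute time spent in tasks.

    busy / (makespan * slots) is 1.0 when every slot is busy for the
    whole makespan; scheduling gaps and idle slots lower it.
    """
    busy = sum(t["end"] - t["start"] for t in tasks)
    return busy / (makespan * slots)

def application_work(workflows):
    """Application level: total work across all workflow invocations."""
    return sum(t["flops"] for tasks in workflows for t in tasks)
```

Metrics of this shape make heterogeneous runs comparable: two invocations can be ranked by total work performed and by how efficiently the allocated slots were used.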
Markets for electronic goods provide the possibility of exploring new and more complex pricing schemes, due to the flexibility of information goods and negligible marginal cost. In this paper we compare dynamic performance across price schedules of varying complexity. We provide a monopolist producer with two machine learning methods which implement a strategy that balances exploitation to maximize current profits against exploration to improve future profits. We find that the complexity of the price schedule affects both the amount of exploration necessary and the aggregate profit received by a producer. In general, simpler price schedules are more robust and give up less profit during the learning periods even though the more complex schedules have higher long-run profits. These results hold for both learning methods, even though the relative performance of the methods is quite sensitive to differences in the smoothness of the profit landscape for different price schedules. Our results have implications for automated learning and strategic pricing in non-stationary environments, which arise when the consumer population changes, individuals change their preferences, or competing firms change their strategies.
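The exploration/exploitation trade-off described above can be illustrated with an epsilon-greedy learner over a discrete price grid. This is a simplified stand-in, not either of the paper's two learning methods; the profit function and price grid are assumptions for the example.

```python
import random

def epsilon_greedy_pricing(profit_fn, prices, rounds=1000, eps=0.1, seed=0):
    """Balance exploiting the best-known price against exploring others.

    With probability eps the seller tries a random price (exploration,
    sacrificing current profit to learn); otherwise it charges the price
    with the best observed average profit (exploitation).
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in prices}
    counts = {p: 0 for p in prices}
    earned = 0.0
    for _ in range(rounds):
        if rng.random() < eps:
            p = rng.choice(prices)  # explore
        else:                       # exploit best average so far
            p = max(prices,
                    key=lambda q: totals[q] / counts[q] if counts[q] else 0.0)
        r = profit_fn(p)
        totals[p] += r
        counts[p] += 1
        earned += r
    best = max(prices,
               key=lambda q: totals[q] / counts[q] if counts[q] else 0.0)
    return best, earned
```

A richer (more complex) price schedule corresponds to a larger search space, which is exactly why the abstract finds that complex schedules demand more exploration before their higher long-run profits pay off.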