To meet societal needs, modern estuarine science must be interdisciplinary and collaborative, combine discovery with hypothesis testing, and be responsive to issues facing both regional and global stakeholders. Such an approach is best conducted in data-rich environments, where information from sensors and models is openly accessible within convenient timeframes. Here, we introduce the operational infrastructure of one such data-rich environment, a collaboratory created to support (a) interdisciplinary research in the Columbia River estuary by the multi-institutional team of investigators of the Science and Technology Center for Coastal Margin Observation & Prediction and (b) the integration of scientific knowledge into regional decision making. Core components of the operational infrastructure are an observation network, a modeling system, and a cyber-infrastructure, each of which is described. The observation network is anchored by an extensive array of long-term stations, many of them interdisciplinary, and is complemented by on-demand deployment of temporary stations and mobile platforms, often in coordinated field campaigns. The modeling system is based on finite-element unstructured-grid codes and includes operational and process-oriented simulations of circulation, sediments, and ecosystem processes. The flow of information is managed through a dedicated cyber-infrastructure, conversant with regional and national observing systems.
The past decade has seen an explosion in the number and types of environmental sensors deployed, many of which provide a continuous stream of observations. Each individual observation consists of one or more sensor measurements, a geographic location, and a time. With billions of historical observations stored in diverse databases and in thousands of datasets, scientists have difficulty finding relevant observations. We present an approach that creates consistent geospatial-temporal metadata from large repositories of diverse data by blending curated and automated extracts. We describe a novel query method over this metadata that returns ranked search results to a query with geospatial and temporal search criteria. Lastly, we present a prototype that demonstrates the utility of these ideas in the context of an ocean and coastal-margin observatory.
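The abstract above does not specify the ranking function, but the core idea of ranking datasets against a geospatial-temporal query can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes each dataset's metadata is reduced to a latitude/longitude bounding box plus a time interval, and scores each dataset by the fraction of the query's space-time volume it covers. All names (`DatasetMeta`, `ranked_search`, the score definition) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DatasetMeta:
    """Hypothetical geospatial-temporal metadata record for one dataset."""
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    t_start: float  # time as a number, e.g. days since an epoch
    t_end: float

def _overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def score(meta, q):
    """Fraction of the query's lat x lon x time volume covered by the dataset."""
    lat = _overlap(meta.lat_min, meta.lat_max, q.lat_min, q.lat_max)
    lon = _overlap(meta.lon_min, meta.lon_max, q.lon_min, q.lon_max)
    t = _overlap(meta.t_start, meta.t_end, q.t_start, q.t_end)
    q_vol = ((q.lat_max - q.lat_min) * (q.lon_max - q.lon_min)
             * (q.t_end - q.t_start))
    return (lat * lon * t) / q_vol if q_vol > 0 else 0.0

def ranked_search(catalog, q):
    """Return dataset names ordered by descending space-time overlap score,
    dropping datasets with no overlap at all."""
    hits = sorted(((score(m, q), m.name) for m in catalog), reverse=True)
    return [name for s, name in hits if s > 0]
```

A dataset fully enclosing the query region scores 1.0, a partially overlapping one scores between 0 and 1, and disjoint datasets are excluded, so the result list is ranked rather than a flat boolean filter.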
The past decade has seen a dramatic increase in the amount of data captured and made available to scientists for research. This increase amplifies the difficulty scientists face in finding the data most relevant to their information needs. In prior work, we hypothesized that Information Retrieval-style ranked search can be applied to data sets to help a scientist discover the most relevant data amongst the thousands of data sets in many formats, much like text-based ranked search helps users make sense of the vast number of Internet documents. To test this hypothesis, we explored the use of ranked search for scientific data using an existing multi-terabyte observational archive as our test-bed. In this paper, we investigate whether the concept of varying relevance, and therefore ranked search, applies to numeric data; that is, are data sets enough like documents for Information Retrieval techniques and evaluation measures to apply? We present a user study that demonstrates that data set similarity resonates with users as a basis for relevance and, therefore, for ranked search. We evaluate a prototype implementation of ranked search over data sets with a second user study and demonstrate that ranked search improves a scientist's ability to find needed data.
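The abstract above treats data set similarity as the basis for relevance without specifying a similarity measure. One simple way to make the idea concrete, purely as an illustration and not the paper's actual technique, is to reduce each numeric column to a small feature vector of summary statistics and rank candidates by cosine similarity to the query data. The function names and the choice of features (mean, standard deviation, min, max) are assumptions.

```python
import math

def features(values):
    """Summary-statistic fingerprint of a numeric column:
    [mean, population standard deviation, minimum, maximum]."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [mean, math.sqrt(var), min(values), max(values)]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_similarity(query_col, catalog):
    """Rank named numeric columns by cosine similarity of their
    fingerprints to the query column's fingerprint."""
    qf = features(query_col)
    scored = sorted(((cosine(qf, features(col)), name)
                     for name, col in catalog.items()), reverse=True)
    return [name for _, name in scored]
```

The point of the sketch is that, once data sets are summarized as vectors, document-style ranked retrieval machinery (similarity scoring, top-k result lists, precision-based evaluation) carries over directly, which is the analogy the abstract is testing.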