This paper describes a simple way of adapting the BM25 ranking formula to deal with structured documents. In the past it has been common to compute scores for the individual fields (e.g. title and body) independently and then combine these scores (typically linearly) to arrive at a final score for the document. We highlight how this approach can lead to poor performance by breaking the carefully constructed non-linear saturation of term frequency in the BM25 function. We propose a much more intuitive alternative which weights term frequencies before the nonlinear term frequency saturation function is applied. In this scheme, a structured document with a title weight of two is mapped to an unstructured document with the title content repeated twice. This more verbose unstructured document is then ranked in the usual way. We demonstrate the advantages of this method with experiments on Reuters Vol1 and the TREC dotGov collection.
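The contrast drawn in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it keeps only the BM25 term-frequency saturation tf·(k1 + 1)/(tf + k1) and leaves out IDF and document-length normalization. The function names, the field names, and the value k1 = 1.2 are assumptions made for illustration.

```python
# Simplified sketch: BM25 term-frequency saturation only.
# IDF and length normalization are deliberately omitted (assumption).

def bm25_saturation(tf, k1=1.2):
    """Saturating function of term frequency; bounded above by k1 + 1."""
    return tf * (k1 + 1) / (tf + k1)


def linear_combination_score(field_tfs, weights, k1=1.2):
    """Score each field independently, then combine the scores linearly."""
    return sum(weights[f] * bm25_saturation(tf, k1) for f, tf in field_tfs.items())


def weighted_tf_score(field_tfs, weights, k1=1.2):
    """Weight the term frequencies first, then saturate once.

    With integer weights this equals scoring an unstructured document in
    which each field's content is repeated `weight` times.
    """
    combined_tf = sum(weights[f] * tf for f, tf in field_tfs.items())
    return bm25_saturation(combined_tf, k1)


# Hypothetical field weights and term counts for a single query term.
weights = {"title": 2, "body": 1}
doc = {"title": 1, "body": 0}

linear = linear_combination_score(doc, weights)
weighted = weighted_tf_score(doc, weights)
```

In this toy setting a single title occurrence with weight two scores the same as two occurrences in an unstructured document, so the score stays below the saturation bound k1 + 1. The linear combination applies the bound per field, so its total can grow well beyond k1 + 1 when the term is frequent in several fields.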
A climate data record of global sea surface temperature (SST) spanning 1981–2016 has been developed from 4 × 10¹² satellite measurements of thermal infra-red radiance. The spatial area represented by pixel SST estimates is between 1 km² and 45 km². The mean density of good-quality observations is 13 km⁻² yr⁻¹. SST uncertainty is evaluated per datum, the median uncertainty for pixel SSTs being 0.18 K. Multi-annual observational stability relative to drifting buoy measurements is within 0.003 K yr⁻¹ of zero with high confidence, despite maximal independence from in situ SSTs over the latter two decades of the record. Data are provided at native resolution, gridded at 0.05° latitude-longitude resolution (individual sensors), and aggregated and gap-filled on a daily 0.05° grid. Skin SSTs, depth-adjusted SSTs de-aliased with respect to the diurnal cycle, and SST anomalies are provided. Target applications of the dataset include: climate and ocean model evaluation; quantification of marine change and variability (including marine heatwaves); climate and ocean-atmosphere processes; and specific applications in ocean ecology, oceanography and geophysics.
We demonstrate improvements in CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations) dust extinction retrievals over northern Africa and Europe when corrections are applied regarding the Saharan dust lidar ratio assumption, the separation of the dust portion in detected dust mixtures, and the averaging scheme introduced in the Level 3 CALIPSO product. First, a universal, spatially constant lidar ratio of 58 sr instead of 40 sr is applied to individual Level 2 dust-related backscatter products. The resulting aerosol optical depths show an improvement compared with synchronous and collocated AERONET (Aerosol Robotic Network) measurements. An absolute bias of the order of −0.03 has been found, improving on the statistically significant biases of the order of −0.10 reported in the literature for the original CALIPSO product. When compared with the MODIS (Moderate-Resolution Imaging Spectroradiometer) collocated aerosol optical depth (AOD) product, the CALIPSO negative bias is even less for the lidar ratio of 58 sr. After introducing the new lidar ratio for the domain studied, we examine potential improvements to the climatological CALIPSO Level 3 extinction product: (1) by introducing a new methodology for the calculation of pure dust extinction from dust mixtures and (2) by applying an averaging scheme that includes zero extinction values for the nondust aerosol types detected. The scheme is applied at a horizontal spatial resolution of 1° × 1° for ease of comparison with the instantaneous and collocated dust extinction profiles simulated by the BSC-DREAM8b dust model. Comparisons show that the extinction profiles retrieved with the proposed methodology reproduce the well-known model biases per subregion examined. The very good agreement of the proposed CALIPSO extinction product with respect to AERONET, MODIS and the BSC-DREAM8b dust model makes this dataset an ideal candidate for the provision of an accurate and robust multiyear dust climatology over northern Africa and Europe.
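The effect of the lidar ratio change can be sketched with simple arithmetic, assuming that extinction is obtained by multiplying a backscatter profile by a constant lidar ratio and that optical depth is the vertical integral of extinction. The backscatter values and layer thickness below are hypothetical illustration values, not CALIPSO data, and the function names are not taken from any CALIPSO software.

```python
# Sketch under stated assumptions: extinction = lidar_ratio * backscatter,
# AOD = sum of extinction over layers of equal thickness.

def extinction_profile(backscatter_per_km_sr, lidar_ratio_sr):
    """Convert backscatter (km^-1 sr^-1) to extinction (km^-1)."""
    return [lidar_ratio_sr * b for b in backscatter_per_km_sr]


def optical_depth(extinction_per_km, layer_thickness_km):
    """Integrate extinction over uniformly thick layers (rectangle rule)."""
    return sum(e * layer_thickness_km for e in extinction_per_km)


# Hypothetical dust backscatter profile, three layers of 0.5 km each.
backscatter = [0.001, 0.002, 0.0015]
thickness_km = 0.5

aod_40 = optical_depth(extinction_profile(backscatter, 40.0), thickness_km)
aod_58 = optical_depth(extinction_profile(backscatter, 58.0), thickness_km)
ratio = aod_58 / aod_40
```

Under these assumptions the dust optical depth scales by 58/40 = 1.45 whatever the profile shape, which is the direction needed to reduce a negative bias against AERONET and MODIS.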