Abstract: The refractive index (RI) is an important parameter in describing the radiative impacts of aerosols. Constraining the RI of aerosol components is important because significant uncertainty remains regarding the RI of biomass burning aerosols. Experimentally measured extinction cross-sections, scattering cross-sections, and single scattering albedos for white pine biomass burning (BB) aerosols under two different burning and sampling conditions were modeled using T-matrix theory, and the refractive indices were extracted from these calculations. Experimental measurements were conducted on size-selected aerosols, using a cavity ring-down spectrometer to measure extinction and a nephelometer to measure scattering. BB aerosols were obtained by burning white pine using (1) an open fire in a burn drum, where the aerosols were collected in distilled water using an impinger and then re-aerosolized after several days, and (2) a tube furnace that introduced the BB aerosols directly into an indoor smog chamber, where they were then sampled directly. In both cases, filter samples were also collected, and electron microscopy images were used to obtain the morphology and size information used in the T-matrix calculations. The effective radius of the particles collected on filter media from the open fire was approximately 245 nm, whereas it was approximately 76 nm for particles from the tube furnace burns. For samples collected in distilled water, the real part of the RI increased with increasing particle size, and the imaginary part decreased. The imaginary part of the RI was also significantly larger than the reported values for fresh BB aerosol samples. For the particles generated in the tube furnace, the real part of the RI decreased with particle size, and the imaginary part was much smaller and nearly constant. The RI is sensitive to particle size and sampling method, but there was no wavelength dependence over the range considered (500-680 nm).
Our values for the RI of fresh (white pine) biomass burning aerosols ranged from 1.33 + i0.008 to 1.74 + i0.008 for 200-nm, 300-nm, and 400-nm diameter particles. These are within the range of RI values in the most recent study conducted during the Fire Laboratory at Missoula Experiments (FLAME I and II), which were 1.55 to 1.80 for the real part, and 0.01-0.50 for the imaginary part, for fresh BB aerosols with diameters of 200-570 nm. The RI values show no clear trend with particle size. The RI values derived from measurements of aerosols produced from the combustion of hydrocarbons and diesel cannot be used for BB aerosols.
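The three measured quantities named in the abstract are linked by standard definitions: the single scattering albedo is the ratio of scattering to extinction, and absorption is the difference between extinction and scattering. The following is a minimal sketch of that bookkeeping only; it does not reproduce the T-matrix retrieval, the function names are hypothetical, and any numbers passed to it are illustrative rather than values from the study.

```python
def single_scattering_albedo(sigma_sca, sigma_ext):
    """Return the single scattering albedo, sigma_sca / sigma_ext.

    Both cross-sections must be in the same units (e.g. nm^2).
    """
    if sigma_ext <= 0:
        raise ValueError("extinction cross-section must be positive")
    if not 0 <= sigma_sca <= sigma_ext:
        raise ValueError("scattering must lie between 0 and extinction")
    return sigma_sca / sigma_ext


def absorption_cross_section(sigma_sca, sigma_ext):
    """Return the absorption cross-section, sigma_ext - sigma_sca."""
    if not 0 <= sigma_sca <= sigma_ext:
        raise ValueError("scattering must lie between 0 and extinction")
    return sigma_ext - sigma_sca
```

A strongly absorbing particle (large imaginary part of the RI) gives an albedo well below 1, while a nearly non-absorbing particle gives an albedo close to 1.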
Dropping fractions of users or items judiciously can reduce the computational cost of Collaborative Filtering (CF) algorithms. The effect of this subsampling on the computing time and accuracy of CF is not fully understood, and clear guidelines for selecting optimal or even appropriate subsampling levels are not available. In this paper, we present a Density-based Random Stratified Subsampling using Clustering (DRSC) algorithm in which the desired Fraction of Users Dropped (FUD) and Fraction of Items Dropped (FID) are specified, and the overall density during subsampling is maintained. Subsequently, we develop simple models of the Training Time Improvement (TTI) and the Accuracy Loss (AL) as functions of FUD and FID, based on extensive simulations of seven standard CF algorithms as applied to various primary matrices from MovieLens, Yahoo Music Rating, and Amazon Automotive data. Simulations show that both TTI and a scaled AL are bi-linear in FID and FUD for all seven methods. The TTI linear regression of a CF method appears to be the same for all datasets. Extensive simulations illustrate that TTI can be estimated reliably with FUD and FID only, but AL requires considering additional dataset characteristics. The derived models are then used to optimize the levels of subsampling, addressing the tradeoff between TTI and AL. A simple sub-optimal approximation was found, in which the optimal AL is proportional to the optimal Training Time Reduction Factor (TTRF) for higher values of TTRF, and the optimal subsampling levels, like optimal FID/(1-FID), are proportional to the square root of TTRF.
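To make FUD and FID concrete, the sketch below drops a given fraction of users (rows) and items (columns) from a rating matrix uniformly at random and reports the resulting density. This is not the DRSC algorithm: the abstract does not describe its stratification or clustering steps, so plain uniform sampling is swapped in here, and the density is merely measured, not maintained. The function names and the convention that 0 marks a missing rating are assumptions for illustration.

```python
import numpy as np


def subsample_ratings(ratings, fud, fid, seed=0):
    """Drop a fraction `fud` of rows (users) and `fid` of columns (items).

    Uniform random sampling, NOT the density-preserving DRSC scheme.
    `ratings` is a 2-D array in which 0 is assumed to mean "no rating".
    """
    if not (0.0 <= fud < 1.0 and 0.0 <= fid < 1.0):
        raise ValueError("fud and fid must lie in [0, 1)")
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    keep_users = n_users - int(round(fud * n_users))
    keep_items = n_items - int(round(fid * n_items))
    # Sample the indices to keep without replacement, preserving order.
    user_idx = np.sort(rng.choice(n_users, size=keep_users, replace=False))
    item_idx = np.sort(rng.choice(n_items, size=keep_items, replace=False))
    return ratings[np.ix_(user_idx, item_idx)]


def density(ratings):
    """Fraction of entries in the matrix that hold a rating (non-zero)."""
    return np.count_nonzero(ratings) / ratings.size
```

Comparing `density(ratings)` with `density(subsample_ratings(ratings, fud, fid))` shows how far uniform dropping drifts from the original density, which is the quantity the DRSC algorithm is described as holding fixed.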
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters like the fraction of users dropped (FUD) and the fraction of items dropped (FID). We seek to investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when using the SVD collaborative filtering (CF) algorithm. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical symmetrical SSAL models in terms of FID and FUD whose coefficients depend only on the data characteristics. The SSAL models came out to be multi-linear in terms of the odds of dropping a user (or an item) vs. not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds, with a zero coefficient on their interaction term. Most importantly, the models are constant in the sense that they are written in closed form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460,000 subsampled rating matrices were then simulated and subjected to the singular value decomposition (SVD) CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all three datasets.
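The odds-based form described above can be sketched as follows. The odds of dropping a user is FUD/(1-FUD), and likewise for items. The abstract does not give the fitted coefficients or their closed-form dependence on dataset characteristics, so the three coefficients here are hypothetical inputs the caller must supply, and the absence of an intercept (zero loss when nothing is dropped) is an assumption.

```python
def odds(fraction):
    """Odds of dropping vs. not dropping: f / (1 - f)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return fraction / (1.0 - fraction)


def accuracy_loss(fud, fid, coef_user, coef_item, coef_interaction=0.0):
    """Multi-linear accuracy-loss sketch in the FUD and FID odds.

    All coefficients are placeholders, not values from the paper.
    With coef_interaction = 0 the model is linear in the two odds,
    matching the shape reported for one of the MSE-based models.
    """
    odds_user = odds(fud)
    odds_item = odds(fid)
    return (coef_user * odds_user
            + coef_item * odds_item
            + coef_interaction * odds_user * odds_item)
```

Because the odds grow without bound as the dropped fraction approaches 1, a model of this shape predicts that loss rises steeply at aggressive subsampling levels even when it is nearly linear in FUD and FID at small ones.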