Abstract. Pollen-induced allergies are among the most prevalent non-contagious diseases, with about a quarter of the European population being sensitive to various atmospheric bioaerosols. In most European countries, pollen information is based on a weekly-cycle Hirst-type pollen trap method. This method is labour-intensive and requires narrowly specialized skills and substantial time, so the pollen data are always delayed and subject to sampling- and counting-related uncertainties. Emerging new approaches to automatic pollen monitoring can, in principle, allow for real-time availability of the data with no human involvement. The goal of the current paper is to evaluate the capabilities of the new Plair Rapid-E pollen monitor and to construct a first-level pollen recognition algorithm. The evaluation was performed for three devices located in Lithuania, Serbia and Switzerland, with independent calibration data and classification algorithms. The Rapid-E output data include multi-angle scattering images and the fluorescence spectra recorded at several times for each particle reaching the device. Both modalities of the Rapid-E output were treated with artificial neural networks (ANNs) and the results were combined to obtain the pollen type. For the first classification experiment, the monitor was challenged with a large variety of pollen types and the quality of many-to-many classification was evaluated. It was shown that in this case, both scattering- and fluorescence-based recognition algorithms fall short of acceptable quality. The combinations of these algorithms performed better, exceeding 80 % accuracy for 5 out of 11 species. Fluorescence spectra showed similarities among different species, ending up with three well-resolved groups: (Alnus, Corylus, Betula and Quercus), (Salix and Populus) and (Festuca, Artemisia and Juniperus). Within these groups, pollen is practically indistinguishable for the first-level recognition procedure.
Construction of multistep algorithms with sequential discrimination of pollen inside each group seems to be one of the possible ways forward. In order to connect the classification experiment to existing technology, a short comparison with the Hirst measurements is presented and the issue of false positive pollen detections by Rapid-E is discussed.
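The abstract above says the scattering-based and fluorescence-based network outputs were combined but does not state how. As a minimal sketch only, assuming each network yields per-class probabilities and assuming a simple product rule (neither is confirmed by the abstract), the fusion step could look like this. The function name `fuse_probabilities` and the probability values are hypothetical.

```python
def fuse_probabilities(scatter_probs, fluor_probs):
    """Fuse two per-class probability dicts with a product rule.

    Assumption: both classifiers score the same set of classes and their
    errors are treated as independent. This is an illustrative rule, not
    the method used in the paper.
    """
    classes = scatter_probs.keys() & fluor_probs.keys()
    joint = {c: scatter_probs[c] * fluor_probs[c] for c in classes}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("no class has non-zero probability in both inputs")
    # Renormalise so the fused scores sum to one.
    return {c: p / total for c, p in joint.items()}


# Hypothetical outputs of the two networks for a single particle.
scatter = {"Betula": 0.5, "Alnus": 0.4, "Salix": 0.1}
fluor = {"Betula": 0.3, "Alnus": 0.2, "Salix": 0.5}

fused = fuse_probabilities(scatter, fluor)
predicted = max(fused, key=fused.get)
print(predicted, fused)
```

A product rule rewards classes that both modalities find plausible; averaging the two probability vectors would be an equally simple alternative.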
An increasing amount of geo-referenced mobile phone data enables the identification of behavioral patterns, habits and movements of people. From these data, we can extract knowledge that is potentially useful for many applications, including the one tackled in this study: understanding the spatial variation of epidemics. We explored the datasets collected by a cell phone service provider and linked them to spatial HIV prevalence rates estimated from publicly available surveys. For that purpose, 224 features were extracted from mobility and connectivity traces and related to the level of the HIV epidemic in 50 Ivory Coast departments. By means of regression models, we evaluated the predictive ability of the extracted features. Several models predicted HIV prevalence values that are highly correlated (>0.7) with the actual values. Through contribution analysis we identified key elements that correlate with the rate of infections and could serve as a proxy for epidemic monitoring. Our findings indicate that night connectivity and activity, the spatial area covered by users and overall migrations are strongly linked to HIV. By visualizing the communication and mobility flows, we sought to explain the spatial structure of epidemics. We discovered that strong ties and hubs in communication and mobility align with HIV hot spots.
The aim of this paper is to describe a solution suitable for the automation of the standard pollen information service (EN 16868:2019). We describe the RealForAll integrated information system developed for automatic airborne pollen detection and real-time data delivery to end-users. This solution is based on measurements from the Rapid-E airborne particle monitor. The system incorporates an AI-enabled subsystem based on a convolutional neural network that continuously retrieves raw data from Rapid-E and performs the classification of airborne pollen. The main advantages of this system are real-time data delivery and independence from aerobiology experts during the pollen season.
One of the biggest problems in agriculture is seed selection. A wrong choice of seed variety cannot be compensated for by fertilisation, spraying or the use of mechanisation later in the season. The purpose of this work was to design a strategy for selecting the soybean varieties that should be planted on the test farm in order to maximise yield in the following season, based on the knowledge acquired from heterogeneous historical data. We propose weighted histograms regression to predict the yield of different varieties and compare our method to conventional regression algorithms. Based on the predicted yield, we perform portfolio optimisation to arrive at the optimal selection of seed varieties to be planted. The presented algorithms and results were produced within the Syngenta Crop Challenge.
The aim of this work was to develop a method for selecting optimal soybean varieties for the American Midwest using data analytics. We extracted the knowledge about 174 varieties from the dataset, which contained information about weather, soil, yield and regional statistical parameters. Next, we predicted the yield of each variety in each of 6,490 observed subregions of the Midwest. Furthermore, yield was predicted for all the possible weather scenarios, approximated by 15 historical weather instances contained in the dataset. Using predicted yields and the covariance between varieties across different weather scenarios, we performed portfolio optimisation. In this way, for each subregion, we obtained a selection of varieties that proved superior to others in terms of the amount and stability of yield. According to the rules of the Syngenta Crop Challenge, for which this research was conducted, we aggregated the results across all subregions and selected up to five soybean varieties that should be distributed across the network of seed retailers. The work presented in this paper was the winning solution for the Syngenta Crop Challenge 2017.
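The portfolio step described above trades the amount of yield against its stability across weather scenarios. A minimal sketch of that idea follows, assuming equal-weight portfolios, a mean-minus-variance score and an exhaustive search over small subsets; the abstract specifies none of these. The names `portfolio_score` and `select_varieties`, the `risk_aversion` parameter and the yield figures are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def portfolio_score(yields, subset, risk_aversion):
    """Mean yield minus risk_aversion times yield variance across scenarios.

    `yields` maps a variety name to its predicted yield per weather scenario.
    The variance of the averaged yield implicitly accounts for the
    covariance between the varieties in the subset.
    """
    n_scenarios = len(yields[subset[0]])
    per_scenario = [mean(yields[v][s] for v in subset) for s in range(n_scenarios)]
    return mean(per_scenario) - risk_aversion * pvariance(per_scenario)


def select_varieties(yields, max_size=5, risk_aversion=1.0):
    """Exhaustively search equal-weight subsets of at most max_size varieties."""
    names = sorted(yields)
    candidates = (
        subset
        for size in range(1, max_size + 1)
        for subset in combinations(names, size)
    )
    return max(candidates, key=lambda s: portfolio_score(yields, s, risk_aversion))


# Hypothetical predicted yields for three varieties under three scenarios.
predicted = {
    "A": [60.0, 40.0, 50.0],
    "B": [40.0, 60.0, 50.0],
    "C": [45.0, 45.0, 45.0],
}

best = select_varieties(predicted, max_size=2, risk_aversion=1.0)
print(best)
```

In this toy data, varieties A and B are negatively correlated across scenarios, so planting both gives a higher and perfectly stable average yield than the individually stable variety C. Exhaustive search is only practical for a handful of candidates; with 174 varieties a continuous optimiser over weights would be the more realistic choice.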
CDR (Call Detail Record) data are one type of mobile phone data collected by operators each time a user initiates/receives a phone call or sends/receives an SMS. CDR data are a rich geo-referenced source of user behaviour information. In this work, we analyse CDR data for the city of Milan that originate from the Telecom Italia Big Data Challenge. A set of graphs is generated from aggregated CDR data, where each node represents the centroid of an RBS (Radio Base Station) polygon, and each edge represents aggregated telecom traffic between two RBSs. To explore the community structure, we apply a modularity-based algorithm. Community structure between days is highly dynamic, with variations in number, size and spatial distribution. One general rule observed is that communities formed over the urban core of the city are small in size and prone to dynamic change in spatial distribution, while communities formed in the suburban areas are larger in size and more consistent with respect to their spatial distribution. To evaluate the dynamics of change in community structure between days, we introduced different graph-based and spatial community properties which contain a latent footprint of human dynamics. We created land use profiles for each RBS polygon based on the Copernicus Land Monitoring Service Urban Atlas data set to quantify the correlation with, and the predictiveness of, land use for human dynamics properties. The results reveal a strong correlation between some properties and land use, which motivated us to further explore this topic. The proposed methodology has been implemented in the programming language Scala inside the Apache Spark engine to support the most computationally intensive tasks, and in Python, using the rich portfolio of data analytics and machine learning libraries, for the less demanding tasks.
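Modularity-based algorithms search for the partition of a graph that maximises the modularity score Q. The sketch below only computes Q for a given partition of a weighted, undirected graph without self-loops; it is not the community detection algorithm used in the study, which the abstract does not name. The function name `modularity` and the toy graph (standing in for RBS nodes and aggregated traffic weights) are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of a weighted undirected graph.

    `edges` is a list of (u, v, weight) tuples with no self-loops and each
    undirected edge listed once; `community_of` maps node -> community label.
    Q = sum over communities of  L_c / m - (d_c / (2 m))^2, where m is the
    total edge weight, L_c the weight inside community c and d_c the summed
    weighted degree of its nodes.
    """
    total_weight = sum(w for _, _, w in edges)
    intra = defaultdict(float)   # weight of edges inside each community
    degree = defaultdict(float)  # summed weighted degree per community
    for u, v, w in edges:
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += w
    return sum(
        intra[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Toy graph: two triangles joined by a single edge, unit traffic weights.
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
print(q_split, q_merged)
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of comparison a modularity-maximising algorithm makes repeatedly while building communities.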
When coupled with spatio-temporal context, location-based data collected in mobile cellular networks provide insights into patterns of human activity, interactions, and mobility. Whilst the uncovered patterns have immense potential for improving the services of telecom providers as well as for external applications related to social wellbeing, the inherently massive volume of such 'Big Data' sets makes them complex to process. A significant number of studies involving such mobile phone data have been presented, but numerous open challenges remain before technology readiness is reached. They include efficient access in a privacy-preserving manner, high performance computing environments, scalable data analytics, and innovative data fusion with other sources, all finally linked into applications ready for operational mode. In this chapter, we provide a broad overview of the entire workflow from raw data access to the final applications and point out the critical challenges in each step that need to be addressed to unlock the value of data generated by mobile cellular networks.