Abstract: As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll f…
“…Moving computations to the edge with the Internet of Things (IoT), Machine Learning (ML), and generative AI for remote sensing using platforms such as sUAS, and integrated sensor networks streaming real and near real-time data are all areas where CyVerse is already involved. Applying CyVerse’s cyberinfrastructure capabilities to the most pressing challenges our society faces include, but are not limited to: adapting to and developing better strategies for resilience to climate change, exploring Genotype by Environment = Phenotype (G×E = P) in both agricultural and natural settings [89, 90], using ML and AI for monitoring Earth system processes and studying human health, and developing precision medicine and synthetic biological approaches to life science (See S1 Text for explicit examples).…”
Section: Availability and Future Directions
CyVerse, the largest publicly-funded open-source research cyberinfrastructure for life sciences, has played a crucial role in advancing data-driven research since the 2010s. As the technology landscape evolved with the emergence of cloud computing platforms, machine learning, and artificial intelligence (AI) applications, CyVerse has enabled access by providing interfaces, Software as a Service (SaaS), and cloud-native Infrastructure as Code (IaC) to leverage new technologies. CyVerse services enable researchers to integrate institutional and private computational resources and custom software, perform analyses, and publish data in accordance with open science principles. Over the past 13 years, CyVerse has registered more than 124,000 verified accounts from 160 countries and was used for over 1,600 peer-reviewed publications. Since 2011, 45,000 students and researchers have been trained to use CyVerse. The platform has been replicated and deployed in three countries outside the US, with additional private deployments on commercial clouds for US government agencies and multinational corporations. In this manuscript, we present a strategic blueprint for creating and managing SaaS cyberinfrastructure and IaC as free and open-source software.
“…However, simply providing access to the code and models is not enough. It is equally important to provide integration into web applications and phenotyping workflow managers, such as PhytoOracle, to enable the computationally-efficient deployment of these models to large image datasets (Gonzalez et al., 2023). Increasing the accessibility and integration of training models, images, and results is of paramount importance; it empowers a wider range of users to leverage these models, fostering innovation and progress.…”
Section: The Importance Of User-friendly Phenotyping Tools
Charcoal rot of sorghum (CRS) is a significant disease affecting sorghum crops, with limited genetic resistance available. The causative agent, Macrophomina phaseolina (Tassi) Goid, is a highly destructive fungal pathogen that targets over 500 plant species globally, including essential staple crops. Utilizing field image data for precise detection and quantification of CRS could greatly assist in the prompt identification and management of affected fields and thereby reduce yield losses. The objective of this work was to implement various machine learning algorithms to evaluate their ability to accurately detect and quantify CRS in red-green-blue (RGB) images of sorghum plants exhibiting symptoms of infection. EfficientNet-B3 and a fully convolutional network (FCN) emerged as the top-performing models for image classification and segmentation tasks, respectively. Among the classification models evaluated, EfficientNet-B3 demonstrated superior performance, achieving an accuracy of 86.97%, a recall rate of 0.71, and an F1 score of 0.73. Of the segmentation models tested, FCN proved to be the most effective, exhibiting a validation accuracy of 97.76%, a recall rate of 0.68, and an F1 score of 0.66. As the size of the image patches increased, both models' validation scores increased linearly, and their processing time decreased exponentially. The models, in addition to being immediately useful for breeders and growers of sorghum, advance the domain of automated plant phenotyping and may serve as a base for drone-based or other automated field phenotyping efforts. Additionally, the models presented herein can be accessed through a web-based application where users can easily analyze their own images.
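The abstract above reports accuracy alongside recall and F1, and the three can diverge sharply. The following minimal sketch shows how these metrics are computed from confusion-matrix counts. The counts are hypothetical and are not taken from the paper; they only illustrate that accuracy can be high while recall and F1 stay moderate when the positive class is rare. Whether class imbalance explains the reported figures is an assumption, not something the abstract states.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (not from the paper): few positive cases among many negatives.
EXAMPLE_COUNTS = {"tp": 10, "fp": 5, "fn": 10, "tn": 975}

if __name__ == "__main__":
    for name, value in classification_metrics(**EXAMPLE_COUNTS).items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts, the many true negatives dominate accuracy, while recall and F1 depend only on how the rare positive cases are handled.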
“…Enterprise Breeding System is an open-source software for breeding programs that enables management of germplasm trials and nurseries as well as data management and analysis (CGIAR Excellence in Breeding Platform, 2022). More recently, PhytoOracle was released to provide a suite of tools that integrates open-source distributed computing frameworks for processing lettuce and sorghum phenotypic traits from RGB, thermal, PSII chlorophyll fluorescence, and 3D laser scanner datasets (Gonzalez et al., 2023). For a comprehensive recent review of digital tools developed for field-based plant data collection and management, we refer the reader to Dipta et al.…”
Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e., those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a considerable barrier for downstream application. Here, we propose a pipeline to enhance data collection, processing, and management from plant science studies comprising two newly developed open-source programs. The first, called AgTC, is a series of programming functions that generates comma-separated values file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes steps for an Extract-Transform-Load (ETL) data integration process where data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data analysis-related processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL are flexible for application across plant science experiments without programming knowledge on the part of the domain scientist, and their functions are executed on Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-government organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL to streamline critical steps from data collection to analysis in the plant sciences.
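The Extract-Transform-Load process described in this abstract can be sketched generically as follows. This is not AgETL's code or API: the function names, the `observations` table, the three-column schema, the column mappings, and the sample data are all hypothetical, and the sketch uses only the Python standard library (an in-memory SQLite database stands in for the target database).

```python
import csv
import io
import sqlite3


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, column_map):
    """Transform: rename source columns to standard names and coerce types."""
    cleaned = []
    for row in rows:
        record = {
            column_map[key]: val.strip()
            for key, val in row.items()
            if key in column_map
        }
        record["value"] = float(record["value"])
        cleaned.append(record)
    return cleaned


def load(conn, records):
    """Load: append standardized records to one database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(plot_id TEXT, trait TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO observations VALUES (:plot_id, :trait, :value)", records
    )
    conn.commit()


def run_etl(conn, sources):
    """Run extract, transform, and load for each (text, column_map) pair."""
    for text, column_map in sources:
        load(conn, transform(extract(text), column_map))


# Two made-up files that record the same kind of data under different headers.
FILE_A = "Plot,Trait,Measurement\nP1,height_cm, 102.5\nP2,height_cm,98\n"
FILE_B = "plot_number,variable,obs\nP3,height_cm,110.0\n"
MAP_A = {"Plot": "plot_id", "Trait": "trait", "Measurement": "value"}
MAP_B = {"plot_number": "plot_id", "variable": "trait", "obs": "value"}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn, [(FILE_A, MAP_A), (FILE_B, MAP_B)])
    query = "SELECT plot_id, trait, value FROM observations ORDER BY plot_id"
    for row in conn.execute(query):
        print(row)
```

Keeping the per-file column mapping separate from the transform logic mirrors the idea of driving the process from configuration; in the abstract's description that configuration lives in YAML files, whereas here plain dictionaries are used to avoid a third-party dependency.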