Efficient seed germination and establishment are important traits for field and glasshouse crops. Large-scale germination experiments are laborious and prone to observer errors, which creates a need for automated methods. We experimented with five crop species (tomato, pepper, Brassica, barley, and maize) and arrived at an approach for large-scale germination scoring. Here, we present the SeedGerm system, which combines cost-effective hardware and open-source software for seed germination experiments, automated seed imaging, and machine-learning-based phenotypic analysis. The software can process multiple image series simultaneously and produce reliable analysis of germination- and establishment-related traits, in both comma-separated values (CSV) and processed image (PNG) formats. In this article, we describe the hardware and software design in detail. We also demonstrate that SeedGerm could match specialists' scoring of radicle emergence. Germination curves were produced from seed-level germination timing and rates rather than from a fitted curve. In particular, by scoring germination across a diverse panel of Brassica napus varieties, SeedGerm implicates a gene important in abscisic acid (ABA) signalling in seeds. We compared SeedGerm with existing methods and concluded that it could have wide utility in large-scale seed phenotyping and testing, for both research and routine seed technology applications.
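To illustrate what a germination curve built from seed-level timing (rather than a fitted curve) might look like, here is a minimal sketch. It is not SeedGerm's code: the function name `germination_curve`, the input format (one germination time per seed, `None` for seeds that never germinated), and the example values are all assumptions made for illustration.

```python
def germination_curve(germination_hours, time_points):
    """Cumulative fraction of seeds germinated at each time point.

    germination_hours: one entry per seed, the hour at which germination
    was scored, or None if the seed never germinated (hypothetical format).
    time_points: hours at which the curve is evaluated.
    """
    total = len(germination_hours)
    if total == 0:
        raise ValueError("at least one seed is required")
    curve = []
    for t in time_points:
        germinated = sum(
            1 for g in germination_hours if g is not None and g <= t
        )
        curve.append(germinated / total)
    return curve


# Hypothetical example: four seeds, one of which never germinates.
seed_times = [24, 30, None, 48]
curve = germination_curve(seed_times, [12, 24, 36, 48])
print(curve)  # [0.0, 0.25, 0.5, 0.75]
```

Because the curve is counted directly from per-seed events, it never exceeds the fraction of seeds that actually germinated, which a fitted curve does not guarantee.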
Automated phenotyping technologies are capable of providing continuous and precise measurements of traits that are key to today’s crop research, breeding and agronomic practices. In addition to monitoring developmental changes, high-frequency and high-precision phenotypic analysis can enable both accurate delineation of the genotype-to-phenotype pathway and the identification of genetic variation influencing environmental adaptation and yield potential. Here, we present an automated and scalable field phenotyping platform called CropQuant, designed for easy and cost-effective deployment in different environments. To manage in-field experiments and crop-climate data collection, we have also developed a web-based control system called CropMonitor, which provides a unified graphical user interface (GUI) for real-time interactions between users and their experiments. Furthermore, we established a high-throughput trait analysis pipeline for phenotypic analyses so that lightweight machine-learning modelling can be executed on CropQuant workstations to study the dynamic interactions between genotypes (G), phenotypes (P), and environmental factors (E). We have used these technologies since 2015 and report results generated in the 2015 and 2016 field experiments, including developmental profiles of five wheat genotypes, analyses of performance-related traits, and new biological insights that emerged from the application of the CropQuant platform.
Machine learning has previously been applied successfully to speech-driven facial animation. To account for carry-over and anticipatory coarticulation, a common approach is to predict the facial pose using a symmetric window of acoustic speech that includes both past and future context. Using future context limits this approach for animating the faces of characters in real-time and networked applications, such as online gaming. An acceptable latency for conversational speech is 200ms, and network transmission times will typically consume a significant part of this. Consequently, we consider asymmetric windows by investigating the extent to which decreasing the future context affects the quality of predicted animation using both deep neural networks (DNNs) and bi-directional LSTM recurrent neural networks (BiLSTMs). Specifically, we investigate future contexts from 170ms (fully-symmetric) to 0ms (fully-asymmetric). We find that a BiLSTM trained using 70ms of future context is able to predict facial motion of equivalent quality to that of a DNN trained with 170ms, while increasing processing time by only 5ms. Subjective tests using the BiLSTM show that reducing the future context from 170ms to 50ms does not significantly decrease perceived realism. Below 50ms, the perceived realism begins to deteriorate, generating a trade-off between realism and latency.
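The idea of an asymmetric context window can be sketched as follows. This is an illustrative sketch only: the 10ms frame hop, the edge-clamping policy, and the helper names `frames_for_ms` and `context_window` are assumptions, not details taken from the abstract.

```python
def frames_for_ms(context_ms, hop_ms=10):
    """Convert a context length in milliseconds to whole frames.

    hop_ms is an assumed frame hop; the abstract does not state one.
    """
    return context_ms // hop_ms


def context_window(num_frames, t, past, future):
    """Frame indices feeding the prediction for frame t.

    Uses `past` frames before t and `future` frames after t, clamping
    indices at the sequence edges (an assumed padding policy).
    """
    return [
        min(max(i, 0), num_frames - 1)
        for i in range(t - past, t + future + 1)
    ]


# Symmetric window: 170ms of past and 170ms of future context.
symmetric = context_window(100, 50, frames_for_ms(170), frames_for_ms(170))

# Asymmetric window: 170ms of past but only 70ms of future context.
asymmetric = context_window(100, 50, frames_for_ms(170), frames_for_ms(70))

print(len(symmetric), len(asymmetric))  # 35 25
```

Only the future part of the window adds latency, since the model must wait for those frames to arrive before it can predict frame `t`; shortening it is what frees up part of the 200ms budget for network transmission.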
This work proposes and compares perceptually motivated loss functions for deep-learning-based binary mask estimation for speech separation. Previous loss functions have focused on maximising the classification accuracy of mask estimation, but we now propose loss functions that aim to maximise the hit minus false-alarm (HIT-FA) rate, which is known to correlate more closely with speech intelligibility. The baseline loss function is binary cross-entropy (CE), a standard loss function used in binary mask estimation, which maximises classification accuracy. We first propose a loss function that maximises the HIT-FA rate instead of classification accuracy. We then propose a second loss function that is a hybrid between CE and HIT-FA, providing a balance between classification accuracy and HIT-FA rate. Evaluations of the perceptually motivated loss functions with the GRID database show improvements to HIT-FA rate and ESTOI across babble and factory noises. Further tests then explore the application of the perceptually motivated loss functions to a larger vocabulary dataset.
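For readers unfamiliar with the metric, the HIT-FA rate of an estimated binary mask can be computed against an ideal binary mask as sketched below. The sketch uses the standard definition (HIT is the fraction of target-dominant units correctly labelled 1, FA is the fraction of noise-dominant units wrongly labelled 1); the function name `hit_fa`, the flat-list mask format, and the example masks are assumptions made for illustration.

```python
def hit_fa(estimated, ideal):
    """Return (HIT, FA, HIT-FA) for two flat binary masks of equal length."""
    if len(estimated) != len(ideal):
        raise ValueError("masks must have the same length")
    target_units = sum(1 for i in ideal if i == 1)
    noise_units = len(ideal) - target_units
    hits = sum(1 for e, i in zip(estimated, ideal) if i == 1 and e == 1)
    false_alarms = sum(1 for e, i in zip(estimated, ideal) if i == 0 and e == 1)
    # Guard against masks that contain only one class.
    hit_rate = hits / target_units if target_units else 0.0
    fa_rate = false_alarms / noise_units if noise_units else 0.0
    return hit_rate, fa_rate, hit_rate - fa_rate


# Hypothetical masks over eight time-frequency units.
ideal_mask = [1, 1, 1, 1, 0, 0, 0, 0]
estimated_mask = [1, 1, 1, 0, 1, 0, 0, 0]
hit_rate, fa_rate, score = hit_fa(estimated_mask, ideal_mask)
print(hit_rate, fa_rate, score)  # 0.75 0.25 0.5
```

Unlike plain accuracy, this score penalises a mask that labels every unit 1: such a mask has HIT = 1 but also FA = 1, giving HIT-FA = 0.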
This work is concerned with using deep neural networks (DNNs) for estimating binary masks within a speech enhancement framework. We first examine the effect of supplementing the audio features used in mask estimation with visual speech information. Visual speech is known to be robust to noise, although not necessarily as discriminative as audio features, particularly at higher signal-to-noise ratios (SNRs). Furthermore, most DNN approaches to mask estimation use the cross-entropy (CE) loss function, which aims to maximise classification accuracy. Instead, we first propose a loss function that aims to maximise the hit minus false-alarm (HIT-FA) rate of the mask, which is known to correlate more closely with speech intelligibility than classification accuracy does. We then extend this to a hybrid loss function that combines the CE and HIT-FA loss functions to provide a balance between classification accuracy and HIT-FA rate of the resulting masks. Evaluations of the perceptually motivated loss functions are carried out using the GRID and larger RM-3000 datasets and show improvements to HIT-FA rate and ESTOI across all noises and SNRs tested. Tests also found that combining audio and visual information in a single bimodal audiovisual system gave the best performance for all measures and conditions tested.
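One way a hybrid of CE and HIT-FA could be formed is sketched below. The abstract does not give the formulation, so everything here is an assumption: the differentiable "soft" HIT-FA computed from predicted probabilities, the mixing weight `alpha`, and the function names are hypothetical and should not be read as the authors' loss.

```python
import math


def cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_hit_fa(probs, targets):
    """HIT-FA with hard decisions replaced by probabilities (assumed form)."""
    n_target = sum(targets)
    n_noise = len(targets) - n_target
    hit = sum(p * t for p, t in zip(probs, targets)) / n_target if n_target else 0.0
    fa = sum(p * (1 - t) for p, t in zip(probs, targets)) / n_noise if n_noise else 0.0
    return hit - fa


def hybrid_loss(probs, targets, alpha=0.5):
    """Weighted sum of CE and negative soft HIT-FA; alpha is hypothetical."""
    return alpha * cross_entropy(probs, targets) - (1.0 - alpha) * soft_hit_fa(probs, targets)


# Hypothetical predictions for four time-frequency units.
targets = [1, 1, 0, 0]
good = hybrid_loss([0.9, 0.8, 0.2, 0.1], targets)
bad = hybrid_loss([0.2, 0.1, 0.9, 0.8], targets)
print(good < bad)  # True
```

In this sketch, `alpha = 1` recovers plain CE and `alpha = 0` optimises the soft HIT-FA term alone, so the weight sets the balance between classification accuracy and HIT-FA rate.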
A study is presented on how well objective measures of speech quality and intelligibility can predict the subjective intelligibility of speech that has undergone spectral envelope smoothing and simplification of its excitation. Speech modifications are made by resynthesising speech that has been spectrally smoothed. Objective measures are applied to the modified speech and include measures of speech quality, signal-to-noise ratio and intelligibility; we also propose the normalised frequency-weighted spectral distortion (NFD) measure. The measures are compared to subjective intelligibility scores, and several are found to have high correlation (|r| ≥ 0.7), with NFD achieving the highest correlation (r = −0.81).