This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence. Newcastle University ePrints - eprint.ncl.ac.uk
Leroux C, Jones H, Taylor J, Clenet A, Tisseyre B. A zone-based approach for processing and interpreting variability in multi-temporal yield data sets.
Suspicious observations, or outliers, are always present to some extent in agronomic and environmental datasets, and within-field yield datasets are no exception. While most filtering approaches rely on expert thresholds and dedicated filters to remove these defective observations, more general, unsupervised methods will be required to process a growing number of yield maps. With such methods, however, outliers are only identified and remain unlabelled. This study proposes a methodology for labelling these defective observations so that users can better characterize the harvest process (e.g. functioning of the machine, driving of the operator) and derive guidelines for future improvements to equipment and operations. It is assumed here that outliers have already been detected by a published non-parametric, unsupervised approach. Clusters of outliers are first identified in order to group outliers with similar outlying yield characteristics. Each cluster is then given a first-order label that describes the general outlying yield characteristics of its observations. Next, within each cluster, each outlier is given a second-order label that provides more information on the origin of the defective observation. Simulated yield datasets with known characteristics and labelled outliers were used to test the methodology. The proposed approach was then applied to real yield datasets with unlabelled outliers. The study shows that it may be feasible to label outliers detected with an unsupervised approach, but that some labels are more accurate than others, especially those related to an unknown cutting width of the harvester or to narrow finishes within the fields. Outlying observations behaved similarly in simulated and real datasets, which made it possible to infer the labels of defective observations more precisely.
By labelling outlying observations, it was possible to apply an appropriate correction to one of the real yield datasets and to restore almost 15% of the outlying observations instead of removing them. This study is a first attempt to label yield outliers detected in an unsupervised manner.
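The two-level labelling idea summarized above (cluster the detected outliers, give each cluster a first-order label, then give each outlier a second-order label) can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny one-dimensional two-cluster k-means, the comparison against the median of the non-outlying yields, the label strings, the helper names and the toy second-order rule are all assumptions made for illustration only.

```python
from statistics import mean, median


def two_means_1d(values, iterations=20):
    """Split a non-empty list of 1-D values into two clusters (tiny k-means, k=2)."""
    centres = [min(values), max(values)]  # deterministic initialisation
    assignment = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        assignment = [
            0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            for v in values
        ]
        # Move each centre to the mean of its members (if it has any).
        for k in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == k]
            if members:
                centres[k] = mean(members)
    return assignment, centres


def label_outliers(outlier_yields, normal_yields, second_order_rule):
    """Return (yield, first-order label, second-order label) for each outlier."""
    reference = median(normal_yields)  # assumed reference: median of non-outliers
    assignment, centres = two_means_1d(outlier_yields)
    # First-order label: one coarse label per cluster (hypothetical label names).
    first = ["high yield" if c > reference else "low yield" for c in centres]
    return [
        (y, first[a], second_order_rule(y, first[a]))
        for y, a in zip(outlier_yields, assignment)
    ]


def toy_rule(y, first_label):
    """Hypothetical second-order rule; a real one would use harvest metadata."""
    return "zero-yield record" if y == 0 else "unspecified " + first_label


normal = [6.8, 7.1, 7.4, 7.0, 6.9]       # made-up non-outlying yields (t/ha)
outliers = [0.0, 0.4, 1.1, 14.2, 15.0]   # made-up, already-detected outliers
result = label_outliers(outliers, normal, toy_rule)
for row in result:
    print(row)
```

Passing the second-order rule as a function keeps the sketch neutral about how the origin of a defective observation is actually inferred, since the abstract does not specify those rules.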