Multitemporal crop classification approaches have demonstrated high performance within a given season. However, cross-season and cross-region crop classification presents a unique transferability challenge. This study addresses this challenge by adopting a domain generalization approach, i.e., by training models on multiple seasons to improve generalization to new, unseen target years. We utilize a comprehensive five-year Sentinel-2 dataset over different agricultural regions in Slovakia and a diverse crop scheme (eight crop classes). We evaluate the performance of different machine learning classification algorithms, including random forests, support vector machines, quadratic discriminant analysis, and neural networks. Our main findings reveal that the transferability of models across years differs between regions, with the Danubian lowlands demonstrating better performance (overall accuracies ranging from 91.5% in 2022 to 94.3% in 2020) compared to eastern Slovakia (overall accuracies ranging from 85% in 2022 to 91.9% in 2020). Quadratic discriminant analysis, support vector machines, and neural networks consistently demonstrated high performance across diverse transferability scenarios. The random forest algorithm was less reliable in generalizing across different scenarios, particularly when there was a significant deviation in the distribution of unseen domains. This finding underscores the importance of employing a multi-classifier analysis. Rapeseed, grasslands, and sugar beet consistently show stable transferability across seasons. We observe that all periods play a crucial role in the classification process, with July being the most important and August the least important. Acceptable performance can be achieved as early as June, with only slight improvements towards the end of the season.
Finally, employing a multi-classifier approach allows for parcel-level confidence determination, enhancing the reliability of crop distribution maps by assuming higher confidence when multiple classifiers yield similar results. To enhance spatiotemporal generalization, our study proposes a two-step approach: (1) determine the optimal spatial domain to accurately represent crop type distribution; and (2) apply interannual training to capture variability across years. This approach helps account for various factors, such as different crop rotation practices, diverse observational quality, and local climate-driven patterns, leading to more accurate and reliable crop classification models for nationwide agricultural monitoring.
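The parcel-level confidence idea described above can be illustrated with a minimal sketch. It assumes that confidence is measured as the fraction of classifiers agreeing with the majority label; the abstract does not state the exact rule used in the study, and the function name, classifier keys, and crop labels below are hypothetical.

```python
from collections import Counter


def parcel_confidence(predictions):
    """Return (majority_label, agreement) for one parcel.

    `predictions` maps a classifier name to the crop label it predicted.
    `agreement` is the share of classifiers that voted for the majority
    label (an assumed confidence measure, not the study's own definition).
    """
    counts = Counter(predictions.values())
    # On a tie, most_common keeps the label that was seen first.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Hypothetical predictions from four classifiers for a single parcel.
parcel = {"rf": "maize", "svm": "maize", "qda": "maize", "nn": "wheat"}
label, agreement = parcel_confidence(parcel)
print(label, agreement)
```

Parcels with low agreement could then be flagged for manual review or field verification, while parcels on which all classifiers agree would be treated as high confidence.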
This article helps establish reliable baselines for document-level sentiment analysis in highly inflected languages like Czech and Slovak. We revisit an earlier study representing the first comprehensive formulation of such baselines in Czech and show that some of its reported results need to be significantly revised. More specifically, we show that more than 18% of its online product review dataset consisted of non-trivial duplicates, which incorrectly inflated its macro F1-measure results by more than 19 percentage points. We also establish that part-of-speech-related features have no damaging effect on machine learning algorithms (contrary to the claim made in the study) and rehabilitate the Chi-squared metric for feature selection as being on par with the best performing metrics such as Information Gain. We demonstrate that in feature selection experiments with Information Gain and Chi-squared metrics, the top 10% of ranked unigram and bigram features suffice for the best results on the online product and movie review datasets, while the top 5% of ranked unigram and bigram features are optimal for the Facebook dataset. Finally, we reiterate an important but often ignored warning by George Forman and Martin Scholz that different possible ways of averaging the F1-measure in cross-validation studies of highly unbalanced datasets can lead to results differing by more than 10 percentage points. This can invalidate the comparisons of F1-measure results across different studies if incompatible ways of averaging F1 are used.
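The warning about averaging F1 can be made concrete with a small sketch. It compares two common conventions: averaging the per-fold F1 scores, and computing a single F1 from true positives, false positives, and false negatives pooled over all folds. The fold counts are invented for illustration, and treating an undefined F1 as 0 is an assumption of this sketch, not a result from the article.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_mean_over_folds(folds):
    """Average of the F1 scores computed separately in each fold."""
    return sum(f1(tp, fp, fn) for tp, fp, fn in folds) / len(folds)


def f1_pooled(folds):
    """Single F1 computed from counts summed over all folds."""
    tp = sum(f[0] for f in folds)
    fp = sum(f[1] for f in folds)
    fn = sum(f[2] for f in folds)
    return f1(tp, fp, fn)


# Hypothetical (tp, fp, fn) counts for a rare positive class in three folds.
folds = [(5, 1, 1), (0, 1, 2), (1, 0, 0)]
mean_f1 = f1_mean_over_folds(folds)
pooled_f1 = f1_pooled(folds)
print(round(mean_f1, 3), round(pooled_f1, 3))
```

With these invented counts the two conventions differ by roughly nine percentage points, which shows why results from studies that use different averaging conventions may not be directly comparable.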