This paper investigates the suitability of supervised machine learning methods
for classifying biomes from pollen datasets. We assign modern
pollen samples from Africa and Arabia to five biome classes using a previously
published African pollen dataset and a global ecosystem classification scheme.
To test the applicability of traditional and machine-learning-based
classification models to biome prediction from high-dimensional modern pollen
data, we train eight classification models: Linear Discriminant Analysis,
Logistic Regression, Naïve Bayes, K-Nearest Neighbors, Classification Decision
Tree, Random Forest, Neural Network, and Support Vector Machine.
The ability of each model to predict biomes from pollen
data is statistically tested on an independent test set. The Random Forest
classifier outperforms the other models in its ability to correctly classify
biomes from pollen data: it scores highest on all of the metrics used for model
evaluation and predicts four of the five biome classes with a high degree of
accuracy, namely arid, montane, tropical and subtropical closed systems (e.g.
forests), and tropical and subtropical open systems (e.g. savanna/grassland).
The model has the potential for accurate reconstructions of
past biomes and awaits application to fossil pollen sequences. The Random Forest
model may be used to investigate vegetation changes on both long and short time
scales, e.g. during glacial and interglacial cycles, or during more recent and
abrupt climatic anomalies such as the African Humid Period. Such applications may
contribute to a better understanding of past shifts in vegetation cover and
ultimately provide valuable information on drivers of climate change.
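
The evaluation protocol described above (train on one part of the labelled pollen samples, then score predictions on an independent test set, overall and per biome class) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic pollen percentages rather than the African pollen dataset, the class labels `class_0` to `class_4` are placeholders, and a simple nearest-centroid classifier stands in for the eight models compared in the paper (it is not one of them). Only the Python standard library is used.

```python
import random
from collections import defaultdict

# Placeholder labels for the five biome classes (hypothetical names).
BIOMES = ["class_0", "class_1", "class_2", "class_3", "class_4"]


def make_synthetic_samples(n_per_class, n_taxa, rng):
    """Build (percentages, label) pairs; each vector sums to 100 like pollen counts."""
    samples = []
    for k, biome in enumerate(BIOMES):
        for _ in range(n_per_class):
            # Taxon k is made dominant in class k so the classes are separable.
            raw = [rng.random() + (3.0 if t == k else 0.0) for t in range(n_taxa)]
            total = sum(raw)
            samples.append(([100.0 * r / total for r in raw], biome))
    return samples


def stratified_split(samples, test_fraction, rng):
    """Hold out the same fraction of every class as an independent test set."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    train, test = [], []
    for items in by_class.values():
        items = items[:]
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def fit_centroids(train):
    """Stand-in model: the mean pollen spectrum of each class."""
    sums, counts = {}, defaultdict(int)
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def evaluate(centroids, test):
    """Return overall accuracy and per-class recall on the held-out samples."""
    correct, total = defaultdict(int), defaultdict(int)
    for x, y in test:
        total[y] += 1
        if predict(centroids, x) == y:
            correct[y] += 1
    recall = {y: correct[y] / total[y] for y in total}
    accuracy = sum(correct.values()) / len(test)
    return accuracy, recall


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = make_synthetic_samples(n_per_class=40, n_taxa=8, rng=rng)
train, test = stratified_split(samples, test_fraction=0.25, rng=rng)
centroids = fit_centroids(train)
accuracy, recall = evaluate(centroids, test)

print(f"overall accuracy: {accuracy:.2f}")
for biome in BIOMES:
    print(f"{biome}: recall = {recall[biome]:.2f}")
```

Reporting recall per class alongside overall accuracy matters for a result like the one summarised here, where a model can score well overall while still failing on one of the five biome classes.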