“…Therefore, the main challenge of detecting natural hazards from textual contents on social media is to identify unseen risks with low resources for training. We reuse the labeled tweets produced by ChouBERT (Jiang et al, 2022 ), tweets about corn borer, barley yellow dwarf virus (BYDV) and corvids for training and validation, and tweets about unseen and polysemous terms such as “taupin” (wireworm in English) for testing the generalizability of the classifier. Since the binary cross entropy loss adopted by the discriminator of GAN-BERT favors the majority class when data are unbalanced, for the different training experiments, we sampled ChouBERT's training data to 16, 32, 64, 128, 256, and 512 subsets, each subset having equal number of observations and non-observations.…”