Coreference resolution systems aim to recognize and cluster together
mentions of the same underlying entity. While there is a large body of
research on widely spoken languages such as English and Chinese, research
on coreference in other languages is comparatively scarce. In this work, we
first present SentiCoref 1.0, a coreference resolution dataset for
the Slovene language that is comparable to English-based corpora. Further, we
conduct a series of analyses using models of varying complexity, ranging from
simple linear models to current state-of-the-art deep neural coreference
approaches leveraging pre-trained contextual embeddings. In addition to
SentiCoref, we also evaluate the models on the smaller Slovene coref149 dataset to
justify the creation of a new corpus. We investigate the robustness of the
models using cross-domain data and data augmentation. Models using
contextual embeddings achieve the best results: an average F1 score of up to
0.92 on the SentiCoref dataset. Cross-domain experiments indicate that
SentiCoref allows the models to learn more general patterns,
enabling them to outperform models trained only on coref149.