The presence of large-scale corpora for Natural Language Inference (NLI) has spurred deep learning research in this area, though much of this research has focused solely on monolingual data. Code-mixing is the intertwined usage of multiple languages, and is commonly seen in informal conversations among polyglots. Given the rising importance of dialogue agents, it is imperative that they understand code-mixing, but the scarcity of code-mixed Natural Language Understanding (NLU) datasets has precluded research in this area. The dataset by Khanuja et al. (2020a) for detecting conversational entailment in code-mixed Hindi-English text is the first of its kind. We investigate the effectiveness of language modeling, data augmentation, translation, and architectural approaches to address the code-mixed, conversational, and low-resource aspects of this dataset. We obtain an 8.09% increase in test set accuracy over the current state of the art.
In academic publications, citations are used to build context for a concept by highlighting relevant aspects from reference papers. Automatically identifying referenced snippets can help researchers swiftly isolate principal contributions of scientific works. In this paper, we exploit the underlying structure of scientific articles to predict reference paper spans and facets corresponding to a citation. We propose two methods to detect citation spans: keyphrase overlap, and BERT along with structural priors. We fine-tune FastText embeddings and leverage textual and positional features to predict citation facets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.