Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in different languages. Exploiting this, we introduce a new multilingual dataset (X-WikiRE), framing relation extraction as a multilingual machine reading problem. We show that by leveraging this resource it is possible to robustly transfer models cross-lingually and that multilingual support significantly improves (zero-shot) relation extraction, enabling the population of low-resourced KBs from their well-populated counterparts.
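To illustrate what framing relation extraction as machine reading can look like, here is a minimal sketch: each relation is turned into a natural-language question about an entity, and the answer is either a span from the context or nothing at all. The template strings, the `ReadingExample` type and the `make_example` helper are hypothetical and chosen for illustration; they are not taken from X-WikiRE.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical question templates, one per relation (illustrative only).
TEMPLATES = {
    "place_of_birth": "Where was {entity} born?",
    "occupation": "What is the occupation of {entity}?",
}


@dataclass
class ReadingExample:
    question: str
    context: str
    answer: Optional[str]  # None: the context does not express the relation


def make_example(relation: str, entity: str, context: str,
                 obj: Optional[str] = None) -> ReadingExample:
    """Turn a (relation, entity, object) triple into a reading-comprehension example."""
    question = TEMPLATES[relation].format(entity=entity)
    # Keep the object as the answer only if it literally appears in the context.
    answer = obj if obj is not None and obj in context else None
    return ReadingExample(question, context, answer)


positive = make_example(
    "place_of_birth", "Marie Curie",
    "Marie Curie was born in Warsaw.", "Warsaw",
)
negative = make_example(
    "occupation", "Marie Curie",
    "Marie Curie was born in Warsaw.",
)
```

Under this framing, a relation unseen during training only needs a new question template, which is one way to read the zero-shot setting mentioned in the abstract.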
Software repository hosting services contain large amounts of open-source software; GitHub alone hosts over 200 million repositories, ranging from new to well-established projects. These repositories are not easy to find, which has prompted various attempts to classify their application domains automatically. However, most proposed approaches use artifacts such as README files as a proxy for the project, losing the information in the source code and the interactions between files. Furthermore, they all focus on the project level, ignoring the decomposition of software projects into components and modules. This work presents a weak labelling approach based on keyword extraction to annotate source files in a software project. Our findings suggest that using keywords to perform file-level annotations is an effective approach that captures enough information from the source file for new labels to be predicted. The long-term goal of our research is to classify source code files and use these annotations to identify semantic components in software projects. In addition, these annotations can be used for semantic reverse engineering, software reuse, and more. We plan to train machine learning models that use our proposed weak supervision to better annotate source files inside software projects.
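As a rough illustration of keyword-based weak labelling at the file level, the sketch below splits the identifiers in a source file into word tokens and assigns the label whose keyword list is matched most often. The label names, keyword sets, `min_hits` threshold and function names are assumptions made for this example; the abstract does not specify how keywords are extracted or matched.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical label -> keyword sets (illustrative, not from the paper).
LABEL_KEYWORDS = {
    "networking": {"socket", "http", "request", "server"},
    "database": {"sql", "query", "table", "cursor"},
}


def tokenize(source: str) -> List[str]:
    """Split source text into lowercase word tokens, breaking snake_case and camelCase."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", source):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        tokens.extend(part.lower() for part in parts)
    return tokens


def weak_label(source: str, min_hits: int = 2) -> Optional[str]:
    """Return the label with the most keyword hits, or None if evidence is too thin."""
    counts = Counter()
    for token in tokenize(source):
        for label, keywords in LABEL_KEYWORDS.items():
            if token in keywords:
                counts[label] += 1
    if not counts:
        return None
    label, hits = counts.most_common(1)[0]
    return label if hits >= min_hits else None


db_file = "def run_query(cursor, sql):\n    cursor.execute(sql)\n"
net_file = "class HTTPServer:\n    def openSocket(self): pass\n"
```

Labels produced this way are noisy by design; in a weak supervision setup they would serve as training signal for a downstream classifier rather than as final annotations.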