We propose a new formulation for multilingual entity linking, in which language-specific mentions resolve to a language-agnostic Knowledge Base. We train a dual encoder in this new setting, building on prior work with improved feature representation, negative mining, and an auxiliary entity-pairing task, to obtain a single entity retrieval model that covers 100+ languages and 20 million entities. The model outperforms state-of-the-art results on a far more limited cross-lingual linking task. Rare entities and low-resource languages pose challenges at this large scale, so we advocate for an increased focus on zero- and few-shot evaluation. To this end, we provide Mewsli-9, a large new multilingual dataset matched to our setting, and show how frequency-based analysis provided key insights for our model and training enhancements.
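The abstract names a dual encoder with negative mining but gives no details, so what follows is only a minimal Python sketch of the general idea under stated assumptions. The mention and entity encoders are stubbed out with fixed toy vectors (in a real system both would be learned). Entity identifiers such as `Q1` are hypothetical stand-ins for language-agnostic KB entries. The helpers `retrieve`, `mine_hard_negatives`, and `softmax_loss` are illustrative names, not the authors' code. Scores are dot products, retrieval is exhaustive top-k, and hard negatives are the highest-scoring non-gold entities.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Similarity between a mention vector and an entity vector."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(mention: Vector, entities: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Exhaustive top-k retrieval over all entity vectors."""
    scored = [(eid, dot(mention, vec)) for eid, vec in entities.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def mine_hard_negatives(mention: Vector, gold_id: str, entities: Dict[str, Vector], k: int) -> List[str]:
    """Return the k highest-scoring entities that are not the gold entity."""
    top = retrieve(mention, entities, k + 1)
    return [eid for eid, _ in top if eid != gold_id][:k]


def softmax_loss(mention: Vector, gold_id: str, negative_ids: List[str], entities: Dict[str, Vector]) -> float:
    """Cross-entropy of the gold entity against the given negatives (log-sum-exp for stability)."""
    scores = [dot(mention, entities[eid]) for eid in [gold_id] + negative_ids]
    peak = max(scores)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_z - scores[0]


# Hypothetical toy data: precomputed vectors stand in for learned encoders.
entities = {"Q1": [1.0, 0.0], "Q2": [0.9, 0.1], "Q3": [0.0, 1.0]}
mention = [1.0, 0.2]  # encoding of a mention whose gold entity is Q1

print(retrieve(mention, entities, 2))
print(mine_hard_negatives(mention, "Q1", entities, 1))
print(softmax_loss(mention, "Q1", ["Q2"], entities), softmax_loss(mention, "Q1", ["Q3"], entities))
```

A hard negative (`Q2`, close to the mention) yields a larger loss than an easy one (`Q3`), so it gives a stronger training signal. At 20 million entities, exhaustive scoring would likely be replaced by an approximate nearest-neighbour index.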
In this paper we present an algorithm called GameRank, modified from PageRank and HITS, to evaluate the pitching and batting ability of players in Major League Baseball (MLB) from a network perspective. The model can also be readily extended to any network in which multiple factors interact with each other, to quantify the significance of each vertex. We then evaluate the algorithm by comparing its results to ESPN Ratings, a popular baseball rating method; our algorithm achieves similar or better results with a much simpler model. Furthermore, we analyze our MLB data network and draw several interesting conclusions, e.g., (a) players are becoming closer in skill; (b) good pitchers bat better than average ones. In addition, we have packaged the whole system as a working website, called MLB Illustrator (http://mlbillustrator.com), that lets users interact with the data and the network itself, turning traditional baseball statistics analysis based on tables and simple graphs into intuitive, visual network analysis. Finally, we present a series of examples where the GameRank model can be used, to demonstrate that our model is general and widely applicable. Our contributions are as follows: (a) we provide a simple model to rank the nodes in networks with multiple interacting indicators, which extends the functionality of PageRank and is widely applicable; (b) we first apply network theory to the baseball network, perform a set of analyses on it, and report some interesting findings; (c) we provide a powerful method to rank baseball players that outperforms ESPN Ratings in several respects.
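The abstract does not spell out GameRank's update rule. What follows is therefore a hypothetical Python sketch of how PageRank-style damping and HITS-style mutual reinforcement might be combined on a bipartite pitcher-batter network. It is not the authors' algorithm. It assumes each matchup record carries a count of batter successes and a count of pitcher successes (e.g. hits and outs). The record layout, `rank_bipartite`, `damp_and_normalize`, and the default damping of 0.85 are all assumptions. A pitcher's score grows with successes against highly rated batters, and vice versa.

```python
from typing import Dict, List, Tuple

# Hypothetical matchup record: (pitcher, batter, batter_successes, pitcher_successes).
Matchup = Tuple[str, str, int, int]


def damp_and_normalize(raw: Dict[str, float], damping: float) -> Dict[str, float]:
    """Normalize raw scores to sum to 1, then blend with a uniform teleport term as in PageRank."""
    n = len(raw)
    total = sum(raw.values())
    return {
        key: (1 - damping) / n + damping * (value / total if total > 0 else 1.0 / n)
        for key, value in raw.items()
    }


def rank_bipartite(
    matchups: List[Matchup],
    damping: float = 0.85,
    max_iter: int = 100,
    tol: float = 1e-10,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """HITS-style mutual reinforcement between pitchers and batters (illustrative sketch)."""
    pitchers = sorted({m[0] for m in matchups})
    batters = sorted({m[1] for m in matchups})
    pitch = {p: 1.0 / len(pitchers) for p in pitchers}
    bat = {b: 1.0 / len(batters) for b in batters}

    for _ in range(max_iter):
        raw_pitch = {p: 0.0 for p in pitchers}
        raw_bat = {b: 0.0 for b in batters}
        for p, b, batter_wins, pitcher_wins in matchups:
            # A pitcher earns credit for successes against strong batters, and vice versa.
            raw_pitch[p] += pitcher_wins * bat[b]
            raw_bat[b] += batter_wins * pitch[p]
        new_pitch = damp_and_normalize(raw_pitch, damping)
        new_bat = damp_and_normalize(raw_bat, damping)
        # Stop once neither score vector changes appreciably.
        delta = sum(abs(new_pitch[p] - pitch[p]) for p in pitchers)
        delta += sum(abs(new_bat[b] - bat[b]) for b in batters)
        pitch, bat = new_pitch, new_bat
        if delta < tol:
            break
    return pitch, bat


# Hypothetical toy data.
matchups = [
    ("P1", "B1", 1, 3),
    ("P1", "B2", 0, 4),
    ("P2", "B1", 3, 1),
    ("P2", "B2", 2, 2),
]
pitch_scores, bat_scores = rank_bipartite(matchups)
print(pitch_scores)
print(bat_scores)
```

With this toy data, `P1` (more outs) ranks above `P2` and `B1` ranks above `B2`. Each score vector sums to 1 because of the normalize-plus-teleport step.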