Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in a document to their correct references in a knowledge base (KB), e.g., Wikipedia. In this paper, we propose a novel embedding method specifically designed for NED. The proposed method jointly maps words and entities into the same continuous vector space. We extend the skip-gram model with two submodels: the KB graph model learns the relatedness of entities using the link structure of the KB, whereas the anchor context model aims to align vectors such that similar words and entities occur close to one another in the vector space by leveraging KB anchors and their context words. By combining contexts based on the proposed embedding with standard NED features, we achieved state-of-the-art accuracies of 93.1% on the standard CoNLL dataset and 85.2% on the TAC 2010 dataset.
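Since the abstract sketches an architecture, a compact sketch of the joint training idea follows. The vocabulary layout, learning rate, and update code are our illustrative assumptions (standard skip-gram with negative sampling), not the authors' implementation; the key point is that word-word, entity-entity (KB link), and entity-word (anchor context) pairs all update a single shared embedding table.

```python
import numpy as np

# Minimal sketch: one embedding table holds both words and entities, so three
# kinds of training pairs place them in the same vector space:
#   1. word   -> context word         (standard skip-gram)
#   2. entity -> linked entity        (KB graph model, from the link structure)
#   3. entity -> anchor context word  (anchor context model)

rng = np.random.default_rng(0)
VOCAB, DIM = 10_000, 300      # hypothetical joint vocabulary of words + entities
emb = rng.normal(scale=0.1, size=(VOCAB, DIM))    # "input" vectors
ctx = np.zeros((VOCAB, DIM))                      # "output" (context) vectors
LR, NEG = 0.025, 5

def sgns_step(target: int, context: int) -> None:
    """One skip-gram-with-negative-sampling update on a (target, context) pair."""
    grad_target = np.zeros(DIM)
    pairs = [(context, 1.0)] + [(int(rng.integers(VOCAB)), 0.0) for _ in range(NEG)]
    for c, label in pairs:
        score = 1.0 / (1.0 + np.exp(-emb[target] @ ctx[c]))  # sigmoid
        g = LR * (label - score)
        grad_target += g * ctx[c]
        ctx[c] += g * emb[target]
    emb[target] += grad_target

# All three pair types share the same update, which is what ties words and
# entities together in one space:
sgns_step(target=42, context=7)       # word-word pair
sgns_step(target=9001, context=9002)  # entity-entity pair from a KB link
sgns_step(target=9001, context=7)     # entity-word pair from an anchor context
```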
Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer (Vaswani et al., 2017). The proposed model treats words and entities in a given text as independent tokens and outputs contextualized representations of them. Our model is trained using a new pretraining task based on the masked language model of BERT (Devlin et al., 2019). The task involves predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also propose an entity-aware self-attention mechanism, an extension of the transformer's self-attention mechanism that considers the types of tokens (words or entities) when computing attention scores. The proposed model achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering). Our source code and pretrained representations are available at https://github.com/studio-ousia/luke.

Introduction

Many natural language tasks involve entities, e.g., relation classification, entity typing, named entity recognition (NER), and question answering (QA). Key to solving such entity-related tasks is a model that learns effective representations of entities. Conventional entity representations assign each entity a fixed embedding vector that stores information regarding the entity in a knowledge base (KB).
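A minimal numpy sketch may make the entity-aware self-attention idea from the abstract concrete. All names, dimensions, and initializations below are illustrative assumptions, not LUKE's actual code; the one faithful element is that the query projection is chosen per token-type pair (word or entity, attending and attended) while keys and values are shared.

```python
import numpy as np

# Hedged sketch of entity-aware self-attention: the key and value projections
# are shared, but the query projection depends on whether the attending and
# attended tokens are words or entities (w2w, w2e, e2w, e2e).

rng = np.random.default_rng(0)
D = 64
W_key = rng.normal(scale=0.1, size=(D, D))
W_val = rng.normal(scale=0.1, size=(D, D))
W_query = {p: rng.normal(scale=0.1, size=(D, D))
           for p in ("w2w", "w2e", "e2w", "e2e")}

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entity_aware_attention(x, is_entity):
    """x: (seq, D) token states; is_entity: (seq,) bools marking entity tokens."""
    k, v = x @ W_key, x @ W_val
    n = len(x)
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            pair = ("e" if is_entity[i] else "w") + "2" + ("e" if is_entity[j] else "w")
            q = x[i] @ W_query[pair]           # query depends on the token-type pair
            scores[i, j] = q @ k[j] / np.sqrt(D)
    return softmax(scores) @ v                 # (seq, D) contextualized states

x = rng.normal(size=(6, D))                    # e.g., 4 word tokens + 2 entity tokens
out = entity_aware_attention(x, [False] * 4 + [True] * 2)
```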
Following its great success in the image processing field, the idea of adversarial training has been applied to tasks in the natural language processing (NLP) field. One promising approach directly applies adversarial training developed for image processing to the input word embedding space instead of the discrete input space of texts. However, this approach gives up interpretability: it no longer generates adversarial texts, even though it significantly improves the performance of NLP tasks. This paper restores interpretability to such methods by restricting the directions of perturbations toward existing words in the input embedding space. As a result, each input with perturbations can be straightforwardly reconstructed as an actual text by interpreting the perturbations as word replacements in the sentence, while the task performance is maintained or even improved.
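The following sketch illustrates the direction restriction described above under our own simplifying assumptions; the weighting scheme, the eps scale, and every name are hypothetical rather than the paper's exact formulation. The idea shown: the adversarial gradient is redistributed over directions pointing from the current word's embedding toward other vocabulary embeddings, so the perturbation can be read back as a word replacement.

```python
import numpy as np

# Sketch of a direction-restricted perturbation (simplified notation, not the
# paper's exact method): project the adversarial gradient onto directions from
# the current word's embedding toward other vocabulary embeddings, keeping the
# perturbed point interpretable as a word swap.

rng = np.random.default_rng(0)
V, D = 1000, 50                          # hypothetical vocabulary size and dim
emb = rng.normal(size=(V, D))

def interpretable_perturbation(word_id: int, grad: np.ndarray, eps: float = 0.5):
    dirs = emb - emb[word_id]            # (V, D): direction toward each word
    align = dirs @ grad                  # agreement of each direction with grad
    weights = np.exp(align - align.max())
    weights /= weights.sum()             # softmax over word directions
    pert = eps * (weights @ dirs)        # perturbation spanned by word directions
    replacement = int(np.argmax(weights))  # read the perturbation as a word swap
    return pert, replacement

grad = rng.normal(size=D)                # stand-in for the loss gradient
pert, replacement = interpretable_perturbation(word_id=3, grad=grad)
```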
The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real-world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. The proposed tool enables users to learn the embeddings efficiently by issuing a single command with a Wikipedia dump file as an argument. We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings. In our experiments, our tool achieved a state-of-the-art result on the KORE entity relatedness dataset and competitive results on various standard benchmark datasets. Furthermore, our tool has been used as a key component in various recent studies. We release the source code, demonstration, and pretrained embeddings for 12 languages at https://wikipedia2vec.github.io.
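For concreteness, a hedged usage sketch based on the tool's documented interface follows; the dump file name and the word/entity queries are examples, and the exact commands should be checked against the project page.

```python
# Training reportedly takes a single command with a Wikipedia dump file, e.g.:
#
#   $ wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE
#
# The learned embeddings can then be queried from Python:
from wikipedia2vec import Wikipedia2Vec

model = Wikipedia2Vec.load("MODEL_FILE")
print(model.get_word_vector("tea")[:5])                    # word embedding
print(model.get_entity_vector("Scarlett Johansson")[:5])   # entity embedding
# Top-5 neighbors of an entity in the joint word-entity space:
print(model.most_similar(model.get_entity("Scarlett Johansson"), 5))
```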
The catalytic effects of three Mn(IV) oxides (birnessite, cryptomelane, and pyrolusite) and short-range ordered Fe(III), Al, and Si oxides on the darkening of phenolic compound solutions (hydroquinone, resorcinol, and catechol) and the subsequent formation of humic acid (the precipitate formed by acidifying the darkened solution) were investigated. Manganese oxides strongly promote the darkening of phenolic compounds. The rate and degree of darkening vary with the kind of Mn oxide, the chemistry of the phenolic compound, and the pH of the system. In the Mn oxide systems, phenolic compounds are converted to humic acid with a relatively high degree of humification (Δlog K: 0.52–0.70; RF: 54–105) through oxidative polymerization. For example, at an initial pH of 6.0, the conversion of hydroquinone to humic acid ranged from 36 to 55%. The yields of humic acids formed in the Mn oxide systems are highly correlated with the degree of darkening measured at 400 nm (r² = 0.818) and 600 nm (r² = 0.983). The catalytic effect of Fe oxide on the darkening and the formation of humic acid was relatively limited under the conditions studied, and no catalytic effects were observed in the Al and Si oxide systems. These results indicate that the various Mn(IV) oxides common in the environment merit close attention in the abiotic formation of humic substances.
We describe a neural network model that jointly learns distributed representations of texts and knowledge base (KB) entities. Given a text in the KB, we train our proposed model to predict entities that are relevant to the text. Our model is designed to be generic, with the ability to address various NLP tasks with ease. We train the model using a large corpus of texts and their entity annotations extracted from Wikipedia. We evaluated the model on three important NLP tasks (i.e., sentence textual similarity, entity linking, and factoid question answering) involving both unsupervised and supervised settings. As a result, we achieved state-of-the-art results on all three of these tasks. Our code and trained models are publicly available for further academic research.

Table 7: Examples of the top five most similar entities (with cosine similarities) under our learned entity representations and under the skip-gram model.

Europe
  Our model: Eastern Europe (0.67), Western Europe (0.66), Central Europe (0.64), Asia (0.64), North America (0.64)
  Skip-gram: Asia (0.85), Western Europe (0.78), North America (0.76), Central Europe (0.75), Americas (0.73)

Golf
  Our model: Golf course (0.76), PGA Tour (0.74), LPGA (0.74), Professional golfer (0.73), U.S. Open (0.71)
  Skip-gram: Tennis (0.74), LPGA (0.72), PGA Tour (0.69), Golf course (0.68), Nicklaus Design (0.66)

Tea
  Our model: Coffee (0.82), Green tea (0.81), Black tea (0.80), Camellia sinensis (0.78), Spice (0.76)
  Skip-gram: Coffee (0.78), Green tea (0.76), Black tea (0.75), Camellia sinensis (0.74), Spice (0.73)

Smartphone
  Our model: Tablet computer (0.93), Mobile device (0.89), Personal digital assistant (0.88), Android (operating system) (0.86), iPhone (0.85)
  Skip-gram: Tablet computer (0.91), Personal digital assistant (0.84), Mobile device (0.84), Android (operating system) (0.82), Feature phone (0.82)

Scarlett Johansson
  Our model: Kirsten Dunst (0.85), Anne Hathaway (0.85), Cameron Diaz (0.85), Natalie Portman (0.85), Jessica Biel (0.84)
  Skip-gram: Anne Hathaway (0.79), Natalie Portman (0.78), Kirsten Dunst (0.78), Cameron Diaz (0.78), Kate Beckinsale (0.77)

The Lord of the Rings
  Our model: The Hobbit (0.85), J. R. R. Tolkien (0.84), The Silmarillion (0.81), The Fellowship of the Ring (0.80), The Lord of the Rings (film series) (0.78)
  Skip-gram: The Hobbit (0.77), J. R. R. Tolkien (0.76), The Silmarillion (0.71), The Fellowship of the Ring (0.70), Elvish languages (0.69)
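A minimal sketch of the training signal implied above may clarify the setup; the encoder, the loss, and every name below are our illustrative assumptions rather than the paper's architecture. The idea shown: a text vector (here, a projected average of word vectors) is trained to score entities annotated in the text above randomly sampled negatives.

```python
import numpy as np

# Illustrative sketch (all names and sizes are assumptions): a text vector is
# the projected average of its word vectors, trained to score entities that
# are relevant to the text above sampled negative entities.

rng = np.random.default_rng(0)
VOCAB, ENTITIES, D = 5000, 2000, 100
word_emb = rng.normal(scale=0.1, size=(VOCAB, D))
ent_emb = rng.normal(scale=0.1, size=(ENTITIES, D))
proj = rng.normal(scale=0.1, size=(D, D))        # text-to-entity-space projection

def text_vector(word_ids):
    return word_emb[word_ids].mean(axis=0) @ proj

def log_sigmoid(x):
    return -np.log1p(np.exp(-x))

def pair_loss(word_ids, gold_entity, n_neg=10):
    """Negative-sampling loss for one (text, relevant entity) training pair."""
    t = text_vector(word_ids)
    loss = -log_sigmoid(t @ ent_emb[gold_entity])    # pull the gold entity close
    for e in rng.integers(ENTITIES, size=n_neg):     # push random entities away
        loss -= log_sigmoid(-(t @ ent_emb[e]))
    return loss

print(pair_loss([1, 42, 7, 300], gold_entity=12))
```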
We present a simple and accurate span-based model for semantic role labeling (SRL). Our model directly takes into account all possible argument spans and scores them for each label. At decoding time, we greedily select higher-scoring labeled spans. One advantage of our model is that it allows us to design and use span-level features, which are difficult to use in token-based BIO tagging approaches. Experimental results demonstrate that our ensemble model achieves state-of-the-art results of 87.4 F1 and 87.0 F1 on the CoNLL-2005 and 2012 datasets, respectively.
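To make the decoding step concrete, here is a small self-contained sketch under our own assumptions (the random scorer, the label set, and the maximum span length are placeholders for the trained model): all candidate spans are scored per label, and labeled spans are selected greedily from the highest score down, skipping any span that overlaps an already selected one.

```python
import numpy as np

# Illustrative greedy span decoding (not the authors' code): a stand-in scorer
# assigns one score per (span, label) pair; decoding keeps the best-scoring
# non-overlapping spans.

rng = np.random.default_rng(0)
LABELS = ["ARG0", "ARG1", "ARGM-TMP"]      # hypothetical label subset

def enumerate_spans(n_tokens, max_len=5):
    """All (start, end) spans up to max_len tokens, inclusive indices."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(n_tokens, i + max_len))]

def greedy_decode(n_tokens):
    # Stand-in for the span scorer: one random score per (span, label) pair.
    scored = [(span, label, rng.normal())
              for span in enumerate_spans(n_tokens) for label in LABELS]
    scored.sort(key=lambda t: t[2], reverse=True)
    chosen, used_tokens = [], set()
    for (i, j), label, score in scored:
        if score <= 0:
            break                          # keep only positively scored spans
        if any(t in used_tokens for t in range(i, j + 1)):
            continue                       # overlaps an already selected span
        chosen.append(((i, j), label, round(score, 3)))
        used_tokens.update(range(i, j + 1))
    return chosen

print(greedy_decode(8))                    # e.g., [((2, 4), 'ARG1', ...), ...]
```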