Proceedings of the Third Workshop on Representation Learning for NLP 2018
DOI: 10.18653/v1/w18-3026

Jointly Embedding Entities and Text with Distant Supervision

Abstract: Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and comp…
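The abstract describes tagging entity surface forms in raw text and training one embedding space over both words and entity IDs. A minimal sketch of that idea, assuming a toy corpus and surface-form list; the `@entity/...` token convention and the gensim skip-gram trainer are illustrative stand-ins, not the paper's actual pipeline:

```python
# Minimal sketch of distantly-supervised joint entity/word embedding:
# occurrences of known surface forms are replaced with entity tokens,
# then words and entities are trained in one skip-gram space.
# The surface-form map and corpus below are hypothetical examples.
from gensim.models import Word2Vec

# Distant supervision signal: surface form -> entity ID (no annotated corpus needed).
surface_to_entity = {
    "heart attack": "@entity/myocardial_infarction",
    "aspirin": "@entity/acetylsalicylic_acid",
}

def tag_entities(sentence, mapping):
    """Replace longer surface forms first, then tokenize on whitespace."""
    text = " " + sentence.lower() + " "
    for form, entity in sorted(mapping.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(" " + form + " ", " " + entity + " ")
    return text.split()

corpus = [
    "Aspirin is often given after a heart attack .",
    "A heart attack damages cardiac muscle .",
]
sentences = [tag_entities(s, surface_to_entity) for s in corpus]

# Words and entity tokens share a single embedding space.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.most_similar("@entity/myocardial_infarction", topn=3))
```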

Cited by 30 publications (32 citation statements) · References 27 publications

“…In addition, there is significant research into strategies for learning neural representations of entities in knowledge bases and coding systems. Past work has investigated diverse approaches, such as leveraging rich semantic information from knowledge base structure and web-scale annotated corpora (34,97,98), utilizing definitions of word senses (similar to our use of ICF definitions) (99,100), and combining terminologies with targeted selection of training corpora to learn application-tailored concept representations (101,102). While most of the research on entity representations requires resources not yet available for FSI (e.g., large, annotated corpora; well-developed terminologies; robust and interconnected knowledge graph structure), all present significant opportunities to advance FSI coding technologies as more resources are developed.…”
Section: Alternative Coding Approaches
mentioning · confidence: 99%
“…Several studies [37,39,46] proposed a hybrid between entity and word embeddings by employing a loss function which includes both a TransE-based component to model relations between entities and a word2vec-based component to model semantic relations between words, along with a third component whose purpose is to align the entity and word embeddings obtained by the first two components. In [23], the authors take a different approach, learning word and entity embeddings without utilizing relations between entities from a knowledge graph and instead relying only on an unannotated corpus of text. None of the previously proposed approaches for learning joint word and entity embedding spaces was designed specifically for entity search in a knowledge graph, and thus they ignore important information, such as knowledge graph structural components.…”
Section: Related Work
mentioning · confidence: 99%
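The hybrid objective this statement describes combines three terms. As a rough, hedged reconstruction (not the exact loss of any of [37,39,46]), with $(h, r, t)$ ranging over knowledge-graph triples, $(h', t')$ a corrupted negative sample, $\gamma$ a margin, and $d$ a distance tying each entity vector to the embedding of its surface form:

```latex
\mathcal{L} =
    \underbrace{\sum_{(h,r,t)} \max\big(0,\; \gamma + \|\mathbf{e}_h + \mathbf{r} - \mathbf{e}_t\|
        - \|\mathbf{e}_{h'} + \mathbf{r} - \mathbf{e}_{t'}\|\big)}_{\text{TransE term: relations between entities}}
  \;+\; \underbrace{\mathcal{L}_{\mathrm{SG}}}_{\text{word2vec skip-gram term over text}}
  \;+\; \underbrace{\sum_{(e,w)} d(\mathbf{e}, \mathbf{w})}_{\text{alignment of entity and word spaces}}
```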
“…SPECTER paper embeddings have been shown to successfully capture paper similarity [18] and are available for all papers in CORD-19. Also available for papers in CORD-19 are clinical concept embeddings trained using the JET algorithm [60], relation embeddings trained using SeVeN [28] and network co-occurrence embeddings [63] for biomedical entities computed using CORD-19-on-FHIR. Embeddings capture text similarity and can be used to retrieve similar texts, e.g.…”
Section: Text Mining Modeling Resources
mentioning · confidence: 99%
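The retrieval use described above (finding similar texts through embedding proximity) reduces to nearest-neighbour search under cosine similarity. A minimal sketch, assuming pre-computed vectors; `paper_ids` and `paper_vecs` are hypothetical placeholders for embeddings such as SPECTER or JET vectors:

```python
# Minimal sketch: retrieve the most similar papers by cosine similarity
# over pre-computed embeddings. The IDs and random vectors below are
# stand-ins for real embedding output.
import numpy as np

paper_ids = ["paper_a", "paper_b", "paper_c"]
paper_vecs = np.random.default_rng(0).normal(size=(3, 768))  # stand-in embeddings

def most_similar(query_vec, vecs, ids, topn=2):
    """Rank documents by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    m = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = m @ q
    order = np.argsort(-sims)[:topn]
    return [(ids[i], float(sims[i])) for i in order]

print(most_similar(paper_vecs[0], paper_vecs, paper_ids))
```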