2019
DOI: 10.1007/978-3-030-15712-8_20

Word Embeddings for Entity-Annotated Texts

Abstract: Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, pe…
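
To make the abstract's core idea concrete, here is a minimal sketch (not the paper's actual models) of training skip-gram embeddings with gensim on a corpus in which entity mentions have already been collapsed into single annotated tokens, so each entity becomes a first-class vocabulary item. The toy corpus, the "ENT:" token prefix, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch, assuming entity mentions were pre-annotated and
# collapsed into single tokens before training (hypothetical format).
from gensim.models import Word2Vec

annotated_corpus = [
    ["ENT:Angela_Merkel", "met", "ENT:Barack_Obama", "in", "ENT:Berlin"],
    ["ENT:Berlin", "is", "the", "capital", "of", "ENT:Germany"],
]

model = Word2Vec(
    sentences=annotated_corpus,
    vector_size=100,   # embedding dimensionality
    window=5,          # context window size
    min_count=1,       # keep every token in this toy corpus
    sg=1,              # skip-gram variant
)

# Entities now have their own vectors alongside ordinary words.
print(model.wv.most_similar("ENT:Berlin", topn=3))
```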

Cited by 6 publications (10 citation statements)
References 36 publications
“…Toutanova et al. (2015) extract dependency paths from sentences and jointly embed them with a KG using DistMult (Yang et al., 2015) to support the relation extraction task. Several other approaches focus on jointly embedding words, entities (Yamada et al., 2017; Newman-Griffis et al., 2018; Cao et al., 2017; Almasian et al., 2019) and entity types (Gupta et al., 2017) appearing in the same textual contexts without considering the relational structure of a KG. These approaches are employed in monolingual NLP tasks including entity linking (Gupta et al., 2017; Cao et al., 2017), entity abstraction (Newman-Griffis et al., 2018) and factoid QA (Yamada et al., 2017).…”
Section: Related Work
Mentioning confidence: 99%
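
For context on the DistMult model cited in this excerpt: it scores a knowledge-graph triple (head, relation, tail) with a trilinear product over a diagonal relation matrix. The sketch below illustrates just the scoring function; the random vectors are stand-ins for trained embeddings, not anything from the cited work.

```python
# Hedged sketch of DistMult scoring (Yang et al., 2015):
# score(h, r, t) = sum_i h_i * r_i * t_i.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
head = rng.normal(size=dim)      # entity embedding h
relation = rng.normal(size=dim)  # diagonal of the relation matrix r
tail = rng.normal(size=dim)      # entity embedding t

score = np.sum(head * relation * tail)
print(f"DistMult score: {score:.4f}")
```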
“…By annotating the text with named entity information before training the model, unique multi-word entries in the dictionary directly relate to known entities. Almasian et al. propose such a model for entity-annotated texts [2]. Other interesting approaches build networks of co-occurring words and entities.…”
Section: Semantic Exploration Using Visualizations
Mentioning confidence: 99%
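
A minimal sketch of the annotation step this excerpt describes: collapsing known multi-word entity mentions into single dictionary entries before embedding training. The mention list, greedy longest-match strategy, and joined token format are assumptions for illustration, not the actual preprocessing of Almasian et al.

```python
# Hypothetical mention dictionary; in practice this would come from
# a named entity recognizer or a knowledge base.
KNOWN_MENTIONS = {
    ("new", "york", "city"): "New_York_City",
    ("barack", "obama"): "Barack_Obama",
}

def collapse_mentions(tokens):
    """Greedily replace longest matching mention spans with one token."""
    out, i = [], 0
    max_len = max(len(m) for m in KNOWN_MENTIONS)
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + span])
            if key in KNOWN_MENTIONS:
                out.append(KNOWN_MENTIONS[key])
                i += span
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(collapse_mentions(["barack", "obama", "visited", "new", "york", "city"]))
# -> ['Barack_Obama', 'visited', 'New_York_City']
```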
“…Auditing firms and law enforcement need to sift through massive amounts of data to gather evidence of criminal activity, often involving communication networks and documents [28]. Current computer-aided exploration tools offer a wide range of features, from data ingestion, exploration, and analysis to visualization. This way, users can quickly navigate the underlying data based on extracted attributes, which would otherwise be infeasible due to the often large amount of heterogeneous data.…”
Section: Introduction
Mentioning confidence: 99%
“…, m_N} be the set of N mentions contained in D, and E be the set of entities in the reference KG G. A low-dimensional representation (embedding) can be learned for each entity by applying node representation learning techniques such as DeepWalk (Perozzi et al., 2014) to the graph G. The entity embeddings learned by these techniques are known to be meaningful with respect to the relatedness of the entities they represent (Almasian et al., 2019). The EL task consists in finding, for each mention m ∈ M_D, the entity e ∈ E to which it refers.…”
Section: Problem and Notation
Mentioning confidence: 99%
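
A brief sketch of the DeepWalk-style procedure this excerpt mentions: uniform random walks over the graph are treated as sentences and fed to a skip-gram model. The toy adjacency list, walk length, and hyperparameters are assumptions, not values from the cited papers.

```python
# Hedged sketch of DeepWalk-style node embeddings (Perozzi et al., 2014).
import random
from gensim.models import Word2Vec

graph = {  # tiny hypothetical undirected graph as an adjacency list
    "Berlin": ["Germany", "Angela_Merkel"],
    "Germany": ["Berlin", "Angela_Merkel"],
    "Angela_Merkel": ["Berlin", "Germany"],
}

def random_walk(start, length=10):
    """Sample a uniform random walk of the given length from start."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# Several walks per node serve as the "sentences" for skip-gram.
walks = [random_walk(node) for node in graph for _ in range(20)]
model = Word2Vec(walks, vector_size=32, window=4, min_count=1, sg=1)
print(model.wv.most_similar("Berlin", topn=2))
```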
“…If the number of eigenthemes is chosen to be small, these components will constitute a good basis to approximate the dense region of the document embedding matrix. Under the fundamental assumption that topical relatedness exists across the gold entities in a document and that such relatedness is captured by their corresponding embeddings (Almasian et al., 2019), the gold entities will form a dense region and, consequently, will define the subspace. However, this is only possible if there is no other subset of candidate entities whose relatedness is larger than that of the set of gold entities.…”
Section: Entity Linking with Eigenthemes
Mentioning confidence: 99%
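
A hedged sketch of the eigentheme construction this excerpt outlines: take the top singular vectors of the candidate-embedding matrix as a subspace approximating the dense region, then rank candidates by their residual to that subspace. The random embeddings and the choice of k are illustrative assumptions, not the cited paper's setup.

```python
# Sketch: rank candidate entities by distance to the subspace spanned
# by the top-k singular vectors of their embedding matrix.
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.normal(size=(40, 64))   # 40 stand-in entity embeddings
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

k = 3  # hypothetical number of eigenthemes
_, _, vt = np.linalg.svd(candidates, full_matrices=False)
basis = vt[:k]                           # (k, 64) orthonormal row basis

# Residual after projecting each candidate onto the subspace;
# a smaller residual means the candidate lies in the dense region.
proj = candidates @ basis.T @ basis
residual = np.linalg.norm(candidates - proj, axis=1)
ranking = np.argsort(residual)           # most "on-topic" candidates first
print(ranking[:5])
```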