BioNLP 2017
DOI: 10.18653/v1/w17-2303
Insights into Analogy Completion from the Biomedical Domain

Abstract: Analogy completion has been a popular task in recent years for evaluating the semantic properties of word embeddings, but the standard methodology makes a number of assumptions about analogies that do not always hold, either in recent benchmark datasets or when expanding into other domains. Through an analysis of analogies in the biomedical domain, we identify three assumptions: that of a Single Answer for any given analogy, that the pairs involved describe the Same Relationship, and that each pair is Informat…

Cited by 19 publications (15 citation statements) · References 19 publications
“…It appears that not all relations can be identified in this way, with lexical semantic relations such as synonymy and antonymy being particularly difficult (Köper et al., 2015; Vylomova et al., 2016). The assumption of a single best-fitting candidate answer is also being targeted (Newman-Griffis et al., 2017).…”
Section: Results
confidence: 99%
“…Pyysalo et al. [73] train a Skip-gram [72] model on document titles and abstracts from the PubMed XML dataset, and all text content of the PMC Open Access dataset. Newman-Griffis et al. [70] and Chen et al. [71] train GloVe [69], Skip-gram, and Continuous Bag of Words (CBOW) [72] models using PubMed information, whilst Zhang et al. [22] and Chen et al. [71] train FastText [23] models using PubMed and MeSH. Blagec et al. [28] introduce a set of neural embedding models based on the training of FastText [23], Sent2Vec [26], Paragraph vector [29], and Skip-thoughts vectors [30] models on the PMC dataset.…”
Section: Methods Proposed for the Biomedical Domain
confidence: 99%
“…Availability of the pre-trained models. We have already gathered all the pre-trained embeddings [22,25,70,71,73,77,114,115] and BERT-based language models [31,32,79-81,116] required for our experiments. We have also checked the validity of all pre-trained model files by testing the evaluation of the models using the third-party libraries as detailed below.…”
Section: Integration of the Biomedical Ontologies and Thesaurus Recently Published HESML V1R5
confidence: 99%
“…Second, we used a previously reported analogy completion task [22]. For every combination of pairs, representing an analogy a : b :: c : d, we calculated the cosine similarity between d and the single closest vocabulary word in the embedding space to a − b + c.…”
Section: Methods
confidence: 99%
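The vector-offset completion step quoted above can be sketched in NumPy. This is a toy sketch under stated assumptions: the four vectors are made up for illustration (real experiments use pretrained biomedical embeddings over a full vocabulary), and the offset is written b − a + c, the common 3CosAdd formulation; sign conventions for the offset vary across papers.

```python
import numpy as np

# Toy embedding space; values are invented for illustration only.
emb = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([0.9, 0.1, 0.9]),
    "queen": np.array([0.9, 1.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def complete_analogy(a, b, c, emb):
    """For a : b :: c : ?, return the vocabulary word whose vector is
    closest (by cosine) to the offset b - a + c, excluding a, b, c."""
    target = emb[b] - emb[a] + emb[c]
    candidates = {w: v for w, v in emb.items() if w not in {a, b, c}}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(complete_analogy("man", "woman", "king", emb))  # → queen
```

Excluding the three query words from the candidate set matters: the standard evaluation would otherwise frequently return b or c itself, which is one of the assumptions the cited paper examines.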