2019
DOI: 10.1186/s13326-019-0200-x

Similarity corpus on microbial transcriptional regulation

Abstract: Background: The ability to express the same meaning in different ways is a well-known property of natural language. This amazing property is the source of major difficulties in natural language processing. Given the constant increase in published literature, its curation and information extraction would strongly benefit from efficient automatic processes, for which corpora of sentences evaluated by experts are a valuable resource. Results: Given our interest in applying s…

Cited by 9 publications
(21 citation statements)
References 28 publications
(23 reference statements)
“…On the other hand, the development of sentence similarity benchmarks for the biomedical domain is much more recent. Currently, there are only three datasets for the evaluation of methods on biomedical sentence similarity, called BIOSSES [ 20 ], MedSTS [ 49 ], and CTR [ 50 ]. BIOSSES was introduced in 2017 and it is limited to 100 sentence pairs with their corresponding similarity scores, whilst MedSTS full is made up by 1,068 scored sentence pairs of the MedSTS dataset [ 100 ], which contains 174,629 sentence pairs gathered from a clinical corpus on biomedical sentence similarity.…”
Section: Methods On Sentence Semantic Similarity
mentioning
confidence: 99%
“…Fig 3 shows the workflow for running the experiments that will be carried out for this work. Given an input dataset, such as BIOSSES [ 20 ], MedSTS [ 49 ], or CTR [ 50 ], the first step is to pre-process all of the sentences, as shown in Fig 4 . For each sentence in the dataset (named S1 and S2), the preprocessing phase will be divided into four stages as follows: (1.a) named entity recognition of UMLS [ 120 ] concepts, using different state-of-the-art NER tools, such as MetaMap [ 107 ] or cTAKES [ 108 ]; (1.b) tokenize the sentence, using well-known tokenizers, such as the Stanford CoreNLP tokenizer [ 117 ], BioCNLPTokenizer [ 118 ], or WordPieceTokenizer [ 33 ] for BERT-based methods; (1.c) lower-case normalization; (1.d) character filtering, which allows the removal of punctuation marks or special characters; and finally, (1.e) the removal of stop-words, following different approximations evaluated by other authors like Blagec et al [ 28 ] or Sogancioglu et al [ 20 ].…”
Section: The Reproducible Experiments On Biomedical Sentence Similarity
mentioning
confidence: 99%
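
The preprocessing stages listed in the statement above can be sketched roughly as follows. This is only an illustrative sketch, not the citing authors' pipeline: the function name `preprocess` and the tiny `STOP_WORDS` list are hypothetical, whitespace splitting stands in for the tokenizers named in the quote, and the named entity recognition stage (1.a) is omitted because it depends on external tools such as MetaMap or cTAKES.

```python
import re

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = frozenset({"a", "an", "the", "of", "in", "is", "by", "and", "to"})


def preprocess(sentence, stop_words=STOP_WORDS):
    """Apply stages (1.b) to (1.e) to one sentence and return its tokens."""
    # (1.b) Tokenize. Whitespace splitting is a stand-in for a real tokenizer.
    tokens = sentence.split()
    # (1.c) Lower-case normalization.
    tokens = [token.lower() for token in tokens]
    # (1.d) Character filtering: keep only ASCII letters, digits, and hyphens.
    tokens = [re.sub(r"[^a-z0-9\-]", "", token) for token in tokens]
    # (1.e) Stop-word removal (also drops tokens emptied by the filter).
    return [token for token in tokens if token and token not in stop_words]


if __name__ == "__main__":
    s1 = "The expression of araC is repressed by AraC."
    print(preprocess(s1))
```

The order of the stages matters in this sketch: stop-words are matched after lower-casing and character filtering, so "The" and "the," are both removed.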
“…The quality of the semantic-interlinks was evaluated indirectly by measuring the precision of the similarity metrics. This was done through a 10-fold cross-validation over a Similarity Corpus [13], which is a graded textual similarity corpus that was specifically designed to be used for training similarity models.…”
Section: Quality Of the Semantic Links
mentioning
confidence: 99%
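
The 10-fold cross-validation mentioned in the statement above partitions the corpus into ten folds and holds out each fold once for testing. A minimal sketch of the splitting step is given below, assuming a plain shuffled partition; the helper name `k_fold_indices`, the seed, and the item count are hypothetical, and the statement does not say how the citing authors built their folds or scored the similarity metrics.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    # Shuffle with a fixed seed so the partition is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 25 is an arbitrary illustrative size, not the size of the corpus.
    for fold_number, (train, test) in enumerate(k_fold_indices(25), start=1):
        print(fold_number, len(train), len(test))
```

Each sentence pair appears in exactly one test fold, so every pair contributes once to the held-out evaluation.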
“…Protein-protein interactions, regulatory interactions identification, entity association to ontologies, or even directed searches, are just some of aided curation examples [6,17,23]. It is important to emphasize that they are focused in facilitating access to specific information patterns. This kind of curation is vital and very helpful but it is designed on the premise that what is searched fits in a set of predefined and foreseen model and criteria, thus it only targets a fraction of the potential knowledge.…”
mentioning
confidence: 99%