BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, they require that both sentences are fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
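The speed-up comes from encoding each sentence once and comparing fixed-size embeddings, instead of running one cross-encoder inference per sentence pair. A minimal NumPy sketch of the search step (the random matrix below is a placeholder standing in for SBERT embeddings, not real model output):

```python
import numpy as np

def most_similar_pair(embeddings):
    """Find the most similar pair of sentences via cosine similarity.

    With precomputed embeddings the all-pairs comparison is a single
    matrix product over normalized rows.
    """
    # Normalize rows so the dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    # Mask the diagonal: a sentence is trivially most similar to itself.
    np.fill_diagonal(sims, -np.inf)
    i, j = np.unravel_index(np.argmax(sims), sims.shape)
    return int(i), int(j), float(sims[i, j])

rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 8))               # stand-in for sentence embeddings
emb[7] = emb[2] + 0.01 * rng.normal(size=8)  # make items 2 and 7 near-duplicates
i, j, score = most_similar_pair(emb)
print(i, j, score)
```

With 10,000 sentences this is 10,000 encoder passes plus one matrix product, versus roughly 50 million cross-encoder inferences for the pairwise setup.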
Molecular prognostic indicators for oropharyngeal squamous cell carcinoma (OSCC), including HPV-DNA detection, epidermal growth factor receptor (EGFR) and p16 expression, have been suggested in the literature, but none of these are currently used in clinical practice. To compare these predictors, 106 newly diagnosed OSCCs were analyzed for the presence of HPV-DNA and expression of p16 and EGFR. The 5-year disease-free survival (DFS) and overall survival (OS) were calculated in relation to these markers and a multivariate Cox analysis was performed. Twenty-eight percent of the cases contained oncogenic HPV-DNA and 30% were positive for p16. The p16 expression was highly correlated with the presence of HPV-DNA (p < 0.001). Univariate analysis of the 5-year DFS revealed a significantly better outcome for patients with p16-positive tumors (84% vs. 49%, p = 0.009). EGFR-negative tumors showed a tendency toward a better prognosis in DFS (74% vs. 47%, p = 0.084) and OS (70% vs. 45%, p = 0.100). Remarkable and highly significant was the combination of p16 and EGFR expression status, leading to a 5-year DFS of 93% for p16+/EGFR− tumors vs. 39% for p16−/EGFR+ tumors (p = 0.003) and to a 5-year OS of 79% vs. 38%, respectively (p = 0.010). In multivariate analysis p16 remained a highly significant prognostic marker for DFS (p = 0.030), showing a 7.5-fold increased risk of relapse in patients with p16-negative tumors. Our data indicate that p16 expression is the most reliable prognostic marker for OSCC and further might be a surrogate marker for HPV-positive OSCC. HPV+/p16+ tumors tended to have decreased EGFR expression, but using both immunohistological markers has significant prognostic implications. © 2007 Wiley-Liss, Inc.
In this paper we show that reporting a single performance score is insufficient to compare non-deterministic approaches. We demonstrate for common sequence tagging tasks that the seed value for the random number generator can result in statistically significant (p < 10⁻⁴) differences for state-of-the-art systems. For two recent systems for NER, we observe an absolute difference of one percentage point F1-score depending on the selected seed value, making these systems appear either state-of-the-art or mediocre. Instead of publishing and reporting single performance scores, we propose to compare score distributions based on multiple executions. Based on the evaluation of 50,000 LSTM networks for five sequence tagging tasks, we present network architectures that both produce superior performance and are more stable with respect to the remaining hyperparameters. The full experimental results are published in (Reimers and Gurevych, 2017).
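Reporting a score distribution instead of a single number is mechanically simple: repeat training with different seeds and summarize. A stdlib-only sketch, where `train_and_evaluate` is a hypothetical stand-in that simulates seed-dependent F1 noise rather than training a real tagger:

```python
import random
import statistics

def train_and_evaluate(seed):
    """Hypothetical stand-in for training a sequence tagger with one seed.

    The F1 score is simulated as a draw around a system-level mean; in
    practice this would train and evaluate the actual model.
    """
    rng = random.Random(seed)
    return 90.0 + rng.gauss(0.0, 0.5)  # simulated seed-dependent F1

# Evaluate a distribution over many seeds instead of a single run.
scores = [train_and_evaluate(seed) for seed in range(20)]
mean = statistics.mean(scores)
std = statistics.stdev(scores)
print(f"F1 = {mean:.2f} +/- {std:.2f} over {len(scores)} seeds "
      f"(min {min(scores):.2f}, max {max(scores):.2f})")
```

The min/max spread makes the paper's point concrete: the same architecture can look state-of-the-art or mediocre depending on which single seed happens to be reported.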
We present an easy and efficient method to extend existing sentence embedding models to new languages. This makes it possible to create multilingual versions of previously monolingual models. The training is based on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence. We use the original (monolingual) model to generate sentence embeddings for the source language and then train a new system on translated sentences to mimic the original model. Compared to other methods for training multilingual sentence embeddings, this approach has several advantages: it is easy to extend existing models to new languages with relatively few samples, it is easier to ensure desired properties for the vector space, and the hardware requirements for training are lower. We demonstrate the effectiveness of our approach for 50+ languages from various language families. Code to extend sentence embedding models to more than 400 languages is publicly available.
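The "mimic the original model" idea is a distillation objective: the student must reproduce the teacher's embedding both for the source sentence and for its translation. A minimal NumPy sketch of that loss (the arrays are random placeholders, not real model embeddings):

```python
import numpy as np

def distillation_loss(teacher_src, student_src, student_tgt):
    """Multilingual distillation objective (sketch).

    For a batch of (source, translation) pairs, the student is trained
    to match the teacher's source embedding with both its own source
    embedding and its embedding of the translation:
        L = mean ||teacher(s) - student(s)||^2
          + mean ||teacher(s) - student(t)||^2
    """
    return float(np.mean((teacher_src - student_src) ** 2)
                 + np.mean((teacher_src - student_tgt) ** 2))

rng = np.random.default_rng(1)
teacher = rng.normal(size=(4, 16))               # teacher embeddings (source)
student = teacher + 0.1 * rng.normal(size=(4, 16))  # imperfect student
loss = distillation_loss(teacher, student, student)
print(loss)
```

Minimizing this pushes the student to place a sentence and its translation at the same point in the teacher's vector space, which is exactly the property the method relies on.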