Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations 2020
DOI: 10.18653/v1/2020.acl-demos.12

Multilingual Universal Sentence Encoder for Semantic Retrieval

Abstract: We present easy-to-use, retrieval-focused multilingual sentence embedding models, made available on TensorFlow Hub. The models embed text from 16 languages into a shared semantic space using a multi-task trained dual encoder that learns tied cross-lingual representations via translation bridge tasks (Chidambaram et al., 2018). The models achieve new state-of-the-art performance on monolingual and cross-lingual semantic retrieval (SR). Competitive performance is obtained on the related tasks of translation p…
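The abstract names the architecture but not how a dual encoder is scored. A dual encoder embeds each side of a pair independently into the shared space and scores the pair by the dot product of the two embeddings. Encoders of this kind are commonly trained with a ranking loss over in-batch negatives, where each source sentence must score its own translation above every other target in the batch. The sketch below computes such a loss on precomputed toy vectors; the choice of an in-batch softmax objective, the function names and the toy inputs are assumptions for illustration, not the released training code.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Dual-encoder pair score: dot product of the two sentence embeddings."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_ranking_loss(sources: List[Vector], targets: List[Vector]) -> float:
    """Mean softmax cross-entropy in which targets[i] is the true translation of
    sources[i] and every other target in the batch acts as a negative.
    (Assumed objective, shown for illustration only.)"""
    if len(sources) != len(targets):
        raise ValueError("sources and targets must be aligned pairs")
    total = 0.0
    for i, src in enumerate(sources):
        scores = [dot(src, tgt) for tgt in targets]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log softmax probability assigned to the true translation.
        total += log_norm - scores[i]
    return total / len(sources)


# Hypothetical 3-d embeddings: source i lies closest to target i.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tgt_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
tgt_swapped = [tgt_aligned[1], tgt_aligned[0]]

print(in_batch_ranking_loss(src, tgt_aligned) < in_batch_ranking_loss(src, tgt_swapped))  # True
```

A lower loss means each source ranks its own translation above the other sentences in the batch, which is what makes nearest-neighbor search in the shared space usable for cross-lingual retrieval.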

Cited by 230 publications (161 citation statements)
References 13 publications (29 reference statements)
“…We hypothesized that part of this encoded meaning includes information about compounds mentioned within text, and that we could map a compound into semantic space by calculating the centroid of the document vectors in which it occurs (Figure 1A). To first assemble document embeddings, we employed the pre-trained Universal Sentence Encoder multilingual model on ~14.6M PubMed abstracts [11]. We assessed the quality of the document embeddings using a derived Information Retrieval task, and obtained an accuracy of 90.5%, a 26.4% improvement over a bag-of-words approach (Supplementary Table 1).…”
Section: Results (mentioning, confidence: 99%)
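The quoted study places each compound in semantic space at the centroid of the embeddings of the abstracts that mention it. A minimal sketch of that step on precomputed vectors follows. The dictionary layout, the helper names and the two-dimensional toy vectors are hypothetical; in the study the document vectors came from the pre-trained multilingual USE model rather than being written by hand.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("cannot take the centroid of zero vectors")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def compound_embeddings(
    doc_vectors: Dict[str, Vector],
    mentions: Iterable[Tuple[str, str]],
) -> Dict[str, Vector]:
    """Embed each compound as the centroid of the vectors of the documents
    mentioning it. `mentions` yields (compound, doc_id) pairs (hypothetical format)."""
    # An insertion-ordered dict acts as an ordered set, so a document that
    # mentions a compound several times is still counted only once.
    docs_per_compound: Dict[str, Dict[str, None]] = defaultdict(dict)
    for compound, doc_id in mentions:
        docs_per_compound[compound][doc_id] = None
    return {
        compound: centroid([doc_vectors[d] for d in doc_ids])
        for compound, doc_ids in docs_per_compound.items()
    }


# Hypothetical toy data: three abstracts, two compounds.
doc_vectors = {"pmid1": [1.0, 0.0], "pmid2": [0.0, 1.0], "pmid3": [1.0, 1.0]}
mentions = [("aspirin", "pmid1"), ("aspirin", "pmid3"), ("caffeine", "pmid2")]
print(compound_embeddings(doc_vectors, mentions))
# {'aspirin': [1.0, 0.5], 'caffeine': [0.0, 1.0]}
```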
“…In automated approaches, we used three types of text representation: bag of words, dictionary vectors (POS + LCM + Sentiment), and utterance embedding vectors obtained from the Universal Sentence Encoder (USE) [44]. We then used those representations with multiple types of algorithms to classify texts:…”
Section: Methods (mentioning, confidence: 99%)
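The quoted study feeds bag-of-words counts, dictionary vectors and USE embeddings to several classifiers. The bag-of-words representation is the simplest of the three; a minimal sketch follows, assuming lowercase word tokens and a vocabulary built from the training texts. The tokenizer, the helper names and the example sentences are hypothetical, since the excerpt does not describe the study's preprocessing.

```python
import re
from typing import Dict, List

# Word tokens; for str patterns, \w is Unicode-aware, so Polish letters such as "ł" match.
TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (hypothetical tokenizer)."""
    return TOKEN.findall(text.lower())


def build_vocabulary(texts: List[str]) -> Dict[str, int]:
    """Map each distinct token to a column index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(text: str, vocab: Dict[str, int]) -> List[int]:
    """Count vector over `vocab`; tokens outside the vocabulary are ignored."""
    counts = [0] * len(vocab)
    for tok in tokenize(text):
        idx = vocab.get(tok)
        if idx is not None:
            counts[idx] += 1
    return counts


train = ["I will pay tomorrow", "I never pay"]
vocab = build_vocabulary(train)
print(vocab)  # {'i': 0, 'will': 1, 'pay': 2, 'tomorrow': 3, 'never': 4}
print(bag_of_words("Pay pay never", vocab))  # [0, 0, 2, 0, 1]
```

Unlike the USE vectors, these counts ignore word order and only cover words seen during training, which is one reason a study might compare them against learned embeddings.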
“…One of the methods used to obtain text representations was the USE [44]. This deep neural network text encoder supports 16 languages, among them Polish.…”
Section: Universal Sentence Encoder (USE) Text Representations (mentioning, confidence: 99%)
“…Despite its well-known criticalities, Achinstein's model seemed to us the most suitable among all for building user-centric explanatory software, because it suggests that explaining is akin to Question Answering (QA). On this aspect, one of the main technological limitations of state-of-the-art automated QA [24,31] is that it tends to lose its effectiveness when the questions are too broad. This results in several issues when the user has no knowledge of the domain, thus forcing him/her to resort to generic questions in order to acquire enough information to be able to approach more specific questions.…”
Section: Proposed Solution (mentioning, confidence: 99%)
“…Despite its criticalities, Achinstein's theory seemed to us the most suitable for our purposes, because it allows the quality of an explanation/answer to be assessed on the basis of its pragmatic relevance to a question. This task may seem too onerous and subjective; nonetheless, recent developments in modern artificial intelligence suggest that there might exist tools [24,31] to objectively estimate the pertinence of an answer, thus allowing a question answering process to be automated.…”
Section: Introduction (mentioning, confidence: 99%)
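Both QA passages rely on sentence embeddings to judge how pertinent an answer is to a question. One simple way to do this, assumed here rather than taken from the citing papers, is to embed the question and each candidate answer in the same space and rank the candidates by cosine similarity. The vectors below are hand-made placeholders for encoder outputs, and `rank_answers` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_answers(
    question: Sequence[float],
    answers: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Return (answer_id, score) pairs sorted from most to least pertinent."""
    scored = [(answer_id, cosine(question, vec)) for answer_id, vec in answers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings standing in for encoder outputs (hypothetical).
question = [0.6, 0.8, 0.0]
answers = [
    ("off-topic", [0.0, 0.0, 1.0]),
    ("on-topic", [0.5, 0.9, 0.1]),
    ("partial", [1.0, 0.0, 0.0]),
]
for answer_id, score in rank_answers(question, answers):
    print(f"{answer_id}: {score:.3f}")  # on-topic first, then partial, then off-topic
```

A broad question tends to sit close to many candidates at once, so their scores bunch together; that is one concrete way to see the limitation the first passage describes.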