Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) 2014
DOI: 10.3115/v1/s14-2069
LIPN: Introducing a new Geographical Context Similarity Measure and a Statistical Similarity Measure based on the Bhattacharyya coefficient

Abstract: This paper describes the system used by the LIPN team in Task 10, Multilingual Semantic Textual Similarity, at SemEval 2014, in both the English and Spanish sub-tasks. The system uses a support vector regression model that combines different text similarity measures as features. With respect to our 2013 participation, we included a new feature that takes the geographical context into account and a new semantic distance based on the Bhattacharyya distance, calculated on co-occurrence distributions derived from the …
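The abstract's Bhattacharyya-based semantic distance compares two words through their co-occurrence distributions. A minimal sketch of that idea follows; the function names, the context-window scheme, and the use of the negative log of the coefficient as the distance are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter
from math import sqrt, log

def cooccurrence_distribution(word, corpus, window=2):
    # Build a normalized co-occurrence distribution for `word` over a
    # tokenized corpus (list of token lists), using a symmetric window.
    # The window size of 2 is an arbitrary choice for illustration.
    counts = Counter()
    for sentence in corpus:
        for i, token in enumerate(sentence):
            if token == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for j, t in enumerate(sentence[lo:hi], lo) if j != i)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def bhattacharyya_coefficient(p, q):
    # BC(p, q) = sum_w sqrt(p(w) * q(w)); 1.0 for identical distributions,
    # 0.0 for distributions with disjoint support.
    return sum(sqrt(p[w] * q[w]) for w in p.keys() & q.keys())

def bhattacharyya_distance(p, q):
    # One common distance derived from the coefficient: -ln(BC).
    bc = bhattacharyya_coefficient(p, q)
    return float("inf") if bc == 0 else -log(bc)
```

For example, two words that appear in identical contexts yield a coefficient of 1.0 and a distance of 0.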

Cited by 3 publications (4 citation statements). References 3 publications (2 reference statements).
“…Table 3 shows the results of the English subtask, with runs listed in alphabetical order. The correlation in each dataset is given. Participating teams: Bielefeld SC (McCrae et al., 2013), BUAP (Vilariño et al., 2014), DLS@CU (Sultan et al., 2014b), FBK-TR (Vo et al., 2014), IBM EG (no information), LIPN (Buscaldi et al., 2014), Meerkat Mafia (Kashyap et al., 2014), NTNU (Lynum et al., 2014), RTM-DCU (Biçici and Way, 2014), SemantiKLUE (Proisl et al., 2014), StanfordNLP (Socher et al., 2014), TeamZ (Gupta, 2014), UMCC DLSI SemSim (Chavez et al., 2014), UNAL-NLP, UNED (Martinez-Romo et al., 2011), UoW (Rios, 2014). Table 3: English evaluation results. Results at the top correspond to out-of-the-box systems.…”
Section: English Subtask
confidence: 99%
“…Overall, most systems were cross-lingual, relying on different translation approaches, such as 1) translating the test data into English (as the two systems above), then exporting the score obtained for the English sentences back to Spanish, or 2) automatically translating the English training data and learning a classifier directly in Spanish. Buscaldi et al. (2014) supplemented their training dataset with human annotations conducted in Spanish, using definition pairs extracted from a Spanish dictionary. A different angle was explored by Rios (2014), who proposed a multilingual framework using transfer learning across English and Spanish by training on traditional lexical, knowledge-based and corpus-based features.…”
Section: Spanish Subtask
confidence: 99%
“…And select the utterance that has a high n-gram overlap score, and add it to the set of diverse utterances. iii) Iterate over the remaining set of generated utterances; in each iteration, compute the n-gram similarity score (Buscaldi et al., 2013) between the current utterance and the set of diverse utterances, and based on the computed scores, add the utterance with the least n-gram similarity to the set of diverse utterances. Also, during each iteration we check for the stopping criterion, i.e.…”
Section: Figure 6: LLM Prompts
confidence: 99%
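The greedy diverse-utterance selection quoted above (repeatedly adding the remaining candidate least similar to the already-selected set) might be sketched as follows. Using Jaccard overlap over bigrams as the n-gram similarity, and seeding the selection with the first utterance, are assumptions for illustration; the statement does not specify the exact measure of Buscaldi et al. (2013).

```python
def ngram_set(text, n=2):
    # Tokenize on whitespace and collect the set of n-grams.
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_overlap(a, b, n=2):
    # Jaccard overlap of n-gram sets (an assumed similarity measure).
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def select_diverse(utterances, k):
    # Greedy selection: seed with the first utterance (an assumption),
    # then repeatedly add the candidate whose maximum similarity to the
    # selected set is lowest, until k utterances are chosen.
    remaining = list(utterances)
    selected = [remaining.pop(0)]
    while remaining and len(selected) < k:
        best = min(remaining,
                   key=lambda u: max(ngram_overlap(u, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A stopping criterion on diversity (as mentioned in the quote) could be added by breaking out of the loop once the lowest achievable similarity exceeds a threshold.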
“…Our participation in SemEval 2015 was focused on solving the technical problems that afflicted our previous participation (Buscaldi et al., 2014) and on including additional alignment-based features, such as the Sultan similarity (Sultan et al., 2014b) and the measure available in CMU Sphinx-4 (Lamere et al., 2003) for speech recognition. We baptised the new system SOPA, from the Spanish word for "soup", since it uses a heterogeneous mix of features.…”
Section: Introduction
confidence: 99%