Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017
DOI: 10.18653/v1/s17-2028
ECNU at SemEval-2017 Task 1: Leverage Kernel-based Traditional NLP features and Neural Networks to Build a Universal Model for Multilingual and Cross-lingual Semantic Textual Similarity

Abstract: To model semantic similarity for multilingual and cross-lingual sentence pairs, we first translate foreign-language sentences into English and then build an efficient monolingual English system with multiple NLP features. The system is further supported by deep learning models, and our best run achieves a mean Pearson correlation of 73.16% on the primary track.
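The evaluation metric named in the abstract, Pearson correlation between predicted and gold similarity scores, can be sketched as follows. This is the generic formula, not the task's official evaluation script:

```python
import math

def pearson(x, y):
    # Pearson correlation coefficient between two score lists
    # (predicted vs. gold STS scores); generic textbook formula.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A track's score is then the correlation over that track's sentence pairs, and the primary-track figure reported above is the mean across tracks.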


Cited by 64 publications (58 citation statements) | References 13 publications
“…ECNU (Tian et al., 2017) The best overall system is from ECNU and ensembles well-performing feature-engineered models with deep learning methods. Three feature-engineered models use Random Forest (RF), Gradient Boosting (GB), and XGBoost (XGB) regression methods with features based on: n-gram overlap; edit distance; longest common prefix/suffix/substring; tree kernels (Moschitti, 2006); word alignments (Sultan et al., 2015); summarization and MT evaluation metrics (BLEU, GTM-3, NIST, WER, METEOR, ROUGE); and kernel similarity of bags-of-words, bags-of-dependencies, and pooled word embeddings.…”
Section: Methods
confidence: 99%
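A few of the lexical features listed in this citation statement (n-gram overlap, edit distance, longest common prefix) can be sketched with the standard definitions below. The function names and exact formulas are illustrative, not ECNU's implementation:

```python
def ngrams(tokens, n):
    # Set of word n-grams of a token list.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_overlap(s1, s2, n=2):
    # Jaccard overlap of the two sentences' n-gram sets.
    a, b = ngrams(s1.split(), n), ngrams(s2.split(), n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def edit_distance(a, b):
    # Levenshtein distance via the standard dynamic program.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def common_prefix_len(a, b):
    # Length of the longest common character prefix.
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n
```

In the ensemble described above, such features would be stacked into a vector per sentence pair and fed to the RF/GB/XGB regressors.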
“…We use this PMI score to evaluate partitions without requiring a labelled ground truth. The PMI score has been shown to perform well [14,15] when compared to human interpretation of topics on different corpora [40,41], and is designed to evaluate topical coherence for groups of documents, in contrast to other tools aimed at short forms of text. See [19,20,42,43] for other examples.…”
Section: Quantitative Benchmarking of Topic Clusters
confidence: 99%
“…We compared our optimal results with the three best systems proposed in the SemEval-2017 Arabic-English cross-lingual evaluation task [8] (ECNU [40], BIT [44] and HCTI [38]) and the baseline system [8]. In this evaluation, ECNU obtained the best performance with a correlation score of 74.93%, followed by BIT and HCTI with 70.07% and 68.36% respectively.…”
Section: Comparison with SemEval-2017 Winners
confidence: 99%