Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, 2016
DOI: 10.18653/v1/n16-3003

Farasa: A Fast and Furious Segmenter for Arabic

Abstract: In this paper, we present Farasa, a fast and accurate Arabic segmenter. Our approach is based on SVM-rank using linear kernels. We measure the performance of the segmenter in terms of accuracy and efficiency in two NLP tasks, namely Machine Translation (MT) and Information Retrieval (IR). Farasa outperforms or is on par with the state-of-the-art Arabic segmenters (Stanford and MADAMIRA), while being more than one order of magnitude faster.
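The abstract says only that the segmenter is based on SVM-rank with linear kernels. The following sketch illustrates that general technique under several assumptions. Each word comes with a hand-written list of candidate segmentations. Each candidate is mapped to hypothetical sparse features (segment count, stem length, prefix and suffix identities). The weights are learned with a simple stochastic subgradient approximation of the pairwise hinge loss, not the cutting-plane solver of the SVM-rank package. The names `features`, `train`, and `segment` and the toy Buckwalter-transliterated data are illustrative only and do not reproduce Farasa's feature set.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segmentation = Tuple[str, ...]
Weights = Dict[str, float]


def features(seg: Segmentation) -> Dict[str, float]:
    """Hypothetical sparse features for one candidate segmentation."""
    stem = max(seg, key=len)  # crude assumption: the longest segment is the stem
    idx = seg.index(stem)
    feats = {f"n_segs={len(seg)}": 1.0, f"stem_len={len(stem)}": 1.0}
    for p in seg[:idx]:
        feats[f"prefix={p}"] = 1.0
    for s in seg[idx + 1:]:
        feats[f"suffix={s}"] = 1.0
    return feats


def score(w: Weights, feats: Dict[str, float]) -> float:
    """Linear kernel: the score is the dot product w . phi(candidate)."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())


def train(data: List[Tuple[List[Segmentation], int]],
          epochs: int = 20, lr: float = 0.1, lam: float = 0.01) -> Weights:
    """Pairwise ranking with hinge loss: the gold candidate should outscore
    every other candidate of the same word by a margin of at least 1."""
    w: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in data:
            fg = features(candidates[gold])
            for i, cand in enumerate(candidates):
                if i == gold:
                    continue
                fc = features(cand)
                if score(w, fg) - score(w, fc) < 1.0:  # margin violated
                    for k, v in fg.items():
                        w[k] += lr * v
                    for k, v in fc.items():
                        w[k] -= lr * v
            for k in w:  # L2 regularization applied as multiplicative shrinkage
                w[k] *= 1.0 - lr * lam
    return dict(w)


def segment(w: Weights, candidates: List[Segmentation]) -> Segmentation:
    """Return the highest-scoring candidate segmentation."""
    return max(candidates, key=lambda c: score(w, features(c)))


# Toy data (Buckwalter transliteration); the index marks the gold candidate.
DATA = [
    ([("wktb",), ("w", "ktb"), ("wk", "tb")], 1),
    ([("Alktb",), ("Al", "ktb"), ("A", "lktb")], 1),
    ([("ktbhm",), ("ktb", "hm"), ("ktbh", "m")], 1),
]

if __name__ == "__main__":
    weights = train(DATA)
    for cands, _ in DATA:
        print(segment(weights, cands))
```

A linear kernel reduces scoring to one sparse dot product per candidate, which fits the abstract's emphasis on speed. The features and training setup actually used by Farasa are described in the paper itself.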

Cited by 248 publications (186 citation statements); references 13 publications.
“…This work presents two open source state-of-the-art POS tagging systems that are trained using standard ATB dataset (Maamouri et al, 2004) and evaluated on the WikiNews test set (Abdelali et al, 2016). In building the system we explored two approaches using Support Vector Machines (SVM) and Bidirectional Long Short-Term Memory (bi-LSTM).…”
Section: Results (mentioning)
“…Stem templates may conclusively have one POS tag (e.g., yCCC is always a V) or favor one tag over another (e.g., CCAC is more likely a NOUN than an ADJ). We used Farasa to determine the stem template (Abdelali et al, 2016). …”
Section: Tagging Clitics (mentioning)
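The quoted passage relies on stem templates, in which root consonants are abstracted to C while pattern letters such as y and A are kept. Here is a minimal sketch of how such a template could be derived, assuming the stem and its root are already known and written in Buckwalter transliteration. `stem_template`, `template_tag`, and `CONCLUSIVE_TAG` are hypothetical names, and this is not how Farasa computes templates internally.

```python
from typing import Optional

# Hypothetical lookup built only from the "yCCC is always a V" example quoted
# above; a real tagger would learn template/tag statistics from annotated data.
CONCLUSIVE_TAG = {"yCCC": "V"}


def stem_template(stem: str, root: str) -> str:
    """Replace the root's letters, matched left to right, with 'C'.

    Greedy matching is a simplifying assumption: it can mislabel a pattern
    letter that happens to equal the next unmatched root letter.
    """
    out = []
    r = 0  # index of the next root letter to match
    for ch in stem:
        if r < len(root) and ch == root[r]:
            out.append("C")
            r += 1
        else:
            out.append(ch)  # pattern letter, kept as-is
    if r != len(root):
        raise ValueError(f"root {root!r} does not occur in order in {stem!r}")
    return "".join(out)


def template_tag(stem: str, root: str) -> Optional[str]:
    """Return a POS tag if the template alone decides it, else None."""
    return CONCLUSIVE_TAG.get(stem_template(stem, root))


if __name__ == "__main__":
    print(stem_template("yktb", "ktb"))  # yCCC
    print(stem_template("ktAb", "ktb"))  # CCAC
    print(template_tag("yktb", "ktb"))   # V
    print(template_tag("ktAb", "ktb"))   # None
```

The greedy matcher shows the limitation noted in its docstring: for the stem ttAbE with root tbE it returns CtACC rather than tCACC. A practical system therefore needs pattern-aware alignment.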
“…• The prefix wa "and" is separated from words by using Farasa toolkit (Abdelali et al, 2016) and all other prefixes: b, f, Al, k, l and s are concatenated to words.…”
Section: Data Normalization (mentioning)
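The normalization rule quoted above keeps only the conjunction wa as a separate token and glues every other prefix back onto the word. Here is a minimal sketch of that post-processing step. It assumes the segmenter output is one Buckwalter-transliterated word per string with segment boundaries marked by "+". That delimiter and the helper name `split_wa` are assumptions, not Farasa's documented output format or API. The sketch also rejoins any suffix segments, which the quoted description does not address.

```python
from typing import List


def split_wa(segmented: str, sep: str = "+") -> List[str]:
    """Detach a leading conjunction 'w' segment and rejoin all other segments.

    `segmented` is assumed to be a single word whose segment boundaries are
    marked with `sep`, for example 'w+b+Al+ktAb'.
    """
    segments = segmented.split(sep)
    if len(segments) > 1 and segments[0] == "w":
        # Conjunction wa "and" becomes its own token; b, f, Al, k, l, s stay attached.
        return ["w", "".join(segments[1:])]
    return ["".join(segments)]


if __name__ == "__main__":
    print(split_wa("w+b+Al+ktAb"))  # ['w', 'bAlktAb']
    print(split_wa("b+Al+ktAb"))    # ['bAlktAb']
    print(split_wa("wrd"))          # ['wrd'] (initial w not marked as a prefix)
```

Because the split keys on the segmenter's boundary marker rather than on the first letter, a word such as wrd that merely begins with w is left intact.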