2015
DOI: 10.1007/s10032-015-0242-2

Automatic diacritization of Arabic text using recurrent neural networks

Abstract: This paper presents a sequence transcription approach for the automatic diacritization of Arabic text. A recurrent neural network is trained to transcribe undiacritized Arabic text with fully diacritized sentences. We use a deep bidirectional long short-term memory network that builds high-level linguistic abstractions of text and exploits long-range context in both input directions. This approach differs from previous approaches in that no lexical, morphological, or syntactical analysis is performed on the dat…

Cited by 84 publications (49 citation statements)
References 25 publications
“…They include hybridization of rules and dictionary retrievals with morphological analysis, N-grams, Hidden Markov Models, Dynamic Programming and Machine Learning methods [5,15,17,20,23,31,35,37-39,42]. Some Deep Learning models improved by rules [2,3] have been developed as well.…”
Section: Rule-based Approaches the Used Methods Include Cascading We (citation type: mentioning)
confidence: 99%
“…And the most current work in the area relies on hybrid approaches that combine rule-based and statistical modules [14]. Also, several systems and tools have been developed for the resolution of the ambiguity for different levels of the analysis related to automatic diacritization for works such as [15-22]. Gal [23] used a HMM based on learning done on totally diacritized texts in his work, which achieved 85% good diacritization with some texts belonging to the training corpus.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…The Arabic alphabet is the base alphabet used in multiple languages including: Arabic, Persian and Kurdish. The Arabic language has 36 variants (see Figure 1) of the basic 28 letters and eight basic diacritics (see Figure 2) [4].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
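
The statement above mentions eight basic diacritics. As a rough illustration only, the sketch below assumes these correspond to the Unicode combining marks U+064B through U+0652 (the three tanween forms, fatha, damma, kasra, shadda, and sukun); that mapping is an assumption and is not taken from the cited paper. The helper names `strip_diacritics` and `split_letters_and_marks` are hypothetical. The sketch shows how a diacritized string might be reduced to the undiacritized input form, and how each letter can be paired with the marks that follow it.

```python
# Assumption: the "eight basic diacritics" are the Unicode marks U+064B..U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Return the text with the eight assumed diacritic marks removed."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_letters_and_marks(text: str) -> list[tuple[str, str]]:
    """Pair each base character with the diacritics that directly follow it.

    A diacritic with no preceding base character is kept as its own entry.
    """
    pairs: list[tuple[str, str]] = []
    for ch in text:
        if ch in DIACRITICS and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


# "kataba" written as three letters, each followed by a fatha (U+064E).
diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
undiacritized = strip_diacritics(diacritized)
labels = split_letters_and_marks(diacritized)
```

Pairs of this kind (plain letter sequence in, letter-plus-marks sequence out) are one simple way to frame diacritization as sequence labeling; the cited systems may represent their data differently.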
“…Moreover, manually adding diacritization to clarify the content is time consuming and can only be reliable through linguistics experts specializing in the Arabic language. Thus, the need for an automated diacritization system is eminent [4], [5].…”
Section: Introduction (citation type: mentioning)
confidence: 99%