2006
DOI: 10.1007/11893318_13

Identifying Historical Period and Ethnic Origin of Documents Using Stylistic Feature Sets

Abstract: Text classification is an important and challenging research domain. In this paper, identifying the historical period and ethnic origin of documents using stylistic feature sets is investigated. The application domain is Jewish Law articles written in Hebrew-Aramaic. Such documents present various interesting problems for stylistic classification. Firstly, these documents include words from both languages. Secondly, Hebrew and Aramaic are richer than English in their morphological forms. The classification is done us…
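
The abstract breaks off before naming the feature sets or the classifier, so the following is only a minimal sketch of stylistic classification under stated assumptions: each document is represented by the relative frequencies of a hypothetical list of marker words, and a nearest-centroid rule stands in for whatever learner the paper actually uses. The function names and toy data are illustrative, not taken from the paper. Whitespace tokenization also ignores the rich Hebrew-Aramaic morphology the abstract mentions; a real system would likely need prefix and suffix handling.

```python
import math
from collections import Counter

def stylistic_features(text, markers):
    """Relative frequency of each (hypothetical) marker word in a whitespace-tokenized text."""
    tokens = text.split()
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[m] / n for m in markers]

def train_nearest_centroid(train, markers):
    """train maps each class label to a list of texts; returns a classify(text) function."""
    centroids = {}
    for label, texts in train.items():
        vectors = [stylistic_features(t, markers) for t in texts]
        # Centroid = per-feature mean over the class's training documents.
        centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]

    def classify(text):
        x = stylistic_features(text, markers)
        # Assign the label whose centroid is closest in Euclidean distance.
        return min(centroids, key=lambda label: math.dist(x, centroids[label]))

    return classify

# Toy usage: "x" and "y" are placeholder marker words, "A"/"B" placeholder classes.
markers = ["x", "y"]
classify = train_nearest_centroid(
    {"A": ["x a x b", "a a c"], "B": ["y b y", "y y c b"]}, markers)
print(classify("x x q"))  # A
print(classify("y q"))    # B
```

Frequencies of frequent, topic-neutral words are a common choice for stylistic features because they reflect writing habits rather than subject matter; here that choice is an assumption, not a description of the paper's feature sets.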

Cited by 4 publications (5 citation statements)
References 22 publications
“…The research described in this article is clearly developed and expanded beyond the conference paper presented by us in HaCohen‐Kerner et al (2006), as follows: (a) The background and the related researches in various subdomains were enlarged significantly; (b) new feature sets (name‐based feature sets) have been defined and applied; (c) additional classification experiments were applied to also identify places where the responsa were written; (d) instead of applying the five‐fold cross‐validation, we apply the 10‐fold cross‐validation. The 10‐fold cross‐validation tends to give a less biased estimate of true generalization error.…”
Section: The Model (mentioning)
confidence: 80%
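
The switch from five-fold to 10-fold cross-validation described in the statement above can be illustrated with a minimal sketch of plain (unstratified) k-fold cross-validation. This is not the authors' code: `k_fold_indices`, `cross_validate`, and the `majority_baseline` stand-in classifier are hypothetical names chosen for illustration. The usual reason 10-fold estimates are less pessimistically biased is that each model is trained on a fraction 1 - 1/k of the data, that is 90% of the documents instead of 80%, so it is closer to a model trained on the full corpus.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(texts, labels, k, train_fn, seed=0):
    """Mean accuracy over k folds.

    train_fn(train_texts, train_labels) must return a predict(text) function.
    Each fold is held out once while the model is trained on the other k-1 folds.
    """
    folds = k_fold_indices(len(texts), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [j for j in range(len(texts)) if j not in held_out]
        predict = train_fn([texts[j] for j in train_idx],
                           [labels[j] for j in train_idx])
        correct = sum(predict(texts[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

def majority_baseline(train_texts, train_labels):
    """Hypothetical stand-in classifier: always predicts the most frequent training label."""
    label = Counter(train_labels).most_common(1)[0][0]
    return lambda text: label

# Fraction of the data each model is trained on is 1 - 1/k.
for k in (5, 10):
    print(f"{k}-fold: each model trains on {1 - 1 / k:.0%} of the documents")
# 5-fold: each model trains on 80% of the documents
# 10-fold: each model trains on 90% of the documents
```

Any classifier with the same `train_fn` shape, such as a stylistic-feature model, can be passed in place of the baseline. Stratified folds, which keep class proportions equal across folds, are a common refinement that this sketch omits.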
“…In this research, we presented CUISINE, which extends and improves the system presented in HaCohen‐Kerner et al (2006) by: (a) new feature sets (name‐based feature sets); (b) new applications such as places where the responsa were written and a total of seven different experiments instead of three, and (c) better results: CUISINE achieved accuracy results of 98.99, 94.08, and 90.71% for the following classification experiments: ethnicity, ethnicity&time, and ethnicity&time&place, respectively, compared to those achieved by the previous system: 98.67 and 92.81% for ethnicity and ethnicity&time, respectively.…”
Section: Discussion (mentioning)
confidence: 94%
“…Other studies that are related to document classification and address the challenges of Hebrew involve the classification of Hebrew-Aramaic documents according to style (Koppel et al., 2006; Mughaz, 2003); authorship verification, including forgeries and pseudonyms (Koppel et al., 2003, 2004); and classification of texts according to their ethnic origin and their historical period (HaCohen-Kerner, Beck, Yehudai & Mughaz, 2006; HaCohen-Kerner, Mughaz et al., 2008; HaCohen-Kerner, Beck, Yehudai, Rosenstein & Mughaz, 2010).…”
Section: Related Work (mentioning)
confidence: 99%
“…Another works that are related to document classification and address the challenges of Hebrew involve the classification of Hebrew-Aramaic documents according to style (Koppel, Mughaz, & Akiva, 2006; Mughaz, 2003); authorship verification, including forgers and pseudonyms (Koppel, Mughaz, & Akiva, 2003; Koppel, Schler, & Mughaz, 2004); and classification of texts according to their ethnic origin and their historical period (HaCohen-Kerner, Beck, Yehudai & Mughaz, 2006;…”
Section: Related Work (mentioning)
confidence: 99%