Linear text segmentation using a dynamic programming algorithm

Kehagias, Athanasios; Fragkou, Pavlina; Petridis, V.

doi:10.3115/1067807.1067831

Cited by 18 publications

(22 citation statements)

References 16 publications


“…Other researchers have adopted a variety of other approaches, for example: peak finding in a lexical cohesion curve (Hearst, 1997), minimization of an ad hoc segmentation cost function (Kehagias, Pavlina, & Petridis, 2003), converting the text segmentation problem to one of image segmentation then applying techniques from image processing (Ji & Zha, 2003), and using affinity propagation in factor graphs (Kazantseva & Szpakowicz, 2011) … The application of statistical and computational methods to problems in authorship analysis has been the focus of much study. Koppel et al. (Koppel, Schler, & Argamon, 2009) surveyed this line of work, focused on three specific types of problems, and discussed how machine learning methods can be applied to those problems.…”

Section: Discussion

mentioning

confidence: 99%

An improved algorithm for unsupervised decomposition of a multi‐author document

Giannella

2015

Asso for Info Science & Tech


Abstract: This paper addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences that were written by each author, assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment … BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that needs to be set and which had a non-trivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracy of BayesAD and AK was, in all but one case, worse than a baseline approach wherein one author was assumed to write all sentences in the input text document. Hence, room for improved solutions exists.


“…In this respect, LSA goes beyond the classic vector space model (Manning and Schütze, 1999: 539 ff.) frequently used to measure lexical cohesion between sentences (Hearst, 1997; Choi, 2000; Kehagias, Pavlina … LSA is not the only technique proposed to address these problems (see, for example, Morris and Hirst, 1991; Kozima, 1993; Ferret, 2002).…”

unclassified

“…and Petridis, 2003). In the vector space model, the similarity between two sentences is based solely on the words they share.…”

unclassified

Évaluation automatique de textes et cohésion lexicale

Bestgen

2012

discours


Automatic essay grading is currently experiencing a growing popularity because of its importance in the field of education and, particularly, in foreign language learning. While several efficient systems have been developed over the last fifteen years, almost none of them take the discourse level into account. Recently, a few studies proposed to fill this gap by means of automatic indexes of lexical cohesion obtained from Latent Semantic Analysis, but the results were disappointing. Based on a well-known model of writing expertise, the present study proposes a new index of cohesion derived from work on the thematic segmentation of texts. The efficiency of this index is supported through the analysis of a corpus of 223 essays of learners of English as a foreign language. The conclusion discusses the main limitations of this exploratory study and proposes further avenues for development.
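
The citation statement above notes that, in the vector space model, sentence similarity depends only on shared words. A minimal sketch of that idea, assuming whitespace tokenization, raw term counts, and cosine similarity (the function name `cosine_similarity` and these choices are illustrative assumptions, not taken from any of the cited papers):

```python
import math
from collections import Counter


def cosine_similarity(sentence_a: str, sentence_b: str) -> float:
    """Cosine similarity between two sentences using raw word counts.

    Assumption: tokens are lowercase whitespace-separated words; real
    systems usually add stemming, stop-word removal, or term weighting.
    """
    counts_a = Counter(sentence_a.lower().split())
    counts_b = Counter(sentence_b.lower().split())

    # Only words present in both sentences contribute to the dot product.
    shared_words = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared_words)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Two sentences with no word in common score 0 under this measure even when they are synonymous, which is the limitation the statement attributes to the vector space model.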


“…An extensive discussion of precisely the same problem addressed here, but with a different approach to its solution, is in [3], [4]. Work by Hubert [10], [11], with applications to meteorology, influenced Kehagias and co-workers [8], [15], [16], [17], [18], [19], who developed a dynamic programming algorithm much like ours, for applications such as text segmentation (see also [9]), where the raw data are provided in the form of a similarity matrix. [22] gives an O(kN^2) dynamic programming algorithm for finding the optimal partition of an interval into k blocks, for a given k. See also [20], [2] for related work.…”

Section: Introduction: the Problem

mentioning

confidence: 99%

An algorithm for optimal partitioning of data on an interval

Jackson, Scargle, Barnes, et al.

2005

IEEE Signal Process. Lett.



Abstract: Many signal processing problems can be solved by maximizing the fitness of a segmented model over all possible partitions of the data interval. This letter describes a simple but powerful algorithm that searches the exponentially large space of partitions of N data points in time O(N^2). The algorithm is guaranteed to find the exact global optimum, automatically determines the model order (the number of segments), has a convenient real-time mode, can be extended to higher dimensional data spaces, and solves a surprising variety of problems in signal detection and characterization, density estimation, cluster analysis and classification.
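
The abstract describes an O(N^2) dynamic program over partitions of an interval. A minimal sketch of that general recurrence, assuming an additive block fitness: here, the negative sum of squared deviations from the block mean minus a fixed per-block `penalty`. The fitness function, the `penalty` parameter, and the name `optimal_partition` are illustrative assumptions, not the fitness used in the cited letter or in the Kehagias et al. segmentation algorithm.

```python
def optimal_partition(values, penalty=1.0):
    """Return the best partition of `values` into contiguous blocks.

    Blocks are half-open index pairs (start, end). The total fitness is the
    sum of per-block fitness values, so the best partition ending at `end`
    is found by trying every possible start of the last block.
    """
    n = len(values)

    # Prefix sums make each block fitness an O(1) computation.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def block_fitness(start, end):
        # Hypothetical fitness: negative squared error around the block
        # mean, minus a constant cost that discourages too many blocks.
        length = end - start
        total = prefix[end] - prefix[start]
        sse = (prefix_sq[end] - prefix_sq[start]) - total * total / length
        return -sse - penalty

    best = [0.0] + [float("-inf")] * n  # best[e]: best fitness of values[:e]
    last_start = [0] * (n + 1)          # start index of the final block
    for end in range(1, n + 1):
        for start in range(end):
            score = best[start] + block_fitness(start, end)
            if score > best[end]:
                best[end] = score
                last_start[end] = start

    # Walk the stored block starts backwards to recover the partition.
    blocks = []
    end = n
    while end > 0:
        start = last_start[end]
        blocks.append((start, end))
        end = start
    blocks.reverse()
    return blocks
```

In this sketch the number of blocks is not fixed in advance; it falls out of the trade-off between fit and `penalty`, and the two nested loops give the quadratic running time.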
