Proceedings of the 2005 ACM Symposium on Document Engineering, 2005
DOI: 10.1145/1096601.1096637

Injecting information into atomic units of text

Abstract: This paper presents a new approach to text processing, based on textemes. These are atomic text units generalising the concepts of character and glyph by merging them in a common data structure, together with an arbitrary number of user-defined properties. In the first part, we give a survey of the notions of character and glyph and their relation with Natural Language Processing models, some visual text representation issues and strategies adopted by file formats (SVG, PDF, DVI) and software (Uniscribe, Pango…
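
The abstract describes a texteme only in outline: a character and a glyph merged in one data structure, plus an arbitrary number of user-defined properties. As a minimal sketch of that idea, and not the authors' actual design, one could model it as below. The class name `Texteme`, its field names, the helper `textemes_from_string`, and the property keys are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Texteme:
    """Hypothetical atomic text unit: character, glyph, and free-form properties."""
    character: Optional[str] = None   # abstract character, e.g. a Unicode code point
    glyph: Optional[str] = None       # assumed glyph identifier within some font
    properties: Dict[str, Any] = field(default_factory=dict)  # user-defined key/value pairs


def textemes_from_string(text: str) -> List[Texteme]:
    """Wrap each character of a plain string in a texteme with no glyph assigned yet."""
    return [Texteme(character=ch) for ch in text]


# Usage: start from plain characters, then inject extra information per unit.
units = textemes_from_string("fi")
units[0].glyph = "f"                    # assumed glyph name
units[0].properties["language"] = "en"  # assumed user-defined property
```

Keeping the properties in an open dictionary mirrors the "arbitrary number of user-defined properties" in the abstract; a real implementation might constrain keys or value types.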

Citations: cited by 4 publications (9 citation statements)
References: 1 publication
“…The computer has brought a more abstract layer to it, by storing and transmitting textual data. The atomic unit of this abstract representation of text, as defined in the Unicode standard [8], is called a character. And indeed, characters prove to be useful for obtaining alternative (nonvisual) representations of text such as Braille, speech synthesis, etc.…”
Section: Fonts Characters and Glyphs
Citation type: mentioning
confidence: 99%
“…And indeed, characters prove to be useful for obtaining alternative (nonvisual) representations of text such as Braille, speech synthesis, etc. The visual representation of a character is called a glyph [8]. Displaying textual contents, whether on screen or on paper, involves translating characters into glyphs, a non-trivial operation for many writing systems.…”
Section: Fonts Characters and Glyphs
Citation type: mentioning
confidence: 99%
“…Our goal is to find a better solution to the problems raised in the previous section (those tied to the weaknesses of the Unicode model and to the relationship between smart fonts and paragraph building), and also to make better use of the information provided by smart fonts. We will proceed by way of three new concepts: 1) the texteme, introduced by the authors in the articles (Haralambous et al, 2005a) and (Haralambous et al, 2005b) (under the name "sign" in the latter);…”
Section: A New Text Model and a New Approach to Parag...
Citation type: unclassified
“…Some examples of these applications are text filtering, text retrieval, information extraction, information statistic, content rewriting, automatic generation of text, etc. However, a major drawback of the existing text processing solutions was that they develop the different processing programs to satisfy the corresponding text processing requirements [1][2][3].…”
Section: Introduction
Citation type: mentioning
confidence: 99%