2009 Data Compression Conference
DOI: 10.1109/dcc.2009.22

On the Use of Word Alignments to Enhance Bitext Compression

Abstract: The amount of information stored in digital form in more than one language is growing very fast as a consequence of globalization. Furthermore, there are countries and supra-national entities whose legislation enforces the translation (and storage) of all official texts into all their official languages. Two texts that are mutual translations are usually referred to as a bilingual parallel corpus or, in short, as a bitext. Compressing the two texts of a bitext independently is far from efficient…

Citations: cited by 6 publications (10 citation statements)
References: 0 publications
“…Our strategy, called Two-Level Compressor for Aligned Bitexts (2LCAB), is based on two main ideas: (i) the use of biwords [9], pairs of aligned words, as the basis of the model, that is, as the symbols to compress, and (ii) the use of a two level structure for the representation of the vocabularies, where the vocabulary of biwords, at the second level, is represented in compressed form using the vocabularies of the first level. Figure 2 shows a conceptual description of this scheme.…”
Section: Two-Level Compressor for Aligned Bitexts (2LCAB)
Citation type: mentioning
confidence: 99%
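
The two ideas in the statement above (biwords as the symbols to compress, and a second-level biword vocabulary expressed through the first-level word vocabularies) can be illustrated with a minimal sketch. This is not the 2LCAB implementation: the excerpt does not say how the vocabularies or the symbol stream are actually encoded, so the function names, the first-occurrence id assignment, and the plain integer stream below are all assumptions made for illustration.

```python
def build_two_level_model(aligned_pairs):
    """Build a toy two-level model from (source_word, target_word) pairs.

    Hypothetical helper: ids are assigned in order of first occurrence,
    which is an assumption, not a detail taken from the paper.
    """
    src_vocab, tgt_vocab = {}, {}   # first level: one vocabulary per language
    biword_vocab = {}               # second level: (src_id, tgt_id) -> biword id
    stream = []                     # the text as a sequence of biword ids
    for src_word, tgt_word in aligned_pairs:
        src_id = src_vocab.setdefault(src_word, len(src_vocab))
        tgt_id = tgt_vocab.setdefault(tgt_word, len(tgt_vocab))
        # A biword is stored as a pair of first-level ids, not as two strings.
        biword_id = biword_vocab.setdefault((src_id, tgt_id), len(biword_vocab))
        stream.append(biword_id)
    return src_vocab, tgt_vocab, biword_vocab, stream


def decode(src_vocab, tgt_vocab, biword_vocab, stream):
    """Recover the aligned word pairs from the two-level model."""
    # Dicts preserve insertion order, so list position equals the assigned id.
    src_words = list(src_vocab)
    tgt_words = list(tgt_vocab)
    biwords = list(biword_vocab)
    return [(src_words[biwords[b][0]], tgt_words[biwords[b][1]]) for b in stream]


# Toy aligned bitext (English/Spanish), invented for this example.
pairs = [("the", "la"), ("house", "casa"), ("the", "la"),
         ("green", "verde"), ("house", "casa")]
src_vocab, tgt_vocab, biword_vocab, stream = build_two_level_model(pairs)
restored = decode(src_vocab, tgt_vocab, biword_vocab, stream)
```

In this sketch each distinct word is stored once per language, and a repeated aligned pair costs a single biword id in the stream. A real compressor would additionally entropy-code the stream and the vocabularies, which is left out here.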
“…In contrast to PPM, some text-compression methods use words rather than characters as input tokens (Moffat, 1989; Moffat & Isal, 2005). Analogously, Martínez-Prieto, Adiego, Sánchez-Martínez, de la Fuente, and Carrasco (2009), and his colleagues (2009, 2010) propose the use of biwords - pairs of words, each one from a different text, with a high [Figure 1: Processing pipeline of a biword-based bitext compression approach.]…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The Hutter Prize [12], a competition to compress a 100 m-word extract of English Wikipedia, was designed to further encourage research in text compression. Bilingual and multilingual text compression is a less-studied field [1, 13-18]. These papers provide different algorithms for compressing text in multilingual format, but they do not demonstrate how humans perform on this task.…”
Section: Related Work
Citation type: mentioning
confidence: 99%