1984
DOI: 10.1109/tcom.1984.1096090

Data Compression Using Adaptive Coding and Partial String Matching

Abstract: The recently developed technique of arithmetic coding, in conjunction with a Markov model of the source, is a powerful method of data compression in situations where a linear treatment is inappropriate. Adaptive coding allows the model to be constructed dynamically by both encoder and decoder during the course of the transmission, and has been shown to incur a smaller coding overhead than explicit transmission of the model's statistics. But there is a basic conflict between the desire to use high-order Markov …
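The abstract breaks off before describing how the conflict is resolved, so the following is only a minimal sketch of the partial-matching idea in Python: an adaptive predictor that tries the longest matching context first and, when the symbol is unseen there, escapes to progressively shorter contexts, with encoder and decoder updating identical counts as symbols are coded. The class name `PPMModel`, the order-2 default, and the escape estimate are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of PPM-style adaptive prediction. The escape estimate
# (distinct symbols over total-plus-distinct) is a simplification, not the
# paper's exact escape probabilities.
from collections import defaultdict

class PPMModel:
    def __init__(self, max_order=2, alphabet_size=256):
        self.max_order = max_order
        self.alphabet_size = alphabet_size
        # counts[order][context] -> {symbol: frequency seen so far}
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def predict(self, history, symbol):
        """Adaptive probability of `symbol` after `history`.

        Try the longest matching context first; each time the symbol is
        unseen there, multiply in an escape probability and drop to a
        shorter context, ending at a uniform order -1 model.
        """
        prob = 1.0
        for order in range(self.max_order, -1, -1):
            ctx = tuple(history[-order:]) if order else ()
            seen = self.counts[order][ctx]
            total = sum(seen.values())
            if total == 0:
                continue                                 # context never seen: escape for free
            distinct = len(seen)
            if symbol in seen:
                return prob * seen[symbol] / (total + distinct)
            prob *= distinct / (total + distinct)        # escape to the next shorter context
        return prob / self.alphabet_size                 # order -1: uniform over the alphabet

    def update(self, history, symbol):
        """Adaptive step: bump the symbol's count for every context length."""
        for order in range(self.max_order + 1):
            ctx = tuple(history[-order:]) if order else ()
            self.counts[order][ctx][symbol] += 1

# Encoder and decoder make identical predict/update calls on the symbols
# already transmitted, so no statistics need to be sent explicitly.
model, history = PPMModel(max_order=2), []
for byte in b"abracadabra":
    p = model.predict(history, byte)   # the probability an arithmetic coder would use
    model.update(history, byte)
    history.append(byte)
```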

Cited by 977 publications (584 citation statements)
References 14 publications
“…In [14], text compression methods are considered for extension to bitext compression, exploiting exact correspondences between pairs of words and synonymy relationships between the words in the two texts (as given by a thesaurus). These parallel predictions are then combined with PPM [3] predictions. The weights of both models are carefully tuned, improving on the PPM compression ratios obtained for the texts compressed separately.…”
Section: Compression Of Bitexts
confidence: 99%
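As a rough illustration of the combination this excerpt describes, the snippet below mixes a parallel-text prediction with an ordinary PPM prediction through a single tuned weight. The function names `parallel_prob` and `ppm_prob`, the linear mixture, and the weight value are assumptions made for the sketch, not details taken from [14].

```python
# Hypothetical sketch of blending a bitext (parallel) prediction with a PPM
# prediction; the callables and the weight w are placeholders.
def mixed_prob(symbol, history, aligned_hint, ppm_prob, parallel_prob, w=0.7):
    """Weighted mixture of two character-model probabilities; the result is
    the distribution an arithmetic coder would code against."""
    p_parallel = parallel_prob(symbol, aligned_hint)  # driven by the aligned text in the other language
    p_ppm = ppm_prob(symbol, history)                 # ordinary PPM character prediction
    return w * p_parallel + (1.0 - w) * p_ppm
```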
“…-a Spanish-Catalan (es-ca) bitext from El Periódico de Catalunya, 2 a daily newspaper published both in Catalan and Spanish; -a Spanish-Galician (es-gl) bitext from Diario Oficial de Galicia, 3 the bulletin of the Government of Galicia, published both in Galician and Spanish; and -bitexts for German-English (de-en), Spanish-English (es-en) and French-English (fr-en) from the European Parliament Proceedings Parallel Corpus [8].…”
Section: Searching All the Possible Translations Of A Word
confidence: 99%
“…Each user took dictation from Jane Austen's Emma in five-minute sessions. The language model (PPMD5) predicts the next character given the previous five characters; 6,7 it was trained on passages from Emma not included in the dictation. Right panels, the two experts took dictation using the same eyetracker to control the WiViK on-screen keyboard (a standard qwerty keyboard) with the word-completion buttons enabled.…”
Section: Comment
confidence: 99%
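To make the quoted setup concrete, here is a small sketch of an order-5 character model of the kind the excerpt names: it simply counts which character follows each five-character context in a training text. The training string and function names are placeholders, and the real PPMD5 model also blends in shorter contexts through escape probabilities.

```python
from collections import Counter, defaultdict

def train_order5(text):
    """Count which character follows each 5-character context in the training text."""
    table = defaultdict(Counter)
    for i in range(5, len(text)):
        table[text[i - 5:i]][text[i]] += 1
    return table

def next_char_distribution(table, context):
    """Return {char: probability} for the next character after the last five characters."""
    counts = table[context[-5:]]
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

# A short placeholder passage stands in for the Emma training text.
table = train_order5("it is a truth universally acknowledged that a single man ...")
print(next_char_distribution(table, "unive"))   # {'r': 1.0}, from "universally"
```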
“…The leading data representation method for compression purposes is Huffman coding, which forms the basis of most subsequent approaches [20]. Compression techniques can be broadly classified into four major categories: derivatives of Lempel-Ziv-Welch [23], and approaches based on statistical model prediction [4], on character permutations [1], and on arithmetic coding [16]. In the case of DEM data, the seminal work [12] advocated reducing data size with an initial data simplification stage, followed by compression with Huffman coding.…”
Section: Related Work
confidence: 99%
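Since this excerpt singles out Huffman coding as the basis of many of the schemes it lists, a textbook sketch of Huffman code construction follows. It is a generic illustration, not the DEM-specific pipeline of the cited work [12].

```python
# Textbook Huffman code construction: repeatedly merge the two least
# frequent subtrees, then read codes off the resulting binary tree.
import heapq
from collections import Counter

def huffman_codes(data):
    """Return {symbol: bitstring}, giving shorter codes to more frequent symbols."""
    freq = Counter(data)
    if len(freq) == 1:                          # degenerate input: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (total frequency, tie-breaker, subtree); a subtree is
    # either a bare symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
        tiebreak += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):             # internal node: descend both branches
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix                # leaf: record the accumulated bits
    walk(heap[0][2], "")
    return codes

print(huffman_codes("abracadabra"))             # 'a' receives the shortest code
```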
“…Prediction by Partial Matching (PPM) [4] algorithms consider the correlation between values (which, for example, could be linearly growing). They use N past values to predict the next one, trying to find the best relationship.…”
Section: Currently There Are Two Classes Of Algorithms That Perform…
confidence: 99%
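A tiny sketch of the partial-matching idea as this excerpt frames it for numeric streams: find the longest recent run of values that has appeared earlier in the stream and predict the value that followed it, shrinking the run when no earlier match exists. The function name and the fallback rule are illustrative; a full PPM coder assigns probabilities to all candidate values rather than returning a single guess.

```python
def predict_next(values, max_order=4):
    """Predict the next value by partial matching: locate the longest recent run
    of values that occurred earlier and return the value that followed it,
    shortening the run when no earlier occurrence exists."""
    for order in range(min(max_order, len(values) - 1), 0, -1):
        context = values[-order:]
        for i in range(len(values) - order - 1, -1, -1):    # scan earlier occurrences
            if values[i:i + order] == context:
                return values[i + order]                     # value that followed the match
    return values[-1] if values else 0                       # no match at any order: repeat the last value

print(predict_next([1, 2, 3, 1, 2, 3, 1, 2]))  # -> 3, the value that followed the earlier run [2, 3, 1, 2]
```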