The dictionary-based compression methods described in Chapter 3 of the book are different, but they have one thing in common: they generate the dictionary as they go along, reading data and compressing it. The dictionary is not included in the compressed file and is generated by the decoder in lockstep with the encoder. Thus, such methods can be termed "online." In contrast, the methods described here are also dictionary based, but can be considered "offline" because they include the dictionary in the compressed file.

The first method is byte pair encoding (BPE). This is a simple compression method, due to [Gage 94], that often achieves only mediocre performance. It is described here because (1) it is an example of a multipass method (two-pass compression algorithms are common, but more passes are normally considered too slow) and (2) it eliminates only certain types of redundancy and should therefore be applied only to data files that feature this redundancy. (The second method, by [Larsson and Moffat 00], does not suffer from these restrictions and is much more efficient.) BPE is both an example of an offline dictionary-based compression algorithm and a simple example (perhaps the simplest) of a grammar-based compression method. In addition, the BPE decoder is very small, which makes it an ideal candidate for applications where memory is restricted.

The BPE method is easy to understand. We assume that the data symbols are bytes, and we use the term bigram for a pair of consecutive bytes. Each pass locates the most-common bigram and replaces it with an unused byte value. Thus, the method performs best on files that have many unused byte values, and one aim of this document is to show what types of data feature this kind of redundancy. First, however, a small example.
Given the character set A, B, C, D, X, and Y and the data file ABABCABCD (where X and Y are unused bytes), the first pass identifies the pair AB as the most-common bigram and replaces each of its three occurrences with the single byte X. The result is XXCXCD. The second pass identifies the pair XC as the most-common bigram and replaces each of its two occurrences with the single byte Y. The result is XYYD, where every bigram occurs just once. Bigrams that occur just once can also be replaced, if more unused byte values are available. However, each replacement rule must be appended to the dictionary and thus ends up being included in the compressed file. As a result, the BPE encoder stops when no bigram occurs more than once.

What types of data tend to have many unused byte values? The first type that comes to mind is text. Currently, most text files use the well-known ASCII codes to encode text. An ASCII code occupies a byte, but only seven bits constitute the actual code. The eighth bit can be used for a parity check, but is often simply set to zero. Thus, we can expect 128 of the 256 possible byte values to be unused in a typical ASCII text file. A quick glance at an ASCII code table shows that codes 0 through 32 (as well as code 127) are cont...
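The passes illustrated in the example above can be sketched in a few lines of code. The following is a minimal sketch, not a production implementation: it assumes the symbols are single characters rather than raw bytes, and the function names `bpe_encode` and `bpe_decode` are our own. The stopping rule matches the one described above (stop when no bigram occurs more than once, or when the unused symbols run out).

```python
def bpe_encode(data, unused):
    """Repeatedly replace the most-common bigram with an unused symbol.

    Returns the compressed string and the list of replacement rules,
    which together with the unused symbols forms the dictionary that
    must accompany the compressed file.
    """
    rules = []                 # list of (new_symbol, bigram) pairs
    unused = list(unused)
    while unused and len(data) > 1:
        # Count every pair of consecutive symbols.
        counts = {}
        for a, b in zip(data, data[1:]):
            counts[a + b] = counts.get(a + b, 0) + 1
        bigram, n = max(counts.items(), key=lambda kv: kv[1])
        if n < 2:              # no bigram occurs more than once: stop
            break
        sym = unused.pop(0)
        rules.append((sym, bigram))
        # str.replace substitutes non-overlapping occurrences left to right,
        # which is exactly the order a single BPE pass uses.
        data = data.replace(bigram, sym)
    return data, rules


def bpe_decode(data, rules):
    """Undo the replacements in reverse order of application."""
    for sym, bigram in reversed(rules):
        data = data.replace(sym, bigram)
    return data
```

Running `bpe_encode("ABABCABCD", "XY")` reproduces the two passes of the example, yielding `XYYD` with the rules X→AB and Y→XC, and `bpe_decode` recovers the original string. Note how small the decoder is: a single loop over the rules, which is why BPE suits memory-restricted targets.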