2022
DOI: 10.1101/2022.01.20.477098
Preprint

Image-centric compression of protein structures improves space savings

Abstract: Motivation: Because data are generated so rapidly, the study of compression algorithms that reduce storage and transmission costs is important to bioinformaticians. Much of the focus has been on sequence data, including both genomes and protein amino acid sequences stored in FASTA files. However, there are few specialized compressors for structural protein data contained in PDB and mmCIF files. Current standard practice is to use an ordinary lossless compressor such as gzip on a sequential list of atomic coor…
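For reference, the baseline the abstract describes (an ordinary lossless compressor run over a sequential list of atomic coordinates) can be sketched in a few lines. This is a minimal illustration, not the paper's benchmark: the helper `synthetic_coords` is hypothetical and produces random-walk coordinates instead of reading a real PDB or mmCIF file, and Python's standard `gzip` module stands in for the gzip tool.

```python
import gzip
import random


def synthetic_coords(n_atoms, seed=0):
    """Hypothetical random-walk coordinates in Angstroms (not real protein data)."""
    rng = random.Random(seed)
    pos, out = [0.0, 0.0, 0.0], []
    for _ in range(n_atoms):
        pos = [p + rng.uniform(-2.0, 2.0) for p in pos]
        out.append(tuple(pos))
    return out


coords = synthetic_coords(1000)

# A sequential text list of atomic coordinates, three decimals per value,
# mirroring the fixed-width coordinate columns of a PDB file.
text = "\n".join(f"{x:8.3f}{y:8.3f}{z:8.3f}" for x, y, z in coords).encode("ascii")

# The baseline: a general-purpose lossless compressor applied to the text.
compressed = gzip.compress(text)
assert gzip.decompress(compressed) == text  # lossless with respect to the text

print(f"{len(text)} bytes of text -> {len(compressed)} bytes gzipped")
```

The compressor only sees a stream of characters; it has no notion that every 8 characters form a number or that neighboring atoms lie close together, which is the kind of structure a specialized format can exploit.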

Cited by 4 publications (4 citation statements). References: 34 publications (57 reference statements).

“…Various strategies (Valasatava et al. 2017) have been proposed to deal with the growth of protein structure databases, including general-purpose compressors like gzip and data-record-specific encodings like BinaryCIF (Sehnal et al. 2020) and MMTF (Bradley et al. 2017). PIC (Staniscia and Yu 2022) transforms 3D coordinates into a lossy 2D image-like format and applies the PNG-image compression algorithm. Specialized formats for molecular trajectories (Roe and Brooks 2022) have also been developed to compress different states of the same molecule.…”
Section: Introduction (mentioning; confidence: 99%)
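The general idea of treating coordinates as an image can be illustrated with a small sketch. It is not PIC's actual layout: it keeps Cartesian coordinates and is lossless at 0.001 Å, whereas PIC is described as lossy. Everything specific here is an assumption: one row per atom, each coordinate quantized to 0.001 Å and stored as 3 unsigned big-endian bytes after adding a hypothetical offset of 2^23, PNG's "Up" scanline filter (filter type 2), and `zlib` (DEFLATE, the compression method PNG uses) applied without PNG chunk framing. All function names are hypothetical.

```python
import random
import zlib

SCALE = 1000       # 0.001 A steps, the precision of PDB coordinate columns
OFFSET = 2 ** 23   # hypothetical shift so signed values fit in 3 unsigned bytes
WIDTH = 9          # 3 coordinates x 3 bytes = one "image" row per atom


def to_image_rows(coords):
    """Quantize each atom to integers and pack it as one row of 9 bytes."""
    rows = []
    for atom in coords:
        row = bytearray()
        for c in atom:
            row += (round(c * SCALE) + OFFSET).to_bytes(3, "big")
        rows.append(bytes(row))
    return rows


def png_up_filter(rows):
    """PNG filter type 2 ("Up"): each byte minus the byte above it, mod 256."""
    prior, out = bytes(WIDTH), bytearray()
    for row in rows:
        out.append(2)  # per-scanline filter-type byte, as in PNG
        out += bytes((a - b) % 256 for a, b in zip(row, prior))
        prior = row
    return bytes(out)


def png_up_unfilter(data):
    """Invert the Up filter by adding back the reconstructed byte above."""
    rows, prior = [], bytes(WIDTH)
    for i in range(0, len(data), WIDTH + 1):
        assert data[i] == 2, "this sketch only writes Up-filtered scanlines"
        row = bytes((a + b) % 256 for a, b in zip(data[i + 1:i + 1 + WIDTH], prior))
        rows.append(row)
        prior = row
    return rows


def from_image_rows(rows):
    """Undo the byte packing and the integer quantization."""
    return [tuple((int.from_bytes(row[k:k + 3], "big") - OFFSET) / SCALE
                  for k in range(0, WIDTH, 3)) for row in rows]


# Synthetic random-walk coordinates (not real protein data).
rng = random.Random(1)
coords, pos = [], [0.0, 0.0, 0.0]
for _ in range(2000):
    pos = [p + rng.uniform(-2.0, 2.0) for p in pos]
    coords.append(tuple(round(p, 3) for p in pos))

blob = zlib.compress(png_up_filter(to_image_rows(coords)), 9)
decoded = from_image_rows(png_up_unfilter(zlib.decompress(blob)))
print(f"{len(coords) * WIDTH} raw bytes -> {len(blob)} bytes")
```

The filter is what makes the image view useful: neighboring atoms have similar coordinates, so the high-order bytes in each column barely change from row to row, and the Up filter turns them into runs of near-zero bytes that DEFLATE compresses well.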
“…In addition to lossless compression, some implementations of MMTF also enable lossy compression by retaining only one digit after the decimal point. On the other hand, the PIC (7) format performs only lossy compression, with a slight loss of precision (usually ∼0.1 Å), by applying the Portable Network Graphics (PNG) compression algorithm to atom positions in spherical coordinate space. Although PNG is a lossless compression algorithm, the coordinate conversion from Cartesian to spherical space introduces rounding errors, which make PIC compression lossy.…”
Section: Introduction (mentioning; confidence: 99%)
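Why a Cartesian-to-spherical conversion loses precision, and how that compares with keeping one decimal per Cartesian coordinate, can be checked numerically. The sketch below rounds the radius to a 0.01 Å grid and both angles to a 1e-4 rad grid; these grid sizes, the choice of origin, and the synthetic atoms are assumptions for illustration, not PIC's or MMTF's actual parameters. An angular rounding error of Δ moves an atom by up to rΔ, so the spherical error grows with distance from the origin, while the one-decimal error is bounded by 0.05·√3 Å regardless of position.

```python
import math
import random


def to_spherical(x, y, z):
    """Cartesian -> (radius r, polar angle theta, azimuth phi)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def to_cartesian(r, theta, phi):
    s = math.sin(theta)
    return (r * s * math.cos(phi), r * s * math.sin(phi), r * math.cos(theta))


def quantize(value, step):
    """Snap a value to the nearest multiple of step."""
    return round(value / step) * step


def spherical_roundtrip(p, r_step=0.01, angle_step=1e-4):
    """Hypothetical precision: radius on a 0.01 A grid, angles on a 1e-4 rad grid."""
    r, theta, phi = to_spherical(*p)
    return to_cartesian(quantize(r, r_step), quantize(theta, angle_step),
                        quantize(phi, angle_step))


def one_decimal(p):
    """Keep one digit after the decimal point in each Cartesian coordinate."""
    return tuple(round(c, 1) for c in p)


rng = random.Random(2)
# Synthetic atoms within a 80 A cube around the origin (e.g. a centroid).
atoms = [tuple(rng.uniform(-40.0, 40.0) for _ in range(3)) for _ in range(5000)]

sph_err = max(math.dist(p, spherical_roundtrip(p)) for p in atoms)
dec_err = max(math.dist(p, one_decimal(p)) for p in atoms)
max_r = max(math.dist(p, (0.0, 0.0, 0.0)) for p in atoms)
print(f"spherical grid: max error {sph_err:.4f} A (max radius {max_r:.1f} A)")
print(f"one decimal:    max error {dec_err:.4f} A")
```

The spherical error is bounded by |Δr| + r(|Δθ| + |Δφ|), so with these assumed grids it stays near 0.01 Å for atoms within about 70 Å of the origin; a coarser angular grid, or a larger molecule, scales the error up proportionally.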
“…In the lossy mode, PDC stores approximate coordinate differences between neighboring Cα atoms as well as torsion and side-chain angles. Kim et al. (2023) evaluated several other approaches to compressing protein structures, including PULCHRA [8], MMTF [3], and PIC [13]. However, BinaryCIF, Foldcomp, and PDC can be considered the current state of the art.…”
Section: Introduction (mentioning; confidence: 99%)
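Storing approximate differences between neighboring Cα atoms can be sketched with quantized delta coding. The excerpt does not describe PDC's actual encoding, so this is only an illustration under stated assumptions: a hypothetical 0.1 Å quantization step, deltas starting from the origin, and a synthetic Cα trace with the usual ~3.8 Å spacing between consecutive Cα atoms. Function names are hypothetical.

```python
import math
import random

STEP = 0.1  # hypothetical quantization step for C-alpha deltas, in Angstroms


def encode_ca_deltas(ca, step=STEP):
    """Closed-loop delta coding: each delta is taken from the *reconstructed*
    previous C-alpha, so quantization error does not accumulate along the chain."""
    deltas, prev = [], (0.0, 0.0, 0.0)
    for p in ca:
        d = tuple(round((c - pc) / step) for c, pc in zip(p, prev))
        deltas.append(d)
        prev = tuple(pc + di * step for pc, di in zip(prev, d))  # decoder's view
    return deltas


def decode_ca_deltas(deltas, step=STEP):
    """Running sum of the integer deltas, scaled back to Angstroms."""
    out, prev = [], (0.0, 0.0, 0.0)
    for d in deltas:
        prev = tuple(pc + di * step for pc, di in zip(prev, d))
        out.append(prev)
    return out


def synthetic_ca_trace(n, bond=3.8, seed=3):
    """Hypothetical chain: consecutive C-alpha atoms 3.8 A apart, random directions."""
    rng = random.Random(seed)
    pos, trace = (12.0, -5.0, 30.0), []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        pos = tuple(p + bond * c / norm for p, c in zip(pos, v))
        trace.append(pos)
    return trace


ca = synthetic_ca_trace(500)
deltas = encode_ca_deltas(ca)
restored = decode_ca_deltas(deltas)
max_axis_err = max(abs(a - b) for p, q in zip(ca, restored) for a, b in zip(p, q))
print(f"max per-axis error {max_axis_err:.3f} A; first deltas {deltas[:3]}")
```

Two design points follow from this sketch. Computing each delta against the reconstructed previous atom (rather than the original one) keeps the per-axis error at most half a step everywhere; encoding against the original positions would let rounding errors drift along the chain. And because consecutive Cα atoms are only about 3.8 Å apart, every delta after the first fits in a signed byte at this step size, which is where the space savings come from.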