2023
DOI: 10.1093/bioinformatics/btad153

Foldcomp: a library and format for compressing and indexing large protein structure sets

Abstract: Summary: Highly accurate protein structure predictors have generated hundreds of millions of protein structures; these pose a challenge in terms of storage and processing. Here we present Foldcomp, a novel lossy structure compression algorithm and indexing system to address this challenge. By using a combination of internal and Cartesian coordinates and a bi-directional NeRF-based strategy, Foldcomp improves the compression ratio by a factor of 3 compared to the next best method. Its reconstru…
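
The abstract mentions a NeRF-based strategy for rebuilding Cartesian coordinates from internal coordinates. As a rough illustration only, the following is a minimal Python sketch of the generic NeRF (Natural Extension Reference Frame) placement step: given three already-placed atoms plus a bond length, bond angle, and torsion angle, it computes the position of the next atom. This is not Foldcomp's code; the function names (place_atom, dihedral) and the example values are hypothetical, and the bi-directional part of Foldcomp's strategy is not shown.

```python
import math

def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(x / n for x in u)

def place_atom(a, b, c, bond, angle, torsion):
    """Place atom d from internal coordinates (generic NeRF step).

    bond    : distance |c-d|
    angle   : bond angle b-c-d, in radians
    torsion : dihedral angle a-b-c-d, in radians
    """
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))   # normal of the a-b-c plane
    m = cross(n, bc)                 # completes the local orthonormal frame
    # Coordinates of d in the local frame (bc, m, n) centred on c.
    x = -bond * math.cos(angle)
    y = bond * math.sin(angle) * math.cos(torsion)
    z = bond * math.sin(angle) * math.sin(torsion)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))

def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in radians, used as a round-trip check."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n2 = cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(cross(b1, b2), n2)
    return math.atan2(y, x)

# Hypothetical example values: three seed atoms and one set of internal coordinates.
a, b, c = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
d = place_atom(a, b, c, bond=1.5, angle=math.radians(110.0), torsion=math.radians(-60.0))
print(d, math.degrees(dihedral(a, b, c, d)))
```

Storing a length, an angle, and a torsion per atom in place of three raw coordinates is what lets an internal-coordinate format quantize values aggressively. Errors accumulate along the chain during reconstruction, and a bi-directional scheme is one way to limit that accumulation.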

citations
Cited by 14 publications
(10 citation statements)
references
References 13 publications
“…All AF structures analyzed here are available at [48]. PDB files were compressed using Foldcomp [49].…”
mentioning
confidence: 99%
“…By computationally scanning more than 214 million entries in AlphaFold DB version 4 (Kim et al, 2023; Varadi et al, 2024), we extracted 15,977 single‐chained structures possessing multiple P‐loops. We then analyzed the hydrogen‐bond network and extracted 839 structures with multiple P‐loops on a single continuous β‐sheet (Frishman & Argos, 1995).…”
Section: Results
mentioning
confidence: 99%
“…Ultimately, we hope that this study points the way for future image-centric (or more generally structure-aware) compression of protein structures. Indeed, the contemporaneous Foldcomp [22] makes use of internal bond angles and torsions in protein compression, which is a different means of exploiting the 3D structure than our image-centric approach, and that shows greater promise even than PIC.…”
Section: Discussion
mentioning
confidence: 99%
“…The coordinates are measured in units of Angstroms Å, where 1 µm = 10,000 Å [19]. Unlike their FASTA counterparts, comparatively less work has been done to create compressors customized for the structural data contained in PDB and mmCIF files, though there have been a number of recent tools/formats like MMTF [20], BCIF [21], and the brand new Foldcomp [22]. We note with especial interest Foldcomp, which introduces a new paradigm for compressing atomic coordinates using local angles, which is a radical shift from what both MMTF and BCIF do.…”
Section: Introduction
mentioning
confidence: 99%