2016
DOI: 10.25080/majora-629e541a-007

datreant: persistent, Pythonic trees for heterogeneous data

Abstract: In science the filesystem often serves as a de facto database, with directory trees being the zeroth-order scientific data structure. But it can be tedious and error prone to work directly with the filesystem to retrieve and store heterogeneous datasets. datreant makes working with directory structures and files Pythonic with Treants: specially marked directories with distinguishing characteristics that can be discovered, queried, and filtered. Treants can be manipulated individually and in aggregate, with mec…
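The idea in the abstract of specially marked directories that can be discovered and filtered can be illustrated with a minimal standard-library sketch. This is not datreant's API: the marker file name `marker.json` and the helpers `mark`, `discover`, and `filter_by_tag` are hypothetical names chosen for illustration only.

```python
import json
import os
import tempfile

MARKER = "marker.json"  # hypothetical marker file name, not datreant's own format


def mark(path, tags):
    """Create the directory if needed and mark it with a set of tags."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, MARKER), "w") as fh:
        json.dump({"tags": sorted(tags)}, fh)


def discover(root):
    """Walk `root` and return {directory: set of tags} for every marked directory."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if MARKER in filenames:
            with open(os.path.join(dirpath, MARKER)) as fh:
                found[dirpath] = set(json.load(fh)["tags"])
    return found


def filter_by_tag(marked, tag):
    """Return the sorted paths of marked directories carrying `tag`."""
    return sorted(path for path, tags in marked.items() if tag in tags)


# Build a small throwaway tree: two marked directories and one unmarked one.
root = tempfile.mkdtemp()
mark(os.path.join(root, "sim_a"), {"simulation", "25ns"})
mark(os.path.join(root, "sim_b"), {"simulation"})
os.makedirs(os.path.join(root, "scratch"))  # unmarked, so never discovered

marked = discover(root)
long_runs = filter_by_tag(marked, "25ns")
print([os.path.basename(p) for p in long_runs])
```

In this sketch the data files themselves stay untouched on the filesystem; only a small marker file is added to each directory, which is what lets other tools keep reading the same tree.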

Cited by 14 publications (11 citation statements)
References 6 publications
“…With the exception of the compressed FASTQ files, all data (VCFs, images, MIC metadata, genetic variants and catalogue predictions) were aggregated and stored in a hierarchical file system using the Python datreant 1.0.2 module (55) which allowed for data discovery, tagging and filtering. Updates were performed by inhouse Python scripts.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…In addition five further values of ΔΔG_tmp for the F99Y mutation were calculated using simulations two orders of magnitude longer (25 ns). To cope with the resulting very large numbers of molecular dynamics simulations, all data were stored in a file hierarchy and tagged using the datreant Python module [36]. All simulation data were then parsed and alchemical free energies were calculated as a function of simulation time t using a purpose-written Python class.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…In order to support generic file-based workflows, the signac data model makes minimal assumptions about how these workflows generate and operate on the data; signac manages the file paths, but the underlying files are stored directly on the file system without modification or serialization. This design ensures that existing tools may interact with a signac data repository without the need to serialize or convert existing file formats, an advantage shared by solutions like datreant [17]. Conversely, this design distinguishes signac from more domain-specific solutions that make certain assumptions about data schema and format, such as DCMS [18] and the AiiDA infrastructure [19].…”
Section: Workflow
Citation type: mentioning
confidence: 99%