Metagenomic next-generation sequencing (mNGS) for pan-pathogen detection has been successfully tested in proof-of-concept case studies in patients with acute illness of unknown etiology but to date has been largely confined to research settings. Here, we developed and validated a clinical mNGS assay for diagnosis of infectious causes of meningitis and encephalitis from cerebrospinal fluid (CSF) in a licensed microbiology laboratory. A customized bioinformatics pipeline, SURPI+, was developed to rapidly analyze mNGS data, generate an automated summary of detected pathogens, and provide a graphical user interface for evaluating and interpreting results. We established quality metrics, threshold values, and limits of detection of 0.2-313 genomic copies or colony forming units per milliliter for each representative organism type. Gross hemolysis and excess host nucleic acid reduced assay sensitivity; however, spiked phages used as internal controls were reliable indicators of sensitivity loss. Diagnostic test accuracy was evaluated by blinded mNGS testing of 95 patient samples, revealing 73% sensitivity and 99% specificity compared to original clinical test results, and 81% positive percent agreement and 99% negative percent agreement after discrepancy analysis. Subsequent mNGS challenge testing of 20 positive CSF samples prospectively collected from a cohort of pediatric patients hospitalized with meningitis, encephalitis, and/or myelitis showed 92% sensitivity and 96% specificity relative to conventional microbiological testing of CSF in identifying the causative pathogen. These results demonstrate the analytic performance of a laboratory-validated mNGS assay for pan-pathogen detection, to be used clinically for diagnosis of neurological infections from CSF.
The Farsite distributed file system provides availability by replicating each file onto multiple desktop computers. Since this replication consumes significant storage space, it is important to reclaim used space where possible. Measurement of over 500 desktop file systems shows that nearly half of all consumed space is occupied by duplicate files. We present a mechanism to reclaim space from this incidental duplication to make it available for controlled file replication. Our mechanism includes 1) convergent encryption, which enables duplicate files to be coalesced into the space of a single file, even if the files are encrypted with different users' keys, and 2) SALAD, a Self-Arranging, Lossy, Associative Database for aggregating file content and location information in a decentralized, scalable, fault-tolerant manner. Large-scale simulation experiments show that the duplicate-file coalescing system is scalable, highly effective, and fault-tolerant.
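The key idea behind convergent encryption is that the encryption key is derived deterministically from the file's own content, so identical plaintexts produce identical ciphertexts even when the files belong to different users. A minimal sketch of the idea follows; the toy XOR keystream stands in for a real block cipher such as AES, and all function names are illustrative, not from the Farsite codebase:

```python
import hashlib

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt content under a key derived from the content itself.

    Returns (key, ciphertext). Identical plaintexts always yield
    identical keys and ciphertexts, enabling duplicate coalescing.
    """
    # The convergent key is simply a hash of the plaintext.
    key = hashlib.sha256(plaintext).digest()
    # Toy stream cipher: XOR with a SHA-256-in-counter-mode keystream.
    # A real system would use a proper cipher (e.g., AES) keyed the same way.
    out = bytearray()
    for i in range(0, len(plaintext), 32):
        block_key = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        chunk = plaintext[i:i + 32]
        out.extend(b ^ k for b, k in zip(chunk, block_key))
    return key, bytes(out)

# Two users encrypting the same file get byte-identical ciphertexts,
# so the storage system can detect and store only one copy. Each user
# then encrypts the convergent key under their own personal key.
k1, c1 = convergent_encrypt(b"same file content")
k2, c2 = convergent_encrypt(b"same file content")
assert k1 == k2 and c1 == c2
```

Note that this property is exactly what conventional per-user encryption destroys: with independent user keys, duplicates become indistinguishable ciphertexts, and no space can be reclaimed.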
We consider an architecture for a serverless distributed file system that does not assume mutual trust among the client computers. The system provides security, availability, and reliability by distributing multiple encrypted replicas of each file among the client machines. To assess the feasibility of deploying this system on an existing desktop infrastructure, we measure and analyze a large set of client machines in a commercial environment. In particular, we measure and report results on disk usage and content; file activity; and machine uptimes, lifetimes, and loads. We conclude that the measured desktop infrastructure would passably support our proposed system, providing availability on the order of one unfilled file request per user per thousand days.
For five years, we collected annual snapshots of file-system metadata from over 60,000 Windows PC file systems in a large corporation. In this article, we use these snapshots to study temporal changes in file size, file age, file-type frequency, directory size, namespace structure, file-system population, storage capacity and consumption, and degree of file modification. We present a generative model that explains the namespace structure and the distribution of directory sizes. We find significant temporal trends relating to the popularity of certain file types, the origin of file content, the way the namespace is used, and the degree of variation among file systems, as well as more pedestrian changes in size and capacities. We give examples of consequent lessons for designers of file systems and related software.
We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication, particularly considering whole-file versus block-level elimination of redundancy. We found that whole-file deduplication achieves about three quarters of the space savings of the most aggressive block-level deduplication for storage of live file systems, and 87% of the savings for backup images. We also studied file fragmentation, finding that it is not prevalent, and updated prior file system metadata studies, finding that the distribution of file sizes continues to skew toward very large unstructured files.
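Whole-file deduplication, the simpler of the two schemes compared above, stores each distinct file content exactly once, keyed by a content hash, while every path keeps a pointer to its stored copy. A minimal sketch under that assumption (the function and variable names are illustrative, not from the study):

```python
import hashlib

def whole_file_dedup(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Deduplicate at whole-file granularity.

    Returns (store, index): `store` maps a SHA-256 digest to one stored
    copy of the content; `index` maps each path to its content digest.
    """
    store: dict[str, bytes] = {}
    index: dict[str, str] = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # keep only the first copy
        index[path] = digest
    return store, index

# Three paths, two distinct contents: only two copies are stored.
files = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
store, index = whole_file_dedup(files)
assert len(store) == 2
assert index["a.txt"] == index["b.txt"]
```

Block-level deduplication would instead chunk each file (fixed-size or content-defined blocks) and hash each chunk, catching redundancy between partially overlapping files at the cost of extra metadata and I/O.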