A peak-finding algorithm for serial crystallography (SX) data analysis based on the principle of 'robust statistics' has been developed. Statistically robust methods are generally less sensitive to departures from model assumptions and are particularly effective when analysing mixtures of probability distributions. For example, these methods enable the separation of data into a group of inliers (i.e. the background noise) and a group of outliers (i.e. Bragg peaks). Our robust statistics algorithm has two key advantages, which are demonstrated through testing on multiple SX data sets. First, it is relatively insensitive to the exact value of the input parameters and hence requires minimal optimization. This is critical for the algorithm to be able to run unsupervised, allowing for automated selection or 'vetoing' of SX diffraction data. Secondly, the processing of individual diffraction patterns can be easily parallelized. This means that it can analyse data from multiple detector modules simultaneously, making it ideally suited to real-time data processing. These characteristics mean that the robust peak finder (RPF) algorithm will be particularly beneficial for the new class of MHz X-ray free-electron laser sources, which generate large amounts of data in a short period of time.
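The inlier/outlier separation described above can be illustrated with a minimal sketch. It uses the median and the median absolute deviation (MAD) as robust estimates of the background level and spread; this is a generic robust estimator chosen for illustration, not the estimator used in the RPF algorithm, and the function name `split_inliers_outliers`, the `n_sigma` cutoff, and the synthetic pixel values are all assumptions.

```python
import numpy as np

def split_inliers_outliers(values, n_sigma=6.0):
    """Label values far above a robust background estimate as outliers.

    Hypothetical helper for illustration only; median/MAD stands in for
    whatever robust estimator a real peak finder would use.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)                 # robust estimate of background level
    mad = np.median(np.abs(values - median))   # robust estimate of spread
    sigma = 1.4826 * mad                       # MAD scaled to match a Gaussian std
    outlier_mask = values > median + n_sigma * sigma
    return outlier_mask, median, sigma

# Synthetic data (assumed): Gaussian background plus three bright "Bragg" pixels.
rng = np.random.default_rng(0)
background = rng.normal(loc=10.0, scale=2.0, size=1000)
bragg = np.array([60.0, 75.0, 90.0])
pixels = np.concatenate([background, bragg])

mask, centre, scale = split_inliers_outliers(pixels)
print("background level:", centre, "spread:", scale)
print("outlier indices:", np.flatnonzero(mask))
```

Because the median and MAD are barely affected by a small fraction of bright pixels, the background estimate stays close to the true noise distribution even though the peaks are included in the input. This also suggests why the exact value of a cutoff such as `n_sigma` may matter less than it would with a mean and standard deviation.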
A serial millisecond crystallography (SMX) facility has recently been implemented at the macromolecular crystallography beamline MX2 at the Australian Synchrotron. The setup combines an EIGER X 16M detector system with an in-house developed high-viscosity injector, “Lipidico.” Lipidico uses a syringe needle to extrude the microcrystal-containing viscous medium and is compatible with commercially available syringes. The combination of sample delivery via protein crystals suspended in a viscous mixture and a millisecond frame rate detector enables high-throughput serial crystallography at the Australian Synchrotron. A hit-finding algorithm, based on the principles of “robust statistics,” is employed to rapidly process the data. Here we present the first SMX experimental results, obtained with a detector frame rate of 100 Hz (10 ms exposures) and the Lipidico injector, using a mixture of lysozyme microcrystals embedded in high vacuum silicon grease. The experimental setup, sample injector, and data analysis pipeline, which were designed and developed as part of the Australian Synchrotron SMX instrument, are reviewed here.
The recent development of serial crystallography at synchrotron and X-ray free-electron laser (XFEL) sources is producing crystallographic datasets of ever increasing volume. The size of these datasets is such that fast and efficient analysis presents a range of challenges that have to be overcome to enable real-time data analysis, which is essential for the effective management of XFEL experiments. Among the blocks which constitute the analysis pipeline, one major bottleneck is 'peak finding', whose goal is to identify the Bragg peaks within (often) noisy diffraction patterns. Development of faster and more reliable peak-finding algorithms will allow for efficient processing and storage of the incoming data, as well as the optimal use of diffraction data for structure determination. This paper addresses the problem of peak finding and, by extension, 'hit finding' in crystallographic XFEL datasets, by exploiting recent developments in robust statistical analysis. The approach described here involves two basic steps: (1) the identification of pixels which contain potential peaks and (2) modeling of the local background in the vicinity of these potential peaks. The presented framework can be generalized to include both complex background models and alternative models for the Bragg peaks.
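The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the paper: candidate pixels are selected with a global median/MAD threshold, and the local background is modeled as a constant level estimated robustly in a square window around each candidate. The names `robust_stats` and `find_peaks`, the window size, and the cutoffs are hypothetical.

```python
import numpy as np

def robust_stats(values):
    """Median and MAD-based scale (assumed stand-in for a robust estimator)."""
    median = np.median(values)
    sigma = 1.4826 * np.median(np.abs(values - median))
    return median, sigma

def find_peaks(image, global_n_sigma=6.0, local_n_sigma=6.0, half_window=5):
    """Two-step sketch: (1) pick candidate pixels, (2) test against local background."""
    image = np.asarray(image, dtype=float)

    # Step 1: candidate pixels are those far above the global robust background.
    g_med, g_sigma = robust_stats(image.ravel())
    candidates = np.argwhere(image > g_med + global_n_sigma * g_sigma)

    peaks = []
    for row, col in candidates:
        # Step 2: model the local background as a constant level within a
        # window around the candidate, clipped at the image edges.
        r0, r1 = max(row - half_window, 0), min(row + half_window + 1, image.shape[0])
        c0, c1 = max(col - half_window, 0), min(col + half_window + 1, image.shape[1])
        l_med, l_sigma = robust_stats(image[r0:r1, c0:c1].ravel())
        if image[row, col] > l_med + local_n_sigma * l_sigma:
            peaks.append((int(row), int(col)))
    return peaks

# Synthetic pattern (assumed): flat Gaussian background with two bright pixels.
rng = np.random.default_rng(1)
pattern = rng.normal(loc=10.0, scale=1.0, size=(64, 64))
pattern[20, 20] = 50.0
pattern[40, 45] = 60.0

found = find_peaks(pattern)
print(found)
```

A constant local background is the simplest possible choice. A more complex background model, such as a plane fitted robustly to the window, would slot into step 2 without changing step 1. Each pattern (or detector module) is processed independently here, so calls to `find_peaks` could be distributed across workers.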