Data reduction for X-ray serial crystallography using machine learning

Rahmani, Vahid; Nawaz, Shah; Pennicard, David; Setty, Shabarish Pala Ramakantha; Graafsma, H.

doi:10.1107/s1600576722011748

Cited by 9 publications

(19 citation statements)

References 34 publications

“…DeepFreak (GLCM + random forest) (Souza et al, 2019): 98.4; DeepFreak (GLCM + support vector machine) (Souza et al, 2019): 97.6; ORB + MLP (Rahmani et al, 2023): 97.5; DeepFreak (Souza et al, 2019): 98.8; AlexNet (our implementation): 98.1; ResNet-101 (our implementation): 98.3.
which are useful for downstream tasks such as indexing. On the other hand, deep learning methods reliably differentiate between hit and miss classes.…”

Section: Methods Accuracy (mentioning)

confidence: 99%

“…Previous work involving machine learning has used both synthetic and experimental data sets (Ke et al, 2018; Souza et al, 2019; Rahmani et al, 2023). Thus, we selected data sets to visualize CNN representations along with the parts responsible for a certain prediction.…”

Section: Data Sets (mentioning)

confidence: 99%

“…With these advancements, the crystallography community has also made use of machine learning for various applications (Sullivan et al, 2019; Park et al, 2017; Ryan et al, 2018; Wang et al, 2020). Specifically, the serial crystallography community has experimented with these methods to achieve data reduction (Becker & Streit, 2014; Ke et al, 2018; Souza et al, 2019; Rahmani et al, 2023; Chen et al, 2021). Machine learning, or more specifically deep learning methods including convolutional neural networks (CNNs), encode experimental data to classify it into hit or miss categories.…”

Section: Introduction (mentioning)

confidence: 99%

Explainable machine learning for diffraction patterns

Nawaz, Rahmani, Pennicard et al. 2023, J Appl Cryst

Serial crystallography experiments at X-ray free-electron laser facilities produce massive amounts of data but only a fraction of these data are useful for downstream analysis. Thus, it is essential to differentiate between acceptable and unacceptable data, generally known as 'hit' and 'miss', respectively. Image classification methods from artificial intelligence, or more specifically convolutional neural networks (CNNs), classify the data into hit and miss categories in order to achieve data reduction. The quantitative performance established in previous work indicates that CNNs successfully classify serial crystallography data into desired categories [Ke, Brewster, Yu, Ushizima, Yang & Sauter (2018). J. Synchrotron Rad. 25, 655–670], but no qualitative evidence on the internal workings of these networks has been provided. For example, there are no visualization methods that highlight the features contributing to a specific prediction while classifying data in serial crystallography experiments. Therefore, existing deep learning methods, including CNNs classifying serial crystallography data, are like a 'black box'. To this end, presented here is a qualitative study to unpack the internal workings of CNNs with the aim of visualizing information in the fundamental blocks of a standard network with serial crystallography data. The region(s) or part(s) of an image that mostly contribute to a hit or miss prediction are visualized.

“…refinement, but with the addition of new AI programs (Ke et al, 2018; Rahmani et al, 2023) to screen for diffraction, and our present body of work that uses AI to characterize diffraction, we are edging away from requiring user interactions for serial data processing.…”

Section: Figure 14 (mentioning)

confidence: 99%

“…These 'misses' (which comprise significant percentages of data collected using high-flow-rate injector methods) could then be excluded from processing and/or recording to disk to free up computing resources. More recently, in Rahmani et al (2023) various dimensionality-reduction algorithms have been used to convert diffraction data into a set of features suitable for training a machine-learning classifier to automatically detect whether experimental images contained diffraction.…”

Section: Introduction (mentioning)

confidence: 99%
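
The pipeline described in the statement above (reduce each frame to a few features, then train a classifier to separate hits from misses) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the synthetic frames, the tile size, the choice of the largest per-tile maxima as the feature vector, and the nearest-centroid classifier. It does not reproduce the feature extractors or classifiers evaluated by Rahmani et al (2023).

```python
import random

BLOCK = 4   # tile edge in pixels (assumed for this toy example)
TOP_K = 3   # number of features kept per frame (assumed)

def features(image, block=BLOCK, top_k=TOP_K):
    """Reduce a 2D frame to its top_k largest per-tile maxima (position-independent)."""
    rows, cols = len(image), len(image[0])
    tile_max = [
        max(image[i][j]
            for i in range(r, min(r + block, rows))
            for j in range(c, min(c + block, cols)))
        for r in range(0, rows, block)
        for c in range(0, cols, block)
    ]
    return sorted(tile_max, reverse=True)[:top_k]

def toy_frame(rng, n_peaks, size=16, block=BLOCK):
    """Synthetic frame: background in [0, 1] plus n_peaks bright pixels in distinct tiles."""
    img = [[rng.uniform(0.0, 1.0) for _ in range(size)] for _ in range(size)]
    tiles_per_side = size // block
    for t in rng.sample(range(tiles_per_side ** 2), n_peaks):
        r0 = (t // tiles_per_side) * block
        c0 = (t % tiles_per_side) * block
        # A bright pixel stands in for a Bragg spot.
        img[r0 + rng.randrange(block)][c0 + rng.randrange(block)] = rng.uniform(80.0, 100.0)
    return img

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(feats, centroids):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(feats, centroids[label]))

# "Train" by averaging the feature vectors of labelled toy frames.
rng = random.Random(0)
train_hits = [features(toy_frame(rng, n_peaks=5)) for _ in range(20)]
train_misses = [features(toy_frame(rng, n_peaks=0)) for _ in range(20)]
centroids = {"hit": centroid(train_hits), "miss": centroid(train_misses)}

# Classify two unseen toy frames, one with peaks and one without.
print(classify(features(toy_frame(rng, n_peaks=5)), centroids))
print(classify(features(toy_frame(rng, n_peaks=0)), centroids))
```

Each 16 x 16 frame (256 pixel values) is reduced to three numbers before classification, which is the data-reduction idea in miniature. Real diffraction frames have structured background and weak spots, so a practical system would need far more robust features than this sketch uses.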

Deep residual networks for crystallography trained on synthetic data

Mendez, Holton, Lyubimov et al. 2024, Acta Cryst Sect D Struct Biol

The use of artificial intelligence to process diffraction images is challenged by the need to assemble large and precisely designed training data sets. To address this, a codebase called Resonet was developed for synthesizing diffraction data and training residual neural networks on these data. Here, two per-pattern capabilities of Resonet are demonstrated: (i) interpretation of crystal resolution and (ii) identification of overlapping lattices. Resonet was tested across a compilation of diffraction images from synchrotron experiments and X-ray free-electron laser experiments. Crucially, these models readily execute on graphics processing units and can thus significantly outperform conventional algorithms. While Resonet is currently utilized to provide real-time feedback for macromolecular crystallography users at the Stanford Synchrotron Radiation Lightsource, its simple Python-based interface makes it easy to embed in other processing frameworks. This work highlights the utility of physics-based simulation for training deep neural networks and lays the groundwork for the development of additional models to enhance diffraction collection and analysis.