2023
DOI: 10.1107/s1600576722011748

Data reduction for X-ray serial crystallography using machine learning

Abstract: Serial crystallography experiments produce massive amounts of experimental data. Yet in spite of these large-scale data sets, only a small percentage of the data are useful for downstream analysis. Thus, it is essential to differentiate reliably between acceptable data (hits) and unacceptable data (misses). To this end, a novel pipeline is proposed to categorize the data, which extracts features from the images, summarizes these features with the 'bag of visual words' method and then classifies the images usin…
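To make the 'bag of visual words' step concrete, here is a minimal sketch of how a set of local descriptors from one image can be summarized as a fixed-length histogram that a classifier could consume. It is not the paper's implementation: the toy 2-D descriptors, the hand-picked codebook, the squared Euclidean distance and the function names (`nearest_word`, `bovw_histogram`) are all assumptions made for illustration. In practice the codebook would typically be learned by clustering descriptors from many training images.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(descriptor, codebook[i]),
    )


def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word assignments for one image."""
    if not descriptors:
        # An image with no detected features maps to an all-zero histogram.
        return [0.0] * len(codebook)
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total for i in range(len(codebook))]


# Hypothetical codebook of three visual words (assumed, not from the paper).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

# Hypothetical local descriptors extracted from a single image.
image_descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

histogram = bovw_histogram(image_descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

Every image yields a vector of the same length (one entry per visual word) regardless of how many features were detected in it, which is what lets a standard classifier separate hits from misses on these summaries.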

Cited by 9 publications (19 citation statements)
References 34 publications
“…
DeepFreak (GLCM + random forest) (Souza et al, 2019): 98.4
DeepFreak (GLCM + support vector machine) (Souza et al, 2019): 97.6
ORB + MLP (Rahmani et al, 2023): 97.5
DeepFreak (Souza et al, 2019): 98.8
AlexNet (our implementation): 98.1
ResNet-101 (our implementation): 98.3
…which are useful for downstream tasks such as indexing. On the other hand, deep learning methods reliably differentiate between hit and miss classes.…”
Section: Methods Accuracy (mentioning)
confidence: 99%
“…Previous work involving machine learning has used both synthetic and experimental data sets (Ke et al, 2018; Souza et al, 2019; Rahmani et al, 2023). Thus, we selected data sets to visualize CNN representations along with the parts responsible for a certain prediction.…”
Section: Data Sets (mentioning)
confidence: 99%
“…refinement, but with the addition of new AI programs (Ke et al, 2018; Rahmani et al, 2023) to screen for diffraction, and our present body of work that uses AI to characterize diffraction, we are edging away from requiring user interactions for serial data processing.…”
Section: Figure 14 (mentioning)
confidence: 99%
“…These 'misses' (which comprise significant percentages of data collected using high-flow-rate injector methods) could then be excluded from processing and/or recording to disk to free up computing resources. More recently, in Rahmani et al (2023) various dimensionality-reduction algorithms have been used to convert diffraction data into a set of features suitable for training a machine-learning classifier to automatically detect whether experimental images contained diffraction.…”
Section: Introduction (mentioning)
confidence: 99%