82nd EAGE Annual Conference & Exhibition 2021
DOI: 10.3997/2214-4609.202113262

MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning

Abstract: Among the biggest challenges we face in utilizing neural networks trained on waveform (i.e., seismic, electromagnetic, or ultrasound) data is their application to real data. The requirement for accurate labels often forces us to train our networks using synthetic data, where labels are readily available. However, synthetic data often fail to capture the reality of the field/real experiment, and we end up with poor performance of the trained neural networks (NNs) at the inference stage. This is because synthetic …

Cited by 19 publications (12 citation statements)
References 28 publications (6 reference statements)
“…The formulas used are specifically designed to replicate the behavior of human metabolism; however, one of the main problems in these tools is that, as they respond to mathematical functions, they do not accurately represent real samples. This problem is commonly referenced as the reality gap problem [11], which complicates the use of artificial data for deep learning.…”
Section: State-of-the-art (mentioning)
confidence: 99%
“…The generation of fake data is fast gaining popularity due to many reasons such as the need for precise labels for deep learning models (Alkhalifah et al 2021; Hoffmann et al 2019) or fears of identity disclosure by data holders. The need for synthetic datasets became more prominent during the SARS-CoV-2 pandemic since the novel infection translated to a shortage of datasets to train medical AI models (Emam et al 2021; Bautista and Inventado 2021).…”
Section: Literature Survey (mentioning)
confidence: 99%
“…In this study we focus on a passive seismic dataset previously analysed by Wang H. et al (2021). Three different datasets utilised in this study as illustrated in Figure 4.…”
Section: Training Data Generation (mentioning)
confidence: 99%
“…One major challenge of such DL procedures is that they are trained in a supervised manner and therefore require pairs of noisy-clean data samples for training, an often unobtainable requirement in seismology. Whilst some studies have investigated the use of synthetic datasets for network training, this introduces uncertainty when applying the network to field data due to the large difference between field and synthetic seismic data (Alkhalifah et al, 2021). In an attempt to reduce this difference, many exert a great deal of effort in generating "realistic" synthetic datasets, which often require costly waveform and noise modelling.…”
Section: Introduction (mentioning)
confidence: 99%