Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management (ISMM 2019)
DOI: 10.1145/3315573.3329984
Automatic GPU memory management for large neural models in TensorFlow

Cited by 11 publications (5 citation statements) · References 4 publications

Citation statements (ordered by relevance):
“…This can prevent the model from escaping suboptimal regions in the loss landscape. Another issue worth considering is that a smaller batch size may impede the training process, primarily on hardware optimized for large batch sizes, and a large batch size will cause out-of-memory errors on systems that have low GPU memory [81,82]. By carefully considering these parameters, we developed an efficient and accurate model for spider identification.…”
Section: Discussion (mentioning)
confidence: 99%
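The batch-size trade-off this excerpt describes can often be worked around without extra GPU memory by accumulating gradients over several micro-batches before applying one optimizer step. Below is a minimal TensorFlow sketch of that idea; the model, loss, and `accum_steps` value are illustrative assumptions, not taken from the cited works.

```python
import tensorflow as tf

# Minimal sketch (illustrative model and hyperparameters): emulate a large
# effective batch on a memory-limited GPU by summing gradients over several
# small micro-batches, then applying a single optimizer step.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
accum_steps = 8  # effective batch = accum_steps * micro-batch size

def accumulated_step(micro_batches):
    # One accumulator per trainable variable, zero-initialized.
    accum = [tf.zeros_like(v) for v in model.trainable_variables]
    for x, y in micro_batches:  # e.g. 8 small (x, y) batches
        with tf.GradientTape() as tape:
            # Scale each loss so the summed gradient matches the mean
            # over the full effective batch.
            loss = loss_fn(y, model(x, training=True)) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]
    optimizer.apply_gradients(zip(accum, model.trainable_variables))
```

Only one micro-batch of activations is live at a time, so peak memory is set by the micro-batch size while the update statistics match the larger batch.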
“…It is limited to the interaction between the processor and memory, leaving aside accelerator devices such as the GPU. In order to solve or significantly reduce this problem, methods have emerged such as the optimization of the directed acyclic graph created at the time of the execution of the model, as proposed by Le [15] and Boemer [3], or methods of working with sparse matrices [20] to reduce memory consumption during training processes. Changing the way memory levels are used during training has also been considered, as in the works of Rhu [21] and Lim [17], but these also present bottlenecks due to the high level of communication that occurs through the PCIe bus.…”
Section: Discussion (mentioning)
confidence: 99%
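The host-device data movement this excerpt refers to can be imitated by hand with TensorFlow device scopes: park a large tensor in host RAM and copy it to the GPU only for the operation that needs it. Automatic schemes, such as the surveyed paper's graph rewriting, insert the equivalent transfer nodes into the dataflow graph instead. A minimal sketch (assumes a machine with at least one GPU; the tensor shape and reduction are arbitrary):

```python
import tensorflow as tf

# Hand-written version of the swap-out/swap-in idea: the tensor lives in
# host memory and crosses the PCIe bus only when a GPU op consumes it.
with tf.device('/CPU:0'):
    big = tf.random.normal([8192, 8192])  # parked in host memory

with tf.device('/GPU:0'):
    # Placing tf.identity on the GPU forces a host-to-device copy; the
    # GPU-resident copy is freed once it is no longer referenced.
    big_on_gpu = tf.identity(big)
    result = tf.reduce_sum(big_on_gpu)
```

As the excerpt notes, the PCIe transfer is the bottleneck: this trade of bandwidth for capacity only pays off when the copies overlap with, or are cheap relative to, the computation.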
“…All of the above have forced accelerator designers for deep neural networks to use high-cost memory solutions such as HBM (High Bandwidth Memory), used in Google TPUs [11]. Other solutions have been proposed to overcome these limitations, such as the development of new techniques to improve training by working directly on the neural network graph [3,15] or working with sparse matrices [20]. Designing specialized dense nodes for the effective use of accelerators has also been explored.…”
Section: Related Work (mentioning)
confidence: 99%
“…This results in O(√n) memory usage for a neural network model with n nodes. Another strategy is GPU memory management [43], where inactive tensors are automatically transferred from GPUs to the host and vice versa. This is transparent to users and the added performance penalty is often tolerable.…”
Section: Summary and Discussion (mentioning)
confidence: 99%
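The O(√n) figure in this excerpt comes from gradient checkpointing: keep activations only at segment boundaries, roughly every √n layers, and recompute the segment interiors during backpropagation. Below is a minimal TensorFlow sketch using tf.recompute_grad; the layer widths and segment split are illustrative assumptions, not the cited paper's implementation.

```python
import math
import tensorflow as tf

# Sketch of sqrt(n)-memory checkpointing: split n layers into ~sqrt(n)
# segments and let tf.recompute_grad rebuild each segment's activations
# during backprop instead of storing them.
n_layers = 16
seg = int(math.sqrt(n_layers))  # ~sqrt(n) layers per segment

layers = [tf.keras.layers.Dense(64, activation='relu')
          for _ in range(n_layers)]
for layer in layers:
    layer.build((None, 64))  # create weights up front, outside the wrapper

def make_segment(segment_layers):
    @tf.recompute_grad  # discard interior activations; recompute on backprop
    def segment(x):
        for layer in segment_layers:
            x = layer(x)
        return x
    return segment

segments = [make_segment(layers[i:i + seg])
            for i in range(0, n_layers, seg)]

x = tf.random.normal([32, 64])
with tf.GradientTape() as tape:
    h = x
    for s in segments:
        h = s(h)  # only segment-boundary activations stay live
    loss = tf.reduce_sum(h)
variables = [v for layer in layers for v in layer.trainable_variables]
grads = tape.gradient(loss, variables)
```

Storing √n boundaries plus recomputing one segment at a time gives the O(√n) activation footprint, at the cost of roughly one extra forward pass.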