2011
DOI: 10.1134/s1995423911010058

Implementation of algorithms with a fine-grained parallelism on GPUs

Cited by 7 publications (8 citation statements)
References 4 publications
“…Such accelerations are attainable for models and algorithms that are inherently parallel, as is the case with our cellular nonlinear network model. Such processors normally have a number M of kernels, of the order of tens to hundreds, and offer an acceleration of the computation speed that is often smaller than M. Cellular computing models implemented with CUDA technology have been considered, and speedups of about 0.3-0.5M are often reported [9]. We performed our own experiments with an Nvidia GeForce 8800GT GPU (addressing 512 MB of memory).…”
Section: B. CUDA/GPU Acceleration (mentioning)
confidence: 99%
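As a rough illustration of the one-thread-per-cell CUDA mapping that underlies such speedup figures, the sketch below updates a 2D cellular (nonlinear network) grid with one thread per cell. The kernel name, the nearest-neighbour coupling weight w, and the saturating nonlinearity are assumptions for illustration only, not taken from the cited papers.

// Hypothetical sketch: one CUDA thread per cell of a cellular nonlinear
// network; neighbour coupling and nonlinearity are placeholders.
__global__ void updateCells(const float* in, float* out, int W, int H, float w)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    int i = y * W + x;
    // Four nearest neighbours, with clamped (zero-flux) boundaries.
    float left  = in[y * W + max(x - 1, 0)];
    float right = in[y * W + min(x + 1, W - 1)];
    float up    = in[max(y - 1, 0) * W + x];
    float down  = in[min(y + 1, H - 1) * W + x];

    // Simple saturating cell update driven by neighbour coupling.
    float v = in[i] + w * (left + right + up + down - 4.0f * in[i]);
    out[i] = tanhf(v);
}

Such a kernel would typically be launched once per time step with 16x16 thread blocks and ping-pong input/output buffers, which is the pattern the reported M-relative speedups refer to.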
“…Various solutions for implementing cellular automata on such platforms have recently been presented in the literature (e.g. [14][16]), making it easy to adjust them to the model (5), which has many similarities to a continuous-state cellular automaton. Figure 4 presents the dynamics of both the "u" and "v" layers for T=200 and for a face image cropped from one of the faces in the database [17].…”
Section: B. The Discrete-Time RD-CNN as Image Processor (mentioning)
confidence: 99%
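A minimal sketch of how such a two-layer, continuous-state update might look as a CUDA kernel, assuming a generic reaction-diffusion step over "u" and "v" layers; the diffusion constants, time step, and reaction terms are placeholders, not the model (5) of the cited paper.

// Hypothetical 5-point Laplacian with clamped (zero-flux) boundaries.
__device__ float lap5(const float* a, int x, int y, int W, int H)
{
    float c = a[y * W + x];
    float l = a[y * W + max(x - 1, 0)];
    float r = a[y * W + min(x + 1, W - 1)];
    float t = a[max(y - 1, 0) * W + x];
    float b = a[min(y + 1, H - 1) * W + x];
    return l + r + t + b - 4.0f * c;
}

// One explicit Euler step for both layers; reaction terms are illustrative.
__global__ void rdStep(const float* u, const float* v,
                       float* uNext, float* vNext,
                       int W, int H, float Du, float Dv, float dt)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    int i = y * W + x;

    // Placeholder local (reaction) terms; a real RD-CNN uses its own nonlinearity.
    float fu = u[i] - u[i] * u[i] * u[i] - v[i];
    float fv = u[i] - v[i];

    uNext[i] = u[i] + dt * (Du * lap5(u, x, y, W, H) + fu);
    vNext[i] = v[i] + dt * (Dv * lap5(v, x, y, W, H) + fv);
}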
“…Several analogue chips have already been proposed in the literature [10][11][12][13], although we consider that, from a practical point of view, digital models would be better. Speeding up such models using GPU/CUDA approaches [14] is also a convenient solution in terms of the ratio between performance and cost. In this paper we propose the development of an RD-CNN processor suitable for digital implementations (PC, GPU cards, FPGAs, etc.)…”
Section: Introduction (mentioning)
confidence: 99%
“…All cores have access to the global memory (off-chip, slow), which is used to exchange data between threads and between the GPU and the host CPU. While the implementation presented here uses global memory, further optimization exploiting the faster shared memory can be done, resulting in even better performance [4].…”
Section: B. Hardware and Optimization (mentioning)
confidence: 99%
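To make the global- versus shared-memory remark concrete, here is a hedged sketch of the usual tiling optimization: each thread block stages its cells plus a one-cell halo into on-chip shared memory once, then reads the stencil from there instead of re-reading global memory. The tile size, kernel name, and the cell update (reused from the earlier sketch) are illustrative assumptions, not the cited implementation.

#define TILE 16

// Hypothetical shared-memory variant of the per-cell update kernel.
__global__ void updateCellsShared(const float* in, float* out, int W, int H, float w)
{
    __shared__ float tile[TILE + 2][TILE + 2];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    int tx = threadIdx.x + 1, ty = threadIdx.y + 1;

    // Clamp to the grid so border blocks stage valid (zero-flux) values.
    int cx = min(max(x, 0), W - 1);
    int cy = min(max(y, 0), H - 1);

    // Stage the interior cell plus a one-cell halo into shared memory.
    tile[ty][tx] = in[cy * W + cx];
    if (threadIdx.x == 0)        tile[ty][0]        = in[cy * W + max(cx - 1, 0)];
    if (threadIdx.x == TILE - 1) tile[ty][TILE + 1] = in[cy * W + min(cx + 1, W - 1)];
    if (threadIdx.y == 0)        tile[0][tx]        = in[max(cy - 1, 0) * W + cx];
    if (threadIdx.y == TILE - 1) tile[TILE + 1][tx] = in[min(cy + 1, H - 1) * W + cx];
    __syncthreads();

    if (x >= W || y >= H) return;

    // Same update as before, now served from on-chip memory.
    float c = tile[ty][tx];
    float v = c + w * (tile[ty][tx - 1] + tile[ty][tx + 1] +
                       tile[ty - 1][tx] + tile[ty + 1][tx] - 4.0f * c);
    out[y * W + x] = tanhf(v);
}

The gain comes from converting several slow global-memory loads per cell into a single staged load that neighbouring threads in the block then share.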