2016 8th International Conference on Wireless Communications & Signal Processing (WCSP)
DOI: 10.1109/wcsp.2016.7752638
A Gb/s parallel block-based Viterbi decoder for convolutional codes on GPU

Abstract: In this paper, we propose a parallel block-based Viterbi decoder (PBVD) on the graphics processing unit (GPU) platform for the decoding of convolutional codes. The decoding procedure is simplified and parallelized, and the characteristic of the trellis is exploited to reduce the metric computation. Based on the compute unified device architecture (CUDA), two kernels with different parallelism are designed to map two decoding phases. Moreover, the optimal design of data structures for several kinds of intermedia…
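The abstract maps the two decoding phases onto two CUDA kernels but does not reproduce their code in this excerpt. The following is a minimal sketch under stated assumptions, not the paper's implementation: a rate-1/2, constraint-length-7 code (64 states, assumed generators 0o171/0o133), hard-decision inputs, one thread per trellis state for the forward add-compare-select (ACS) phase, and one thread per decoding block for the traceback phase. The trellis-symmetry optimization mentioned in the abstract and the block-overlap handling of a practical block-based decoder are omitted.

// Minimal sketch (not the paper's implementation) of a parallel block-based
// Viterbi decoder for a rate-1/2, constraint-length-7 convolutional code.
// Kernel 1: one thread per trellis state runs the add-compare-select (ACS)
// recursion over one block of received symbols.  Kernel 2: one thread per
// decoding block traces back the survivor path.
#include <cstdint>

constexpr int NUM_STATES = 64;   // 2^(K-1) for K = 7
constexpr int G0 = 0x79;         // generator 0o171 (assumed)
constexpr int G1 = 0x5B;         // generator 0o133 (assumed)

__device__ __forceinline__ int branch_metric(int prev, int bit, int r0, int r1) {
    int w  = (prev << 1) | bit;      // 7-bit encoder window for this transition
    int e0 = __popc(w & G0) & 1;     // expected coded bit 0
    int e1 = __popc(w & G1) & 1;     // expected coded bit 1
    return (e0 ^ r0) + (e1 ^ r1);    // Hamming distance to the received pair
}

// Phase 1: forward ACS over one decoding block of `len` symbol pairs.
__global__ void acs_kernel(const uint8_t* rx,   // 2 hard bits per trellis step
                           uint8_t* survivors,  // len * NUM_STATES bytes per block
                           int len) {
    __shared__ int metric[NUM_STATES], next[NUM_STATES];
    int s = threadIdx.x;                          // one thread per state
    metric[s] = (s == 0) ? 0 : (1 << 20);         // assume the block starts in state 0
    __syncthreads();

    const uint8_t* blk_rx = rx + (size_t)2 * len * blockIdx.x;
    uint8_t* blk_sv = survivors + (size_t)len * NUM_STATES * blockIdx.x;

    for (int t = 0; t < len; ++t) {
        int r0 = blk_rx[2 * t], r1 = blk_rx[2 * t + 1];
        int bit = s & 1;                          // input bit that leads into state s
        int p0  = s >> 1;                         // the two possible predecessors
        int p1  = (s >> 1) | 0x20;
        int m0  = metric[p0] + branch_metric(p0, bit, r0, r1);
        int m1  = metric[p1] + branch_metric(p1, bit, r0, r1);
        next[s] = min(m0, m1);
        blk_sv[t * NUM_STATES + s] = (m1 < m0);   // record which predecessor survived
        __syncthreads();
        metric[s] = next[s];
        __syncthreads();
    }
}

// Phase 2: per-block traceback of the survivor path.
__global__ void traceback_kernel(const uint8_t* survivors, uint8_t* bits, int len) {
    const uint8_t* blk_sv = survivors + (size_t)len * NUM_STATES * blockIdx.x;
    uint8_t* blk_bits = bits + (size_t)len * blockIdx.x;
    int s = 0;                                    // assume a terminated trellis
    for (int t = len - 1; t >= 0; --t) {
        blk_bits[t] = s & 1;                      // decoded input bit at step t
        s = blk_sv[t * NUM_STATES + s] ? ((s >> 1) | 0x20) : (s >> 1);
    }
}

A host would launch these as acs_kernel<<<num_blocks, NUM_STATES>>>(d_rx, d_sv, len) followed by traceback_kernel<<<num_blocks, 1>>>(d_sv, d_bits, len). The abstract's point about exploiting the trellis characteristic to reduce metric computation is not reflected above: each thread recomputes its two branch metrics, whereas for a hard-decision rate-1/2 code only a handful of distinct branch-metric values exist per step and can be shared.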

Cited by 13 publications (10 citation statements). References 16 publications.
“…In [10], in addition to the tiling scheme and coalesced accesses to survivor paths, branch metrics are computed efficiently according to specific repetitive patterns that allow computations to be shared. Data transfers between the CPU and GPU are also optimized, specifically by employing multiple CUDA streams, by packing every four input LLR values into one 32-bit value, and by packing every 32 output decoded bits into one 32-bit value.…”
Section: Previous GPU-Accelerated Viterbi Decoder Methods (mentioning)
confidence: 99%
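The packing and multi-stream ideas in this excerpt can be illustrated with a short host-side CUDA sketch. The layout (four 8-bit LLRs per 32-bit word, one bit per decoded output in a 32-bit word), the chunk size, and the two-stream pipeline below are assumptions for illustration, not the exact scheme of [10].

// Host-side sketch of LLR/bit packing and multi-stream CPU-GPU transfers.
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Pack four quantized 8-bit LLRs into one 32-bit word, quartering the
// host-to-device transfer volume.
static uint32_t pack_llrs(const int8_t llr[4]) {
    uint32_t w = 0;
    for (int i = 0; i < 4; ++i) w |= (uint32_t)(uint8_t)llr[i] << (8 * i);
    return w;
}

// Pack 32 decoded bits (one per byte) into one 32-bit word for the way back.
static uint32_t pack_bits(const uint8_t bits[32]) {
    uint32_t w = 0;
    for (int i = 0; i < 32; ++i) w |= (uint32_t)(bits[i] & 1u) << i;
    return w;
}

int main() {
    const size_t words = 1 << 20;                 // packed 32-bit words per chunk
    std::vector<uint32_t> h_in(2 * words), h_out(2 * words);
    uint32_t *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in,  2 * words * sizeof(uint32_t));
    cudaMalloc(&d_out, 2 * words * sizeof(uint32_t));

    // Two CUDA streams let the transfer of one chunk overlap with the
    // decoding (kernel launches omitted here) of the other.
    cudaStream_t streams[2];
    for (int i = 0; i < 2; ++i) cudaStreamCreate(&streams[i]);

    for (int i = 0; i < 2; ++i) {
        size_t off = i * words;
        cudaMemcpyAsync(d_in + off, h_in.data() + off, words * sizeof(uint32_t),
                        cudaMemcpyHostToDevice, streams[i]);
        // the decoder kernel for chunk i would be launched on streams[i] here
        cudaMemcpyAsync(h_out.data() + off, d_out + off, words * sizeof(uint32_t),
                        cudaMemcpyDeviceToHost, streams[i]);
    }
    for (int i = 0; i < 2; ++i) cudaStreamSynchronize(streams[i]);
    for (int i = 0; i < 2; ++i) cudaStreamDestroy(streams[i]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

For the copies to genuinely overlap with kernel execution, the host buffers would need to be pinned allocations (cudaMallocHost) rather than the pageable std::vector storage shown here.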
“…1) applies the signal at the antenna to a pass-band filter and a low-noise amplifier (LNA), minimizing the noise's statistical power. Next, a coherent demodulation section using a multi-phase voltage-controlled oscillator (MP-VCO) removes the cosine in (1). The mathematical behavior of the MP-VCO [21] is:…”
Section: How the Recovery Loop Works (mentioning)
confidence: 99%
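As a rough illustration of the coherent-demodulation step described in this excerpt, the sketch below mixes each received pass-band sample with a local-oscillator cosine and sine to obtain I/Q baseband samples. The subsequent low-pass filtering and the MP-VCO model of [21] are not reproduced, and the carrier frequency, sample rate, and names are assumptions.

// Sketch of coherent down-mixing: one thread per received sample multiplies
// the pass-band sample by the local-oscillator cosine and sine.
#include <cuda_runtime.h>
#include <math_constants.h>

__global__ void coherent_demod(const float* rx,          // real pass-band samples
                               float* i_out, float* q_out,
                               int n, float fc, float fs) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;
    float phase = 2.0f * CUDART_PI_F * fc * (float)k / fs;  // LO phase at sample k
    i_out[k] =  2.0f * rx[k] * cosf(phase);   // in-phase component (before LPF)
    q_out[k] = -2.0f * rx[k] * sinf(phase);   // quadrature component (before LPF)
}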
“…The hard Viterbi decoder, under an additive white Gaussian noise (AWGN) channel model with no inter-symbol interference (ISI), decodes convolutionally coded symbols with a digital circuit made of simple add-compare-select (ACS) and trace-back units. The survivor paths and the related output symbols, with a decision depth sufficient for the survivors to converge to a unique state [1], require two storage arrays, typically implemented as random access memory (RAM). The classic Viterbi decoder stores the state and branch metrics in additional RAMs.…”
Section: Introduction (mentioning)
confidence: 99%
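The storage organization described in this excerpt (survivor-path and output arrays dimensioned by the decision depth, plus state- and branch-metric RAMs) can be sketched as follows. The constraint length, the 5*K decision depth, and the layout are common rules of thumb used here for illustration and are not taken from [1].

// Sketch of the decoder memories: two circular buffers (survivor decisions
// and released output bits) dimensioned by the decision depth, plus the
// state-metric and branch-metric arrays of a classic decoder.
#include <cstdint>

struct ViterbiMemories {
    static constexpr int K = 7;                      // constraint length (assumed)
    static constexpr int NUM_STATES = 1 << (K - 1);  // 64 trellis states
    static constexpr int DECISION_DEPTH = 5 * K;     // survivor truncation depth

    // "Two arrays as storage": survivor decisions and released output bits,
    // both organized as circular buffers of DECISION_DEPTH trellis steps.
    uint8_t survivors[DECISION_DEPTH * NUM_STATES];
    uint8_t decoded[DECISION_DEPTH];

    // Additional RAMs: state (path) metrics and the branch metrics of the
    // current trellis step (one per state and input bit).
    uint32_t state_metric[NUM_STATES];
    uint32_t branch_metric[2 * NUM_STATES];

    // Truncated traceback: once DECISION_DEPTH steps are buffered, the
    // survivors have (with high probability) merged, so the oldest input bit
    // can be released.  State convention: the K-1 most recent input bits,
    // newest in the least-significant position.
    uint8_t release_oldest_bit(int newest_step, int best_state) const {
        int s = best_state;
        for (int d = 0; d < DECISION_DEPTH - 1; ++d) {
            int t = (newest_step - d + DECISION_DEPTH) % DECISION_DEPTH;
            s = survivors[t * NUM_STATES + s] ? ((s >> 1) | (NUM_STATES >> 1))
                                              : (s >> 1);
        }
        return (uint8_t)(s & 1);   // input bit at the oldest buffered step
    }
};

Once the window is full, one decoded bit is released per trellis step, which is why the two circular buffers of DECISION_DEPTH entries are sufficient for continuous operation.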