Improving SIMD Parallelism via Dynamic Binary Translation

Hong, Ding-Yong; Liu, Yuping; Fu, Sheng-Yu; Wu, Jan‐Jan; Hsu, Wei-Chung

doi:10.1145/3173456

Cited by 12 publications

(10 citation statements)

References 35 publications

“…The translation effect is better, and the results are consistent with the experimental expectation. Hong et al (2018) converted the short S.I.M.D. command into a discontinuous phrase and translated it, which greatly improved its speed [26].…”

Section: Discussion (mentioning)

confidence: 99%

Application of Data Mining Methods in Internet of Things Technology for the Translation Systems in Traditional Ethnic Books

Luo; Xiang. 2020. IEEE Access

In order to translate the ethnic classics, based on the research on the Internet of things, machine learning, and translation technology of ethnic classics, the log-linear model is combined with the national corpus scale and the grammatical structure characteristics, and the phrase statistical machine translation is used to establish a discontinuous phrase extraction model. Then, the translation technology is studied from the three aspects of model definition, training, and decoding. Finally, the algorithm is compared with the traditional phrase extraction algorithm to verify its effectiveness. The results show that the extraction number of discontinuous phrase extraction model is significantly higher than that of traditional phrase extraction model, and the model can extract more phrases, handle larger and more complex text, and score higher in translation fluency. From the evaluation indexes scores of Bilingual Evaluation Understudy (B.L.E.U.) and National Institute of Standards and Technology (N.I.S.T.), it can be found that the B.L.E.U. and N.I.S.T. values of the traditional phrase extraction algorithm are lower than those of the discontinuous phrase extraction model algorithm. The discontinuous phrase extraction algorithm can not only extract the regular continuous phrase, but also obtain the discontinuous text, and the translation effect is better. In conclusion, the combination of Internet of things and machine learning can be used in the translation of ethnic classics to achieve high-quality translation of discontinuous phrases, which is of guiding significance for the study of machine translation.

“…Hong et al (2018) converted the short S.I.M.D. command into a discontinuous phrase and translated it, which greatly improved its speed [26]. Miura et al (2016) proposed a method to remember key discontinuous phrases in triangulation stage, and used key language model as additional information source in the transformation stage.…”

Section: Discussion (mentioning)

confidence: 99%

“…Dynamic rewriting of SIMD instructions has been proposed in [5,16] to find SIMD mappings between host and guest architectures in dynamic binary translation. The Dynamic Binary Translation system proposed in [7] details a technique to widen SIMD instructions during this mapping. It only targets loops and relies on recovery of LLVM IR from the binary, which is imprecise and can lead to spurious dependencies.…”

Section: Hand-written Kernel (AVX2) (mentioning)

confidence: 99%

“…It only targets loops and relies on recovery of LLVM IR from the binary, which is imprecise and can lead to spurious dependencies. Compared to [7], Revec is implemented as a compiler level transformation pass and inherently has access to loop structures without the need to recover them from the binary, making Revec more precise. Further, Revec applies to loop-free segments of code, making it more general than [7] -the Simd NeuralConvert benchmark that Revec greatly accelerates depends on this capability.…”

Section: Hand-written Kernel (AVX2) (mentioning)

confidence: 99%

Revec: program rejuvenation through revectorization

Mendis; Jain; et al. 2019. Proceedings of the 28th International Conference on Compiler Construction

Modern microprocessors are equipped with Single Instruction Multiple Data (SIMD) or vector instructions which expose data level parallelism at a fine granularity. Programmers exploit this parallelism by using low-level vector intrinsics in their code. However, once programs are written using vector intrinsics of a specific instruction set, the code becomes non-portable. Modern compilers are unable to analyze and retarget the code to newer vector instruction sets. Hence, programmers have to manually rewrite the same code using vector intrinsics of a newer generation to exploit higher data widths and capabilities of new instruction sets. This process is tedious, error-prone and requires maintaining multiple code bases. We propose Revec, a compiler optimization pass which revectorizes already vectorized code, by retargeting it to use vector instructions of newer generations. The transformation is transparent, happening at the compiler intermediate representation level, and enables performance portability of hand-vectorized code. Revec can achieve performance improvements in real-world performance critical kernels. In particular, Revec achieves geometric mean speedups of 1.160× and 1.430× on fast integer unpacking kernels, and speedups of 1.145× and 1.195× on hand-vectorized x265 media codec kernels when retargeting their SSE-series implementations to use AVX2 and AVX-512 vector instructions respectively. We also extensively test Revec's impact on 216 intrinsic-rich implementations of image processing and stencil kernels relative to hand-retargeting.

CCS Concepts: Computer systems organization → Single instruction, multiple data; Software and its engineering → Compilers; Software performance.

“…In our work we rewrite binaries dynamically at runtime, hence allowing for optimizing long running applications, without the need for restart. Some techniques require uplifting binary into intermediate representation (IR) to perform analysis and preparations for parallelization (and possibly optimization itself) statically [10], [11], [12]. While this simplifies the transformation process, it adds extra time overhead for uplifting process.…”

Section: Binary Parallelization (mentioning)

confidence: 99%

Runtime On-Stack Parallelization of Dependence-Free For-Loops in Binary Programs

Yusuf; El-Mahdy; Rohou. 2019. IEEE Lett. of the Comput. Soc.

With the multicore trend, the need for automatic parallelization is more pronounced, especially for legacy and proprietary code where no source code is available and/or the code is already running and restarting is not an option. In this paper, we engineer a mechanism for transforming at runtime a frequent for-loop with no data dependencies in a binary program into a parallel loop, using on-stack replacement. With our mechanism, there is no need for source code, debugging information or restarting the program. Also, the mechanism needs no static instrumentation or information. The mechanism is implemented using the Padrone binary modification system and pthreads, where the remaining iterations of the loop are executed in parallel. The mechanism keeps the running program state by extracting the targeted loop into a separate function and copying the current stack frame into the corresponding frames of the created threads. Initial study is conducted on a set of kernels from the Polybench workload. Experiments results show from 2× to 3.5× speedup from sequential to parallelized code on four cores, which is similar to source code level parallelization.
