This special issue of Concurrency and Computation: Practice and Experience gathers eleven selected research articles that were previously presented at the Brazilian "XVII Simpósio em Sistemas Computacionais de Alto Desempenho," WSCAD 2016. The workshop was held in conjunction with the 28th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2015, in Florianópolis, SC, Brazil, from 19 to 21 October 2015. Since 2000, this workshop has presented important research in the fields of computer architecture, high performance computing, and distributed systems.

The scope of the current special issue is broad and representative of the multidisciplinary nature of high performance and distributed computing. It covers a wide range of subjects, including architecture issues, compiler optimization, analysis of HPC applications, job scheduling, and energy efficiency.

The first paper, "An efficient virtual system clock for the wireless Raspberry Pi computer platform," by Diego L. C. Dutra, Edilson C. Corrêa, and Claudio L. Amorim [1], presents the design and experimental evaluation of an implementation of the RVEC virtual system clock in the Linux kernel for the energy-efficient (EE) wireless Raspberry Pi (RasPi) platform. On the RasPi, using DVFS (Dynamic Voltage and Frequency Scaling) to reduce energy consumption prevents the cycle counter of the ARM11 processor core from being used directly to build an efficient system clock.
A distinctive feature of RVEC is that it overcomes this obstacle: it uses the cycle counter for precise and accurate time measurements while the operating system concurrently applies DVFS to the ARM11 processor core.

In the second contribution, "Portability with efficiency of the advection of BRAMS between multi-core and many-core architectures," Manoel Baptista Silva Junior, Jairo Panetta, and Stephan Stephany [2] show that it is feasible to write a single portable code that embeds both the OpenMP and the OpenACC programming interfaces and achieves acceptable efficiency on nodes with either multi-core or many-core architectures. The case study is the advection of scalars, a part of the dynamics of the Brazilian Regional Atmospheric Modeling System (BRAMS), a regional atmospheric model. The dynamics of this model are hard to parallelize because of data dependencies between adjacent grid points. Single-node executions of the scalar advection for different grid sizes yielded similar speed-ups with OpenMP and with OpenACC, showing the feasibility of the proposed approach.

In the third contribution, "SMT-based context-bounded model checking for CUDA programs," Phillipe Pereira, Higo Albuquerque, Isabela da Silva, Hendrio Marques, Felipe Monteiro, Ricardo Ferreira, and Lucas Cordeiro [3] present the ESBMC-GPU tool, an extension of the Efficient SMT-Based Context-Bounded Model Checker (ESBMC) aimed at verifying Graphics ...