The increasing deployment of Distributed Generation (DG) introduces power quality challenges to the grid, in particular steady-state voltage rise at the connection point of DG units. In most distribution networks, grid parameters are neither monitored nor controlled, which also puts system security at risk. Smart grid technologies enable real-time measurement and on-load voltage control. As these technologies are progressively deployed throughout existing distribution networks, online voltage control can keep power quality and voltage levels within statutory limits. This study presents a methodology for estimating the voltage profile of a smart distribution network with DG for online voltage control, taking different line X/R ratios and laterals into account. The method estimates the maximum and minimum voltages using remote terminal units (RTUs) placed only at the DG-connected and capacitor-connected buses, and voltage regulation is then carried out using the RTU-estimated values. The approach is tested on two radial distribution networks, with and without DGs and laterals, and comparative results for voltage magnitudes estimated with different methodologies are presented. The simulation results show that the proposed method can estimate the voltage profile along a distribution network with DGs for online voltage control, across different line X/R ratios and laterals.
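For illustration only, and not the authors' RTU-based estimator: the steady-state voltage rise this abstract refers to is commonly approximated to first order as dV ≈ (R·P + X·Q)/V, which makes the role of the line X/R ratio explicit. The C sketch below applies that approximation to a single hypothetical feeder section; the impedances, power injections, and the 1.05 pu limit are all illustrative assumptions.

/* Minimal sketch (not the paper's estimator): first-order steady-state
 * voltage rise at a DG connection point, dV ~ (R*P + X*Q) / V.
 * All names and values are illustrative. */
#include <stdio.h>

/* Approximate per-unit voltage change across one feeder section.
 * p, q : net injected active/reactive power at the remote bus
 *        (pu, positive for export, e.g., a DG unit)
 * r, x : section resistance/reactance (pu); x/r is the X/R ratio
 * v    : voltage magnitude at the sending bus (pu)                */
static double voltage_rise_pu(double p, double q, double r, double x,
                              double v)
{
    return (r * p + x * q) / v;
}

int main(void)
{
    double v_source = 1.00;          /* substation voltage, pu       */
    double r = 0.02, x = 0.04;       /* feeder impedance, X/R = 2    */
    double p_dg = 0.5, q_dg = -0.1;  /* DG exports P, absorbs some Q */

    double v_dg = v_source + voltage_rise_pu(p_dg, q_dg, r, x, v_source);
    printf("estimated DG-bus voltage: %.4f pu\n", v_dg);

    /* A controller in the spirit of the abstract would compare the
     * estimated maximum voltage against the statutory limit and, if
     * exceeded, trigger on-load control (tap change, DG reactive
     * power adjustment, or capacitor switching). */
    if (v_dg > 1.05)
        printf("voltage above assumed +5%% limit: trigger control\n");
    return 0;
}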
Many-threaded Graphics Processing Units (GPUs) are well suited to high-performance general-purpose computation. The processor hides memory latency by overlapping work: while one warp (a group of 32 threads) computes, other warps perform memory accesses. For memory-bound irregular applications such as Sparse Matrix-Vector Multiplication (SpMV), however, memory access times dominate, so improving their performance on the GPU is a challenging research issue. Optimizing SpMV time on the GPU is especially important for iterative applications such as Jacobi and conjugate gradient. The overheads of computing SpMV on the GPU must also be considered: transforming the input matrix to the desired format and communicating data from CPU to GPU are non-trivial costs, and if the chosen format does not suit the given input sparse matrix, the desired performance improvements cannot be achieved. Motivated by this observation, this paper proposes a method to choose an optimal sparse matrix format, focusing on applications where CPU-to-GPU communication time and preprocessing time are non-trivial. Experimental results show that the format predicted by the model matches the actual best-performing format when the total SpMV time (pre-processing time, CPU-to-GPU communication time, and SpMV computation time on the GPU) is taken into account. The model predicts an optimal format for any given input sparse matrix with very small prediction overhead within an application. Compared to selecting a format for high performance on the GPU alone, this approach is more comprehensive. The paper also proposes a sparse matrix format that reduces communication and pre-processing overheads, to be used when these overheads are non-trivial.
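To make the selection criterion concrete, here is a hedged host-side sketch in the spirit of the abstract: each candidate format is scored by an estimate of pre-processing time plus CPU-to-GPU transfer time plus GPU SpMV time, derived only from static matrix statistics, and the minimum wins. The cost constants, the CSR/ELL size formulas, and the two-format candidate set are assumptions for illustration, not the paper's actual model.

// Hedged sketch (constants and cost models are illustrative assumptions,
// not the paper's trained model): choose the sparse format minimizing
// pre-processing + CPU-to-GPU transfer + GPU SpMV time, estimated from
// static matrix statistics alone.
#include <cstddef>
#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

struct MatrixStats {              // static features read from the input
    std::size_t rows, cols, nnz;
    std::size_t max_row_nnz;      // longest row; drives ELL padding
};

struct Candidate { std::string name; double total_ms; };

// Total-time estimate: pre-processing + PCIe transfer + kernel time.
// The constants stand in for values calibrated per device.
static double total_ms_est(double pre_ms, double bytes, double spmv_ms)
{
    const double ms_per_byte = 1e-7;   // assumed ~10 GB/s transfer rate
    return pre_ms + bytes * ms_per_byte + spmv_ms;
}

std::string choose_format(const MatrixStats& m)
{
    // Payload sizes: CSR stores an 8-byte value and a 4-byte column
    // index per nonzero plus a row pointer; ELL pads every row to the
    // longest row's length.
    double csr_bytes = m.nnz * 12.0 + (m.rows + 1) * 4.0;
    double ell_bytes = 12.0 * m.rows * m.max_row_nnz;
    std::vector<Candidate> c = {
        {"CSR", total_ms_est(0.1, csr_bytes, m.nnz * 2e-6)},
        {"ELL", total_ms_est(0.5, ell_bytes,
                             m.rows * m.max_row_nnz * 1.2e-6)},
    };
    return std::min_element(c.begin(), c.end(),
                            [](const Candidate& a, const Candidate& b)
                            { return a.total_ms < b.total_ms; })->name;
}

int main()
{
    MatrixStats m{100000, 100000, 500000, 8};   // hypothetical matrix
    std::cout << "predicted format: " << choose_format(m) << "\n";
}

In the paper's setting the per-format costs would come from a model fitted to the target device; the sketch only demonstrates the total-time objective rather than any particular cost model.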
General-purpose computation on graphics processing units (GPUs) is rapidly entering various scientific and engineering fields, and many applications are being ported to GPUs for better performance. Various optimizations, frameworks, and tools are being developed for effective GPU programming. As part of communication and computation optimizations for GPUs, this paper proposes and implements an optimization method called kernel coalesce that enhances GPU performance and also optimizes CPU-to-GPU communication time. The kernel coalesce methods proposed in this paper reduce kernel launch overheads by coalescing concurrent kernels, and reduce data transfers when intermediate data is generated and consumed among kernels. Computation on the device (GPU) is optimized by tuning the number of launched blocks and threads to the architecture. The block-level kernel coalesce method yields a prominent performance improvement on devices without concurrent-kernel support. The thread-level method outperforms the block-level method when the grid structure (i.e., the number of blocks and threads) is not optimal for the device architecture, which leaves device resources underutilized. Both methods perform similarly when the number of threads per block is approximately the same across kernels and the total number of threads across blocks fills the streaming multiprocessor (SM) capacity of the device. The multi-clock-cycle coalesce method can be chosen to coalesce more than two concurrent kernels that together or individually exceed the thread capacity of the device; if the kernels have lightweight per-thread computation, it outperforms the thread-level and block-level methods. If the kernels to be coalesced combine compute-intensive and memory-intensive kernels, warp interleaving gives higher device occupancy and improves performance. On a Fermi-architecture device (GTX 470), the multi-clock-cycle method applied to micro-benchmark1 yields 10-40% and 80-92% improvement over separate kernel launches, without and with shared input and intermediate data among the kernels, respectively. A nearest neighbor (NN) kernel from the Rodinia benchmark, coalesced with itself using the thread-level method and warp interleaving, gives 131.9% and 152.3% improvement over separate kernel launches, and 39.5% and 36.8% improvement over the block-level method, respectively.
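As a hedged illustration of the thread-level variant (the two toy kernels and all names below are assumptions, not the paper's benchmarks): instead of launching two light kernels separately, one fused kernel is launched with tA + tB threads per block, and each thread selects the body it belongs to from its thread index, saving one kernel launch.

// "Kernel A" and "kernel B", normally launched separately:
__global__ void vec_scale(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

__global__ void vec_add(const float* a, float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] += a[i];
}

// Thread-level coalesced version: per block, threads [0, tA) run A's
// body and threads [tA, tA + tB) run B's body, so both kernels' work
// is done with a single launch.
__global__ void coalesced(float* x, float s, const float* a, float* b,
                          int n, int tA) {
    if (threadIdx.x < tA) {                        // A's partition
        int i = blockIdx.x * tA + threadIdx.x;
        if (i < n) x[i] *= s;
    } else {                                       // B's partition
        int tB = blockDim.x - tA;
        int i = blockIdx.x * tB + (threadIdx.x - tA);
        if (i < n) b[i] += a[i];
    }
}
// Launch sketch: coalesced<<<numBlocks, tA + tB>>>(x, s, a, b, n, tA);

Choosing tA as a multiple of 32 keeps the two partitions warp-aligned, so whole warps take one branch or the other; that is the intuition behind the warp-interleaving variant the abstract reports for mixes of compute-intensive and memory-intensive kernels.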
Hardware and software advances in Graphics Processing Units (GPUs) for general-purpose computation have changed how parallel programming problems are addressed, and many applications are being ported to the GPU for performance gains. GPU programmers continually optimize GPU execution time, while optimizing pre-GPU computation overheads has attracted the research community more recently. Because the GPU executes programs handed to it by the CPU, pre-GPU computation overheads do exist and should be optimized for better GPU usage. The GPU framework proposed in this paper improves overall application performance by optimizing pre-GPU computation overheads along with GPU execution time. The paper proposes a prediction tool that selects an optimal sparse matrix format for a given input matrix by analyzing the input sparse matrix and considering pre-GPU computation overheads. The predicted format is compared against the best-performing sparse matrix formats reported in the literature. The proposed model is based on static data available directly from the input, so the prediction overhead is very small. Compared to GPU-specific sparse format prediction, the proposed model is more inclusive and more precise in improving overall application performance.
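As a sketch of what "static data available directly from the input" can mean (the feature set is an assumption; the abstract does not enumerate the features), the fragment below derives O(rows) row-length statistics from a CSR row-pointer array. A wide spread of row lengths, for instance, penalizes padded formats such as ELL.

// Hedged sketch: cheap static features a format predictor can read from
// a CSR row-pointer array without touching the matrix values, keeping
// the prediction overhead small. Feature names are illustrative.
#include <cstddef>
#include <cmath>
#include <algorithm>
#include <iostream>
#include <vector>

struct StaticFeatures {
    std::size_t rows = 0, nnz = 0, max_row = 0, min_row = 0;
    double mean_row = 0.0, stddev_row = 0.0;
};

StaticFeatures extract(const std::vector<std::size_t>& row_ptr)
{
    StaticFeatures f;
    f.rows = row_ptr.size() - 1;
    f.nnz = row_ptr.back();
    f.min_row = f.nnz;                      // start from an upper bound
    double sum = 0.0, sumsq = 0.0;
    for (std::size_t r = 0; r < f.rows; ++r) {
        std::size_t len = row_ptr[r + 1] - row_ptr[r];
        f.max_row = std::max(f.max_row, len);
        f.min_row = std::min(f.min_row, len);
        sum += len;
        sumsq += static_cast<double>(len) * len;
    }
    f.mean_row = sum / f.rows;
    double var = sumsq / f.rows - f.mean_row * f.mean_row;
    f.stddev_row = std::sqrt(std::max(var, 0.0));
    return f;
}

int main()
{
    std::vector<std::size_t> row_ptr{0, 3, 3, 7, 8};  // tiny 4-row example
    StaticFeatures f = extract(row_ptr);
    std::cout << "rows=" << f.rows << " nnz=" << f.nnz
              << " max_row=" << f.max_row << " mean_row=" << f.mean_row
              << " stddev_row=" << f.stddev_row << "\n";
}

Such features would then feed whatever decision rule the model uses; the extraction itself is a single pass over the row pointer, which is why the prediction overhead stays small.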