The convolutional neural network (CNN) is among the most effective neural networks for classification, segmentation, natural language processing (NLP), and video processing. A CNN consists of multiple layers, or structural parameters, and its architecture can be divided into three sections: convolution layers, pooling layers, and fully connected layers. CNNs have come into high demand because they learn features from images automatically, although doing so requires massive amounts of training data and substantial computational resources such as GPUs. As these resources have become available, multiple CNN architectures have been reported. This study covers the working of the convolution, pooling, and fully connected layers of the CNN architecture; the origin, limitations, and benefits of the reported architectures; and a comparative analysis of contemporary architectures with respect to the number of parameters, architectural depth, and significant contributions.
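To make the three-section division concrete, the following is a minimal pure-Python sketch of one convolution layer, one pooling layer, and one fully connected layer applied in sequence. It is an illustration under stated assumptions, not an implementation from this study: the function names, the 6x6 input, the vertical-edge kernel, and the hand-picked weights are all hypothetical, nothing is learned, and activation functions are omitted for brevity.

```python
# Illustrative sketch of the three CNN sections: convolution, pooling,
# fully connected. All names and values are assumptions for demonstration.

def conv2d(image, kernel):
    """Stride-1 convolution without padding (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def fully_connected(features, weights, biases):
    """One dense layer: a weighted sum of all inputs per output unit."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]

# A 6x6 "image" containing a vertical bright stripe.
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]

# A hand-picked 3x3 kernel that responds to vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d(image, kernel)   # 4x4 feature map
pooled = max_pool(feature_map)        # 2x2 after 2x2 pooling
flat = [v for row in pooled for v in row]

# Hypothetical weights for two output units; a trained CNN would learn these.
weights = [[1, 0, 1, 0], [0, 1, 0, 1]]
biases = [0, 0]
scores = fully_connected(flat, weights, biases)

print(feature_map)
print(pooled)
print(scores)
```

In a trained network the kernel and the dense weights would be learned from data rather than fixed by hand, and a typical architecture stacks many such convolution and pooling stages before the fully connected layers.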