Currently, computational complexity limits the training of convolutional neural networks (CNNs) on high-resolution gigapixel images. Such images are therefore divided into patches, or tiles. Since these high-resolution patches encode discriminative information, CNNs are trained on them to perform patch-level predictions, which has become one of the most widely adopted approaches for gigapixel classification and segmentation problems. Patch-level predictions could also be used to detect new regions in biopsy samples taken from the same patient at the same site. In other words, once a model has been trained on a patient's slide, biopsy samples taken at a later stage can be tested with the trained model to determine whether the tumour has spread. However, a problem with patch-level prediction is that pathologists generally annotate at the image level, not the patch level. Certain patches within an image, not having been finely annotated by experts, may therefore not contain enough class-relevant features. Moreover, a standard CNN generates a high-dimensional feature set per image; if we train the CNN only at the patch level, it may pick up features specific to the training set and hence fail to generalize. Through this work, we incorporate patch-descriptive capability within the deep framework by using Bag of Visual Words (BoVW) as a form of regularisation to improve generalizability. BoVW, a traditional handcrafted feature descriptor, is known for good descriptive capability along with feature interpretability. Based on this hypothesis, we aim to build a patch-based classifier that discriminates between four classes of breast biopsy image patches (normal, benign, in situ carcinoma, invasive carcinoma), with better patch-discriminative features embedded within the classification pipeline.
The aim is to use a CNN to extract quality deep features that describe relevant information in the images, while simultaneously discarding irrelevant information using Bag of Visual Words (BoVW). The proposed method passes patches obtained from WSI and microscopy images through a pre-trained CNN to extract features. BoVW then acts as a feature selector, retaining the most discriminative of the CNN features. Finally, the selected feature sets are classified into one of the four classes. The hybrid model provides flexibility in the choice of pre-trained models for feature extraction. Moreover, the pipeline is end-to-end, since it does not require post-processing of patch predictions to select discriminative patches. We compared our observations with state-of-the-art methods like ResNet50, DenseNet169, and Incep-