Computer vision algorithms for real-time accident detection are widely studied as a way to reduce delays in post-crash care. Many methods have been proposed in the literature; however, to the best of our knowledge, they do not detect accidents reliably enough and produce a high rate of false accident detections. This paper aims to detect accidents precisely while minimizing false positives. We propose a four-phase framework. In the first phase, vehicles are detected and continuously tracked using the You Only Look Once (YOLO) and ByteTrack algorithms. In the second phase, we apply a criterion for identifying abrupt change, a potential accident indicator, which checks for overlap between vehicles and the angle of collision. In the third phase, we localize the accident within the frames flagged as abrupt change. In the fourth and pivotal phase, a Vision Transformer (ViT), an encoder-only model, filters out false accident detections. Our methodology goes beyond the typical Convolutional Neural Network (CNN)-based approaches by integrating several deep learning techniques. The framework was evaluated on real-world surveillance videos recorded under diverse conditions and was found to be effective, outperforming existing works.
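
To make the abrupt-change criterion concrete, the following is a minimal sketch of one way an overlap and collision-angle check between two tracked vehicles could be written. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the box format `(x1, y1, x2, y2)`, the use of intersection over union as the overlap measure, the definition of collision angle as the angle between the two vehicles' displacement vectors, the function names, and both threshold values are hypothetical.

```python
import math

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed format)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def center(box):
    """Center point of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def heading_angle_deg(prev_a, curr_a, prev_b, curr_b):
    """Angle in degrees between the displacement vectors of two tracks.

    Returns None if either vehicle did not move between the two frames.
    """
    (pax, pay), (cax, cay) = center(prev_a), center(curr_a)
    (pbx, pby), (cbx, cby) = center(prev_b), center(curr_b)
    va = (cax - pax, cay - pay)
    vb = (cbx - pbx, cby - pby)
    na, nb = math.hypot(*va), math.hypot(*vb)
    if na == 0 or nb == 0:
        return None
    cos = (va[0] * vb[0] + va[1] * vb[1]) / (na * nb)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))

def is_abrupt_change_candidate(prev_a, curr_a, prev_b, curr_b,
                               iou_thresh=0.1, angle_thresh_deg=30.0):
    """Flag a vehicle pair when boxes overlap and headings diverge.

    Both thresholds are hypothetical placeholders, not values from the paper.
    """
    if iou(curr_a, curr_b) < iou_thresh:
        return False
    angle = heading_angle_deg(prev_a, curr_a, prev_b, curr_b)
    return angle is not None and angle >= angle_thresh_deg
```

In a sketch like this, two vehicles travelling side by side in the same direction may overlap in the image yet are not flagged, because the angle between their headings is near zero; pairs that are flagged would still pass through the later localization and ViT filtering phases.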