Vehicular Ad-hoc Networks (VANETs) are an appealing option for improving traffic safety and efficiency, because they can provide situational awareness even when potential dangers and traffic anomalies are out of sight. Broadcast conflicts in VANETs can be avoided, and communication efficiency improved, by using the Diagonal-Intersection-Routing Protocol based on Reinforcement Learning (DIRP-RL). DIRP-RL uses an RL algorithm to learn the traffic statistics of individual road segments, combining the benefits of Diagonal-Intersection-Routing with static road map information. To mitigate delivery delays, improve the packet transmission rate, find efficient routes, and reduce the sensitivity of routes to rapid vehicle movement, off-road packet forwarding is built around distributed R2R RL (RL carried out between roadside units). RL-based geographical forwarding uses Store-Carry-Forward (SCF) to mitigate packet loss when a local optimum is reached. The efficiency of DIRP-RL is compared with that of TDRL-RP, ARPRL, DRL-ML, and QGrid. In terms of results, the packet delivery rate reaches 98%, end-to-end latency is reduced by 5%, the hop-count metric reaches 95%, routing overhead is reduced by 4%, and overall performance is improved by 96%.
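The two mechanisms named above, learning a value for each road segment and falling back to Store-Carry-Forward at a local optimum, can be illustrated with a minimal sketch. It is not the DIRP-RL implementation: it assumes a tabular Q-learning update, a greedy distance-based next-hop rule, and hypothetical names (`SegmentQTable`, `greedy_next_hop`, `forward`) and parameter values (`ALPHA`, `GAMMA`) chosen only for illustration.

```python
import math
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value, not from the paper)
GAMMA = 0.9  # discount factor (assumed value, not from the paper)


class SegmentQTable:
    """Hypothetical per-roadside-unit table of road-segment values."""

    def __init__(self):
        # Key: (current intersection, next intersection), i.e. one road segment.
        self.q = defaultdict(float)

    def update(self, here, nxt, reward, next_options):
        """Standard tabular Q-learning update for the segment here -> nxt."""
        best_next = max((self.q[(nxt, o)] for o in next_options), default=0.0)
        old = self.q[(here, nxt)]
        self.q[(here, nxt)] = old + ALPHA * (reward + GAMMA * best_next - old)

    def best_segment(self, here, options):
        """Pick the outgoing segment with the highest learned value."""
        return max(options, key=lambda o: self.q[(here, o)])


def greedy_next_hop(current_pos, dest_pos, neighbors):
    """Return the neighbor strictly closer to the destination than the
    current node, or None when no such neighbor exists (local optimum)."""
    best_id, best_dist = None, math.dist(current_pos, dest_pos)
    for node_id, pos in neighbors.items():
        d = math.dist(pos, dest_pos)
        if d < best_dist:
            best_id, best_dist = node_id, d
    return best_id


def forward(packet, current_pos, dest_pos, neighbors, buffer):
    """Forward greedily; at a local optimum, store the packet and carry it."""
    hop = greedy_next_hop(current_pos, dest_pos, neighbors)
    if hop is None:
        buffer.append(packet)  # Store-Carry-Forward: hold until a better hop appears
        return "carry"
    return hop
```

In this sketch the reward passed to `update` would stand in for whatever segment-level traffic statistic is being learned (for example, observed delivery success), and a buffered packet would be retried with `forward` once the carrying vehicle meets new neighbors.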