The widespread adoption of wireless communication has led to a rapid increase in the use of Multiple-Input Multiple-Output (MIMO) technology, which transmits multiple data streams simultaneously using multiple transmitters and receivers. MIMO exploits the radio propagation phenomenon known as multipath, in which transmitted signals encounter obstacles and arrive at the receiving antenna at different angles and times. In 5G networks, an inherent challenge in massive MIMO localization is the non-line-of-sight (NLOS) problem, which significantly degrades positioning accuracy and calls for new solutions. This work proposes an intelligent localization technique based on NLOS identification and mitigation. We first propose a Convolutional Neural Network (CNN) based hybrid Archimedes-based Salp Swarm Algorithm (HASSA) technique to distinguish NLOS from line-of-sight (LOS) conditions and then estimate the localization accuracy. The accuracy is analyzed using the angle of arrival (AOA), threshold-based time of arrival (TOA), and time difference of arrival (TDOA) of signals from different antennas. Next, a novel reinforcement-learning-based optimization approach is used to mitigate NLOS in the radio wave propagation path. For this step we use an Ensemble Deep Deterministic Policy Gradient (EDDPG) approach based on the Honey Badger Algorithm (HBA), which also reduces the computational complexity. The approach is simulated under different scenarios and parameter settings and compared with different state-of-the-art works. The simulation results show that the proposed approach can identify the LOS and NLOS components and localizes more precisely than the other approaches.
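To make the three measurement types named above concrete, the following is a minimal sketch of how TOA, TDOA, and AOA relate to geometry in a 2D line-of-sight setting, together with one common reading of "threshold-based" TOA (the first sample of a received power profile that crosses a fraction of its peak). The function names, the 2D coordinates, and the `rel_threshold` parameter are illustrative assumptions and are not taken from this work. Under NLOS the signal travels a longer path than the direct one, so a measured TOA would exceed the geometric value computed here.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def toa(tx, rx):
    """Line-of-sight time of arrival (s) between 2D points tx and rx."""
    return math.dist(tx, rx) / C


def tdoa(tx, rx_a, rx_b):
    """Time difference of arrival (s) of one transmitter at two antennas."""
    return toa(tx, rx_a) - toa(tx, rx_b)


def aoa(tx, rx):
    """Angle of arrival (rad) at rx, measured from the positive x-axis."""
    return math.atan2(tx[1] - rx[1], tx[0] - rx[0])


def threshold_toa(power_profile, sample_period, rel_threshold=0.5):
    """Hypothetical threshold-based TOA estimate.

    Returns the time of the first sample whose power reaches
    rel_threshold times the peak power, or None for an empty profile.
    """
    if not power_profile:
        return None
    peak = max(power_profile)
    for index, power in enumerate(power_profile):
        if power >= rel_threshold * peak:
            return index * sample_period
    return None


if __name__ == "__main__":
    user = (30.0, 40.0)          # assumed user position in metres
    antenna_a = (0.0, 0.0)       # assumed antenna positions in metres
    antenna_b = (30.0, 0.0)
    print("TOA  (ns):", toa(user, antenna_a) * 1e9)
    print("TDOA (ns):", tdoa(user, antenna_a, antenna_b) * 1e9)
    print("AOA (deg):", math.degrees(aoa(user, antenna_a)))
    print("Threshold TOA (ns):",
          threshold_toa([0.0, 0.1, 0.6, 1.0, 0.3], 1e-9) * 1e9)
```

In this sketch the user is 50 m from the first antenna and 40 m from the second, so the TDOA corresponds to a 10 m path difference. A localization method would invert such measurements from several antennas to recover the position.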