<p>By bringing content close to end-users, proactive caching plays a vital role in improving the user experience in wireless networks. Proactively caching content at the network edge has been particularly effective in fast-changing vehicular networks. This paper addresses the problem of proactive caching at the next roadside unit (RSU) in vehicular networks using reinforcement learning techniques. It proposes two proactive caching algorithms based on <em>multi-armed bandit</em> (MAB) learning, namely <em>non-contextual MAB-based</em> (MAB) and <em>contextual MAB-based</em> (cMAB). The paper also investigates the uncertainty associated with proactive caching systems, expressed as entropy within a specifically extended <em>Subjective Logic</em> framework, which provides insight into the underlying link between prediction accuracy and uncertainty. The simulation considers two cities: Las Vegas, USA, with a grid road layout, and Manchester, UK, with a more complex, historical layout. The results show that the proposed schemes apply to cities with different road layouts. The performance of the two proposed MAB-based systems is compared with two non-contextual baseline systems, <em>Equal Probability-based</em> and <em>Probability-based</em>, and one contextual baseline system, <em>Compact Prediction Tree+ based</em>. Both proposed systems outperform their counterparts. In terms of prediction accuracy, cMAB reaches 75% and 80% in Las Vegas and Manchester respectively, and MAB exceeds 50% in both cities. In terms of the benefit to the vehicular network, cMAB and MAB each perform similarly in both cities irrespective of the road layout. In particular, on average 75% and 81% of content fragments are proactively served with cMAB in Las Vegas and Manchester respectively, and over 50% with MAB in both cities, which is consistent with the prediction accuracy of the schemes.</p>
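<p>To make the distinction between the non-contextual and contextual schemes concrete, the following is a minimal sketch and not the paper's implementation. It assumes an epsilon-greedy exploration rule (the abstract does not name one), uses the current RSU as the only context, and substitutes plain Shannon entropy for the extended Subjective Logic measure. All class, function, and parameter names (<code>EpsilonGreedyBandit</code>, <code>ContextualBandit</code>, <code>simulate</code>, <code>entropy</code>, <code>epsilon</code>) and the toy route are hypothetical.</p>

```python
import math
import random


class EpsilonGreedyBandit:
    """Non-contextual bandit: one arm per candidate next RSU (assumed design)."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng or random.Random(0)
        self.pulls = {a: 0 for a in self.arms}
        self.value = {a: 0.0 for a in self.arms}  # running mean reward per arm

    def select(self):
        # Explore with probability epsilon, otherwise pick the best-looking arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.value[a])

    def update(self, arm, reward):
        # Incremental update of the mean reward of the chosen arm.
        self.pulls[arm] += 1
        self.value[arm] += (reward - self.value[arm]) / self.pulls[arm]


class ContextualBandit:
    """Contextual variant: an independent bandit per context (the current RSU)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.bandits = {}

    def _bandit(self, context):
        if context not in self.bandits:
            self.bandits[context] = EpsilonGreedyBandit(
                self.arms, self.epsilon, self.rng
            )
        return self.bandits[context]

    def select(self, context):
        return self._bandit(context).select()

    def update(self, context, arm, reward):
        self._bandit(context).update(arm, reward)


def simulate(policy, route, contextual):
    """Replay a route; reward 1.0 when the predicted next RSU is the actual one."""
    hits = []
    for current, actual_next in zip(route, route[1:]):
        guess = policy.select(current) if contextual else policy.select()
        reward = 1.0 if guess == actual_next else 0.0
        if contextual:
            policy.update(current, guess, reward)
        else:
            policy.update(guess, reward)
        hits.append(reward)
    return hits


def entropy(counts):
    """Shannon entropy (bits) of an empirical distribution given as counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    rsus = ["A", "B", "C"]
    route = rsus * 700  # toy route: vehicles always travel A -> B -> C -> A
    for name, policy, ctx in [
        ("MAB", EpsilonGreedyBandit(rsus), False),
        ("cMAB", ContextualBandit(rsus), True),
    ]:
        tail = simulate(policy, route, ctx)[-500:]
        print(name, "accuracy over last 500 steps:", sum(tail) / len(tail))
```

<p>On this toy route the next RSU is fully determined by the current one, so the contextual learner can exploit information that the non-contextual learner never sees. A low-entropy distribution over next RSUs corresponds to a prediction that is easier to get right, which is the intuition behind relating uncertainty to prediction accuracy.</p>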