Millimeter-wave (mm-wave) systems rely on narrow beams to cope with the severe signal attenuation in the mm-wave frequency band. However, their susceptibility to beam misalignment caused by mobility or blockage requires beam-alignment schemes, which incur a high cost in overhead and system resources. In this paper, a beam-alignment scheme based on Bayesian multi-armed bandits is proposed, with the goal of maximizing the alignment probability and the data-communication throughput. The state is modeled as a posterior distribution over the angles of arrival (AoA) and of departure (AoD), given the history of feedback signaling and of the beam pairs scanned by the base station (BS) and the user-end (UE). A simplified sufficient statistic for optimal control is identified, in the form of a preference for each BS-UE beam pair. By bounding a value function, the second-best preference policy is formulated, which strikes an optimal balance between exploration and exploitation by selecting the beam pair with the current second-best preference. Monte-Carlo simulations with analog beamforming demonstrate the superior performance of the second-best preference policy over existing schemes based on first-best preference, linear Thompson sampling, and upper confidence bounds, with improvements in alignment probability of up to 7%, 10%, and 30%, respectively.
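As a rough illustration of the selection rule described above, the following Python sketch picks the beam pair with the second-highest preference and then updates a toy posterior from binary feedback. It is a minimal sketch under simplifying assumptions, not the scheme evaluated in the paper: the preference is taken here to be the posterior probability that a beam pair is the aligned one, the feedback is a hypothetical ACK/NACK signal, and the names `second_best_index`, `bayes_update`, `p_md`, and `p_fa` are introduced only for this example.

```python
import numpy as np


def second_best_index(preference):
    """Return the index of the second-largest entry of a 1-D preference vector."""
    order = np.argsort(preference)[::-1]  # indices sorted by decreasing preference
    return int(order[1]) if len(order) > 1 else int(order[0])


def bayes_update(posterior, scanned, ack, p_md=0.1, p_fa=0.1):
    """Toy Bayes update of the posterior over beam pairs (assumed feedback model).

    posterior : probability that each beam pair is the aligned one
    scanned   : index of the beam pair that was just scanned
    ack       : True if positive feedback was received, False otherwise
    p_md      : assumed misdetection probability (aligned pair scanned, no ACK)
    p_fa      : assumed false-alarm probability (misaligned pair scanned, ACK)
    """
    is_scanned = np.arange(len(posterior)) == scanned
    # Likelihood of the observed feedback under each hypothesis "pair i is aligned".
    likelihood = np.where(
        is_scanned,
        (1.0 - p_md) if ack else p_md,
        p_fa if ack else (1.0 - p_fa),
    )
    updated = posterior * likelihood
    return updated / updated.sum()


# Example: three candidate beam pairs with an initial posterior.
posterior = np.array([0.5, 0.3, 0.2])
pair = second_best_index(posterior)           # scan the second-best pair
posterior = bayes_update(posterior, pair, ack=True)
```

Scanning the second-best pair rather than the current best is the exploration step: positive feedback can promote that pair to the top, while negative feedback reinforces the current leader.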