Generalization and reliability of multilingual translation often depend heavily on the amount of parallel data available for each language pair of interest. In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. To approach the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We evaluate our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often yields 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions.
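To make the agreement idea concrete, the following is a minimal toy sketch (not the paper's exact objective): given a parallel pair (x, y), the x-to-z and y-to-z models each place a distribution over candidate translations z in an auxiliary language, and an agreement term penalizes disagreement between the two distributions. The symmetrized cross-entropy used here, and the dictionary representation of the candidate distributions, are illustrative assumptions.

```python
import math

def agreement_loss(p_src, p_tgt, eps=1e-12):
    """Toy agreement term: symmetrized cross-entropy between two models'
    distributions over candidate auxiliary-language translations.

    p_src[z] -- probability the x->z model assigns to candidate z
    p_tgt[z] -- probability the y->z model assigns to candidate z
    Lower values mean the two models agree more closely.
    """
    loss = 0.0
    for z in set(p_src) | set(p_tgt):
        ps = p_src.get(z, 0.0)
        pt = p_tgt.get(z, 0.0)
        # Cross-entropy in both directions; eps guards log(0).
        loss -= ps * math.log(pt + eps)
        loss -= pt * math.log(ps + eps)
    return loss

# Two models that agree on the auxiliary translation incur a small penalty;
# models that disagree incur a large one.
p = {"hola": 0.9, "saludos": 0.1}   # x->z model (hypothetical candidates)
q = {"hola": 0.9, "saludos": 0.1}   # y->z model, agreeing
r = {"hola": 0.1, "saludos": 0.9}   # y->z model, disagreeing
```

In training, such a term would be added to the usual supervised translation losses, nudging the model toward zero-shot consistency on directions that have no direct parallel data.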