<div>Optimal queueing control of multi-hop networks remains a challenging problem,</div><div>especially in two-way relaying systems, even in the most straightforward scenarios.</div><div>In this paper, we explore two-way relaying with a full-duplex decode-and-forward</div><div>relay equipped with two finite buffers. Principally, we propose a novel concept based on</div><div>multi-agent reinforcement learning that maximizes the cumulative network throughput:</div><div>based on the combination of the buffer states and the lossy links, a decision is</div><div>generated as to whether to transmit, receive, or simultaneously receive and</div><div>transmit information. Towards this objective, based chiefly on the queue state transitions</div><div>and the lossy links, an analytic Markov decision process is proposed to analyze</div><div>this scheme, and the throughput and queueing delay are derived. Our numerical results</div><div>reveal two key insights. First, artificial intelligence based on reinforcement learning</div><div>is optimal when the buffer length exceeds a certain threshold. Second, we</div><div>demonstrate that reinforcement learning can boost transmission efficiency and prevent</div><div>buffer overflow.</div>