Delay tolerant networks (DTNs) are a novel class of wireless mobile networks in which no persistent end‐to‐end connection exists between nodes, owing to frequent node movement, sparse node distribution and the limited communication range of nodes. Instead of the traditional store‐and‐forward routing strategy, DTNs adopt a store‐carry‐forward strategy for data transmission. Consequently, selecting the best next‐hop node is the main routing challenge in DTNs. To this end, a k‐step double Q‐learning routing (K‐DQLR) algorithm is proposed here. It integrates the multi‐step and double Q‐learning algorithms to make unbiased, accurate and efficient routing decisions in DTNs. In addition, a new dynamic reward mechanism is proposed that combines the number of routing hops and node centrality to adapt to the dynamic network environment of DTNs. Simulation results show that, compared with related state‐of‐the‐art DTN routing protocols, K‐DQLR significantly increases the delivery ratio while reducing delivery delay and overhead.
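The abstract states only that K‐DQLR combines multi‐step and double Q‐learning. As a rough illustration of how those two ideas fit together, the following minimal Python sketch applies a generic tabular k‐step double Q‐learning update, assuming a state is the node currently holding a message and an action is a candidate next‐hop node. The class `KStepDoubleQ`, the tabular representation, the hyperparameters and `hypothetical_reward` are all assumptions made for illustration; they are not the paper's K‐DQLR design or its dynamic reward formula.

```python
import random
from collections import defaultdict


class KStepDoubleQ:
    """Generic tabular k-step double Q-learning (illustrative sketch, NOT K-DQLR itself)."""

    def __init__(self, k=3, alpha=0.1, gamma=0.9, seed=None):
        self.k = k              # number of real rewards summed before bootstrapping
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.qa = defaultdict(float)  # estimator Q_A[(state, action)]
        self.qb = defaultdict(float)  # estimator Q_B[(state, action)]
        self.rng = random.Random(seed)

    def best_action(self, state, actions):
        # Greedy next-hop choice using the sum of both estimators.
        return max(actions, key=lambda a: self.qa[(state, a)] + self.qb[(state, a)])

    def update(self, transitions, final_state, final_actions):
        """Update one estimator from up to k steps of experience.

        transitions: list of (state, action, reward) tuples, oldest first.
        final_state: state reached after the last transition.
        final_actions: candidate actions there (empty if the episode ended,
        e.g. the message was delivered). Returns the k-step target.
        """
        assert 1 <= len(transitions) <= self.k
        # Pick which table to update at random; the other one evaluates.
        if self.rng.random() < 0.5:
            q_upd, q_eval = self.qa, self.qb
        else:
            q_upd, q_eval = self.qb, self.qa
        # Discounted sum of the observed rewards.
        target = sum((self.gamma ** i) * r for i, (_, _, r) in enumerate(transitions))
        if final_actions:
            # Double Q-learning: select with q_upd, evaluate with q_eval.
            a_star = max(final_actions, key=lambda a: q_upd[(final_state, a)])
            target += (self.gamma ** len(transitions)) * q_eval[(final_state, a_star)]
        s0, a0, _ = transitions[0]
        q_upd[(s0, a0)] += self.alpha * (target - q_upd[(s0, a0)])
        return target


def hypothetical_reward(hops, centrality, delivered):
    # Hypothetical reward: favors central relays and penalizes hop count.
    # This is NOT the paper's formula, only a placeholder with the same inputs.
    if delivered:
        return 1.0
    return centrality / (1 + hops)
```

Selecting the bootstrap action with one table and evaluating it with the other is what counters the overestimation bias of plain Q‐learning, while summing k observed rewards before bootstrapping propagates delivery information back along a route faster than a one‐step update. In a real DTN agent, `transitions` would be collected as a message is handed from node to node.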