This paper presents a machine learning strategy that tackles a distributed optimization task in a wireless network with an arbitrary number of randomly interconnected nodes. Individual nodes determine their optimal states through distributed coordination with other nodes over randomly varying backhaul links. This poses the technical challenge of designing a universal distributed optimization policy robust to the random topology of the wireless network, a challenge that has not been properly addressed by conventional deep neural networks (DNNs) with rigid structural configurations. We develop a flexible DNN formalism, termed the distributed message-passing neural network (DMPNN), whose forward and backward computations are independent of the network topology. A key enabler of this approach is an iterative message-sharing strategy over arbitrarily connected backhaul links. The DMPNN provides a convergent solution for collecting inputs from local nodes and evaluating a valid DNN output.

By contrast, a learning-to-cooperate formalism [20], [22] has recently been introduced to tackle distributed network management in [12]-[14]. The underlying policy is to decouple a node operation into two component DNN units: a message generator and a distributed optimizer (see the first sketch below). The message generator encodes locally available information into messages, which are subsequently transferred to nearby nodes over network links. The distributed optimizer combines the incoming messages to determine the optimal state of the corresponding node. These units are trained offline in a centralized domain, while their inference is conducted on the fly in a decentralized manner.

Distributed wireless systems, e.g., internet-of-things (IoT) and wireless sensor networks, entail network configurations, such as network topology and node population, that are set arbitrarily and change gradually. However, the existing distributed DL methods [12]-[14] fail to capture these random features: either the DNNs lack the capacity to accommodate all possible candidates of the networking setup, or their computations become prohibitively demanding. This gives rise to the necessity of a universal DL framework applicable to arbitrary network configurations.

We consider a distributed optimization problem over a random network. Distributed coordination allows nodes to share messages through backhaul links, and individual nodes produce the optimal solution based on these messages along with local information. In practice, direct interaction among all nodes is not possible because backhaul links are absent between some node pairs; the network model is thus inherently an undirected graph with randomly connected edges. Based on this graphical network model, we investigate an efficient DL computation structure for distributed optimization. A single node performing distributed message-passing (DMP) inference is constructed from four operations: message generation, message reception, state update, and distributed decision (see the second sketch below). Each node generates a message dedicated to an adjacent node that is connected by a backhaul link. S...
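As a concrete reference point, the decoupled node operation of the learning-to-cooperate formalism can be sketched as two small DNN units. This is a minimal illustration under assumptions of our own, not the exact architecture of [12]-[14]: the names `MessageGenerator` and `DistributedOptimizer`, the layer widths, and the sum-pooling of the inbox are hypothetical choices made so that the optimizer's input size stays fixed regardless of how many neighbor messages arrive.

```python
import torch
import torch.nn as nn

LOCAL, MSG, STATE = 4, 8, 2   # illustrative dimensions

class MessageGenerator(nn.Module):
    """Encodes locally available information into a compact message."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LOCAL, 32), nn.ReLU(),
                                 nn.Linear(32, MSG))

    def forward(self, local_obs):
        return self.net(local_obs)

class DistributedOptimizer(nn.Module):
    """Combines incoming messages with local information into a node state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LOCAL + MSG, 32), nn.ReLU(),
                                 nn.Linear(32, STATE))

    def forward(self, local_obs, inbox):
        # Sum-pooling keeps the input size fixed no matter how many
        # neighbor messages arrived over the backhaul links.
        return self.net(torch.cat([local_obs, inbox.sum(dim=0)]))

# Decentralized inference at one node: encode, exchange, decide.
gen, opt = MessageGenerator(), DistributedOptimizer()
my_obs = torch.randn(LOCAL)
outgoing = gen(my_obs)            # broadcast to connected neighbors
inbox = torch.randn(3, MSG)       # stand-in for 3 received messages
state = opt(my_obs, inbox)
```

Training such units end to end in a centralized simulator and then deploying copies of them for decentralized inference mirrors the offline-training, online-inference split described above.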
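The four-stage DMP inference loop can be sketched in the same spirit. Again, this is a hypothetical illustration rather than the paper's exact DMPNN: the GRU-based state update, the mean-pooled message reception, and the conditioning of each dedicated message on the last message received from that neighbor are assumed design choices; only the four-stage structure (message generation, message reception, state update, distributed decision) follows the text.

```python
import torch
import torch.nn as nn

OBS, MSG, STATE = 4, 8, 16           # illustrative dimensions

class DMPNet(nn.Module):
    """One weight set shared by every node, so forward and backward
    computations stay independent of the node count and link pattern."""
    def __init__(self):
        super().__init__()
        # Message generation: a message dedicated to one neighbor,
        # conditioned on the sender's state and the last message that
        # neighbor sent back (an assumed form of conditioning).
        self.f_msg = nn.Sequential(nn.Linear(STATE + MSG, 32), nn.ReLU(),
                                   nn.Linear(32, MSG))
        self.update = nn.GRUCell(MSG, STATE)           # state update
        self.f_dec = nn.Sequential(nn.Linear(STATE + OBS, 32), nn.ReLU(),
                                   nn.Linear(32, 1))   # distributed decision

    def forward(self, obs, adj, rounds=3):
        N = obs.shape[0]
        h = torch.zeros(N, STATE)                      # node states
        m = torch.zeros(N, N, MSG)                     # m[i, j]: message i -> j
        for _ in range(rounds):
            # Message generation: one dedicated message per backhaul link.
            m_new = torch.zeros(N, N, MSG)
            for i in range(N):
                for j in range(N):
                    if adj[i, j]:
                        m_new[i, j] = self.f_msg(torch.cat([h[i], m[j, i]]))
            m = m_new
            # Message reception: mean-pool the inbox so the input size
            # never depends on a node's degree.
            inbox = torch.stack([
                m[adj[:, i], i].mean(dim=0) if adj[:, i].any()
                else torch.zeros(MSG)
                for i in range(N)])
            h = self.update(inbox, h)                  # state update
        # Distributed decision from the final state and local observation.
        return self.f_dec(torch.cat([h, obs], dim=1))

# A random undirected topology (Erdos-Renyi style, no self-links):
N = 6
upper = torch.triu(torch.rand(N, N), diagonal=1) > 0.6
adj = upper | upper.T
out = DMPNet()(torch.randn(N, OBS), adj)   # runs unchanged for any N or adj
```

Because every node reuses the same parameters and the inbox is reduced by a permutation-invariant pooling, the same trained model accepts any node population and any pattern of backhaul links, which is the topology independence the DMPNN targets.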