We consider the problem of identifying the source of an epidemic spreading through a network, given a complete observation of the infected nodes in a snapshot of the network. Previous work on the problem has often employed geometric, spectral or heuristic approaches to identify the source, with trees being the most studied network topology. We take a fully statistical approach and derive novel recursions to compute the Bayes optimal solution under a susceptible-infected (SI) epidemic model. Our analysis is time and rate independent, and holds for general network topologies. We then provide two tractable algorithms for solving these recursions, a mean-field approximation and a greedy approach, and evaluate their performance on real and synthetic networks. Real networks are far from tree-like, and we place emphasis on networks with high transitivity, such as social networks and those with communities. We show that on such networks, our approaches significantly outperform geometric and spectral centrality measures, most of which perform no better than random guessing. Both the greedy and mean-field approximations are scalable to large sparse networks.

Preprint. Under review.

and consistent recovery are restricted to regular infinite trees [23,26], and as we show in this paper, the popular and well-cited methods are quite unreliable in a wide range of real networks.

Source identification has remained largely unsolved and poorly understood for real complex networks. As we will show through experiments in Section 5, in real networks, even the optimal Bayes estimator applied to small infected sets has difficulty narrowing down to the true source. It is thus important to recover as much information from the likelihood of the model as possible. We develop techniques for computing the full likelihood of the infection, as opposed to identifying the most likely sample path [26].
Moreover, we fully exploit the information from the boundary of the infection set, in addition to the structure inside the infected subgraph. This idea has been pointed out before [32], but has been mostly neglected by subsequent work; cf. [29,24]. We develop all these ideas without restricting the structure of the network to trees. Our framework also easily extends to the case where there are multiple infecting sources (Appendix A).

In this paper, we develop statistical algorithms that outperform the state-of-the-art in a wide range of network topologies. Our contributions are distinct in several ways:

1. Our methods are parameter-free, meaning that they do not require knowing the duration of the epidemic or how fast it grows.