One of the most demanding challenges for designers of parallel computing architectures is to deliver an efficient network infrastructure that provides low-latency, high-bandwidth communication while preserving scalability. Besides off-chip communication between processors, recent multi-tile (i.e. multi-core) architectures also need an efficient on-chip interconnection network between processor tiles. In this paper, we present a configurable and scalable architecture, based on our Distributed Network Processor (DNP) IP Library, targeting systems ranging from single MPSoCs to massive HPC platforms.

The DNP provides inter-tile services for both on-chip and off-chip communications with a uniform RDMA-style API, over a multi-dimensional direct network (see [1] for a definition of direct networks) with a (possibly) hybrid topology. It is designed as a parametric Intellectual Property Library that is easily customizable to specific needs. The currently available blocks implement wormhole, deadlock-free, packet-based communications with static routing.

The DNP offers a configurable number of ports of three kinds: L intra-tile I/O ports, which connect elements within the same computational tile; N on-chip inter-tile ports, which link different tiles on the same silicon die; and M off-chip inter-tile ports, which link tiles belonging to different dies. Because of the fully switched architecture, the DNP may sustain up to L + N + M packet transactions at the same time.

The DNP has been integrated into the design of an MPSoC dedicated to both high-performance audio/video processing and theoretical physics applications. We present the details of its architecture and show some promising results obtained on a first preliminary implementation.
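
As a purely illustrative sketch of the port arithmetic described above, the following Python model shows how the three port counts L, N and M combine into the L + N + M upper bound on simultaneous packet transactions. The names `DnpPortConfig` and `grant_transactions` are hypothetical and not part of the DNP IP Library. The arbitration rule (first come, first served, one transaction per source port and per destination port) is an assumption made only to illustrate a fully switched fabric, not a description of the actual hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnpPortConfig:
    """Hypothetical container for the three port counts named in the text."""
    intra_tile: int  # L: ports to elements inside the same tile
    on_chip: int     # N: ports to other tiles on the same die
    off_chip: int    # M: ports to tiles on other dies

    def __post_init__(self):
        # Reject negative port counts; zero is allowed (port class unused).
        for name in ("intra_tile", "on_chip", "off_chip"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")

    @property
    def total_ports(self) -> int:
        return self.intra_tile + self.on_chip + self.off_chip

    def max_concurrent_transactions(self) -> int:
        # Fully switched: each port may carry one packet transaction at once,
        # so the upper bound is L + N + M.
        return self.total_ports


def grant_transactions(requests, num_ports):
    """Grant (src_port, dst_port) requests in order, skipping any request
    whose source or destination port is already in use (assumed policy)."""
    busy_src, busy_dst = set(), set()
    granted = []
    for src, dst in requests:
        if not (0 <= src < num_ports and 0 <= dst < num_ports):
            raise ValueError(f"port index out of range: {(src, dst)}")
        if src in busy_src or dst in busy_dst:
            continue  # conflicting request is not granted in this round
        busy_src.add(src)
        busy_dst.add(dst)
        granted.append((src, dst))
    return granted
```

For example, with the arbitrary values L = 2, N = 4 and M = 6, a request pattern in which every port sends to a distinct port fills all twelve ports at once, and any further request that reuses a busy port is not granted in that round.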