In this paper, a new data-driven multiscale material modeling method, which we refer to as the deep material network, is developed based on the mechanistic homogenization theory of the representative volume element (RVE) and advanced machine learning techniques. We propose to use a collection of connected mechanistic building blocks with analytical homogenization solutions, which avoids the loss of essential physics in generic neural networks; this concept is demonstrated for 2-dimensional RVE problems and network depths up to 7. Based on linear elastic RVE data from offline direct numerical simulations, the material network can be effectively trained using stochastic gradient descent with the backpropagation algorithm, further enhanced by model compression methods. Importantly, the trained network is valid for any local material laws without the need for additional calibration or micromechanics assumptions. Its extrapolations to unknown material and loading spaces for a wide range of problems are validated through numerical experiments, including linear elasticity with high contrast of phase properties, nonlinear history-dependent plasticity and finite-strain hyperelasticity under large deformations. By discovering a proper topological representation of the RVE with fewer degrees of freedom, this intelligent material model is believed to open new possibilities of high-fidelity efficient concurrent simulations for a large-scale heterogeneous structure. It also provides a mechanistic understanding of structure-property relations across material length scales and enables the development of a parameterized microstructural database for material design and manufacturing.

cost and accuracy. Analytical micromechanics methods [9,10,1,11,12,13] can be regarded as one type of reduced-order model with high efficiency.
However, due to a loss of detailed physics at the microscale, they normally lose accuracy or require extensive model calibration when irregular complex morphologies, nonlinear history-dependent properties or large deformations are present. For heterogeneous hyperelastic materials, manifold-learning methods like isomap are used for nonlinear dimensionality reduction of microscopic strain fields [14]. The model reduction of history-dependent plastic materials can be more complex and challenging. Two examples are non-uniform transformation field analysis (NTFA) [15,16] and variants of principal component analysis [17] or proper orthogonal decomposition (POD) [18,19,20]. However, they usually require extensive a priori simulations for interpolating nonlinear responses, and their extrapolation capability for new material inputs is usually limited. Recently, the self-consistent clustering analysis (SCA) [21,22] has demonstrated a powerful trade-off between accuracy and efficiency in predicting small-strain elasto-plastic behavior through clustering techniques, and it only requires linear elastic simulations in the offline stage. Meanwhile, current advanced machine learning models (e.g. artificial neural networks and deep learning) hav...