Future networks are expected to make intensive use of artificial intelligence and machine learning applications. Machine learning techniques will primarily target intelligent automation and decision‐making based on information learned from network functions, resources, and users. Over the last two decades, the research community has studied several schemes for using machine learning in communications and networks. However, these schemes mainly focus on learning the state information of individual network functions and resources and optimizing them in a standalone fashion. Recently, a study group of the International Telecommunication Union outlined requirements for a machine learning pipeline, addressing several aspects of machine learning operations such as model selection, data sources, and action points. However, mechanisms for defining the state of network functions and resources across different network layers and domains are yet to be explored. Specific studies are therefore also required on knowledge sharing across layers and administrative domains and on holistic optimization. Without a holistic view, applying machine learning to optimize individual functions and resources may produce suboptimal and unexpected behaviors rather than any positive effect. The need for global and deep holistic learning has recently been identified in the literature; however, no implementation of such a scheme has been studied or evaluated for networks. This article proposes a novel approach that implements global and deep holistic learning in wireless networks, toward fulfilling the requirements of future intelligent networks. It also proposes objective function‐based feature engineering for wireless networks. Experimental results for the proposed scheme show significant improvements in network performance and in the accuracy of the machine learning models. Moreover, applying transfer learning to the base learners for network functions and resources can significantly expedite decision‐making at reduced computational cost.