Cloud and fog computing, along with network function virtualization, have significantly shifted the development of network architectures. They reduce capital and operating expenditures while achieving the network flexibility and scalability needed to accommodate the massive growth in data traffic from user terminals requesting diverse services and applications. Cloud solutions offer abundant computing and storage resources, but at the cost of high end-to-end delays, which limits the quality of service for delay-sensitive applications. Fog solutions, in contrast, offer reduced delays but limited resources. Existing efforts merge the two and propose multi-tier hybrid fog-cloud architectures that leverage the strengths of both. However, these approaches can be inefficient when applications are both delay-sensitive and resource-intensive. This work therefore proposes a novel standalone heterogeneous fog architecture composed of high-capacity and low-capacity fog nodes, both located in close proximity to the terminals. The result is a substrate network that offers reduced delays and high resources without relaying traffic to cloud nodes. Moreover, this work deploys a deep learning network to realize a service function chain provisioning scheme on this architecture. The scheme predicts the popular network functions and maps them onto the high-capacity nodes, while the unpopular network functions are mapped onto the low-capacity nodes. The goal is to predict the next incoming function and prefetch it on the node, so that when a future request demands the same function it can be served directly from the node's cache, reducing resource consumption, processing time, cost, and energy consumption. This in turn yields a higher number of satisfied requests and increased network capacity. The deep learning model achieves a low loss and high prediction success rates.
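The predict-prefetch-cache idea described above can be sketched as follows. This is a minimal illustration, not the paper's method: the names `FogNode`, `PopularityPredictor`, and `serve` are hypothetical, and a simple frequency counter stands in for the deep learning predictor. Predicted-popular functions are prefetched onto the high-capacity node and the remainder onto the low-capacity node; a request that finds its function already cached counts as a hit, avoiding a fresh fetch.

```python
from collections import Counter, deque

class FogNode:
    """Hypothetical fog node that prefetches (caches) VNFs up to a capacity."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.cache = deque()  # prefetched VNFs, bounded by capacity

    def prefetch(self, vnf):
        if vnf in self.cache:
            return
        if len(self.cache) >= self.capacity:
            self.cache.popleft()  # evict the oldest prefetched VNF
        self.cache.append(vnf)

    def has(self, vnf):
        return vnf in self.cache

class PopularityPredictor:
    """Stand-in for the deep learning network: ranks VNFs by observed frequency."""
    def __init__(self):
        self.counts = Counter()

    def observe(self, vnf):
        self.counts[vnf] += 1

    def popular(self, k):
        return [v for v, _ in self.counts.most_common(k)]

def serve(requests, high, low, predictor, catalog):
    """Map predicted-popular VNFs onto the high-capacity node and the rest
    onto the low-capacity node; count requests served from a cache."""
    hits = 0
    for vnf in requests:
        if high.has(vnf) or low.has(vnf):
            hits += 1  # cache hit: no fetch delay or extra cost incurred
        predictor.observe(vnf)
        popular = set(predictor.popular(high.capacity))
        for v in catalog:
            (high if v in popular else low).prefetch(v)
    return hits
```

In this toy accounting, the cache-hit count is a rough proxy for the reductions in processing time, cost, and energy that the scheme targets; the paper's actual gains depend on the accuracy of the learned predictor.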