There is an increased exploration of the potential of wireless communication networks in the automation of daily human tasks via the Internet of Things. Such implementations are only possible with the proper design of networks. Path loss prediction is a key factor in the design of networks with parameters such as cell radius, antenna heights, and the number of cell sites that can be set. As path loss is affected by the environment, satellite images of network locations are used in developing path loss prediction models such that environmental effects are captured. We developed a path loss model based on the Extreme Gradient Boosting (XGBoost) algorithm, whose inputs are numeric (non-image) features that influence path loss and features extracted from images composed of four tiled satellite images of points along the transmitter to receiver path. The model can predict path loss for multiple frequencies, antenna heights, and environments such that it can be incorporated into Radio Planning Tools. Various feature extraction methods that included CNN and hand-crafted and their combinations were applied to the images in order to determine the best input features, which, when combined with non-image features, will result in the best XGBoost model. Although hand-crafted features have the advantage of not requiring a large volume of data as no training is involved in them, they failed in this application as their use led to a reduction in accuracy. However, the best model was obtained when image features extracted using CNN and GLCM were combined with the non-image features, resulting in an RMSE improvement of 9.4272% against a model with non-image features only without satellite images. The XGBoost model performed better than Random Forest (RF), Extreme Learning Trees (ET), Gradient Boosting, and K Nearest Neighbor (KNN) based on the combination of CNN, GLCM, and non-image features. Further analysis using the Shapley Additive Explanations (SHAP) revealed that features extracted from the satellite images using CNN had the highest contribution toward the XGBoost model’s output. The variation in values of features with output path loss values was presented using SHAP summary plots. Interactions were also observed between some features based on their dependence plots from the computed SHAP values. This information, when further explored, could serve as the basis for the development of an explainable/glass box path loss model.
This work aims at developing a generalized and optimized path loss model that considers rural, suburban, urban, and urban high rise environments over different frequencies, for use in the Heterogenous Ultra Dense Networks in 5G. Five different machine learning algorithms were tested on four combined datasets, with a sum of 12369 samples in which their hyper-parameters were automatically optimized using Bayesian optimization, HyperBand and Asynchronous Successive Halving (ASHA). For the Bayesian optimization, three surrogate models (the Gaussian Process, Tree Structured Parzen Estimator and Random Forest) were considered. To the best of our knowledge, few works have been found on automatic hyper-parameter optimization for path loss prediction and none of the works used the aforementioned optimization techniques. Differentiation among the various environments was achieved by the assignment of the clutter height values based on International Telecommunication Union Recommendation (ITU-R) P.452-16. We also included the elevation of the transmitting antenna position as a feature so as to capture its effect on path loss. The best machine learning model observed is K Nearest Neighbor (KNN), achieving mean Coefficient of Determination (R2), average Mean Absolute Error (MAE) and mean Root Mean Squared Error (RMSE) values of 0.7713, 4.8860dB, and 6.8944dB, respectively, obtained from 100 different samplings of train set and test set. Results show that machine learning can also be used to develop path loss models that are valid for a certain range of distances, frequencies, antenna heights, and environment types. HyperBand produced hyper-parameter configurations with the highest accuracy in most of the algorithms.
Wireless network parameters such as transmitting power, antenna height, and cell radius are determined based on predicted path loss. The prediction is carried out using empirical or deterministic models. Deterministic models provide accurate predictions but are slow due to their computational complexity, and they require detailed environmental descriptions. While empirical models are less accurate, Machine Learning (ML) models provide fast predictions with accuracies comparable to that of deterministic models. Most Empirical models are versatile as they are valid for various values of frequencies, antenna heights, and sometimes environments, whereas most ML models are not. Therefore, developing a versatile ML model that will surpass empirical model accuracy entails collecting data from various scenarios with different environments and network parameters and using the data to develop the model. Combining datasets of different sizes could lead to lopsidedness in accuracy such that the model accuracy for a particular scenario is low due to data imbalance. This is because model accuracy varies at certain regions of the dataset and such variations are more intense when the dataset is generated from a fusion of datasets of different sizes. A Dynamic Regressor/Ensemble selection technique is proposed to address this problem. In the proposed method, a regressor/ensemble is selected to predict a sample point based on the sample’s proximity to a cluster assigned to the regressor/ensemble. K Means Clustering was used to form the clusters and the regressors considered are K Nearest Neighbor (KNN), Extreme Learning Trees (ET), Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost). The ensembles are any combinations of two, three or four of the regressors. The sample points belonging to each cluster were selected from a validation set based on the regressor that made prediction with lowest absolute error per individual sample point. Implementation of the proposed technique resulted in accuracy improvements in a scenario described by a few sample points in the training data. Improvements in accuracy were also observed on datasets in other works compared to the accuracy reported in the works. The study also shows that using features extracted from satellite images to describe the environment was more appropriate than using a categorical clutter height value.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.