Infants typically cry to get their parents' attention. Through their cries, infants express basic needs such as hunger, tiredness, pain, and discomfort. Unfortunately, cries are difficult to interpret, so the infant's need often remains unclear. One way to address this problem is to analyze the acoustic pattern of the cry and determine its cause. In this study, the cry signal is converted to a spectrogram image to take advantage of the wide range of image-based features. Before the feature representation is generated, the watershed segmentation algorithm is used to remove distracting areas of the image. Then, histogram of oriented gradients (HoG) features are extracted. Because the feature vector has high dimensionality, two stages of dimensionality reduction are applied. First, the feature pool is reduced using the Fisher score feature selection approach. The final feature set is then chosen using a combination of transfer learning, a genetic algorithm (GA), and neural networks. To steer the GA toward features that work well with the neural network, a rank-aware mutation operator is proposed. The system is evaluated on the public donateacry-corpus dataset. Experiments show that when 80 HoG features are generated and the 37 features with the highest Fisher scores are selected, the model reaches its best accuracy of 92% when transfer learning is applied to 11 hidden layers of the neural network. These findings support the use of image-based features to identify the cause of a baby's crying.
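The first reduction stage can be illustrated with a minimal sketch of Fisher score ranking. It assumes the common definition of the score (between-class scatter divided by within-class scatter, computed per feature); the study's exact formulation is not given here. The function names `fisher_scores` and `select_top_k`, the toy data, and the class labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fisher_scores(X, y, eps=1e-12):
    """Return one Fisher score per feature column of X (a list of rows).

    Assumed definition: sum_c n_c * (mean_c - mean)^2 / sum_c n_c * var_c.
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in groups.values():
            values = [row[j] for row in rows]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        # eps avoids division by zero for constant features
        scores.append(between / (within + eps))
    return scores


def select_top_k(X, y, k):
    """Return the sorted indices of the k features with the highest scores."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): feature 0 separates the classes,
# feature 1 does not, and feature 2 is constant.
X = [
    [0.0, 1.0, 5.0],
    [0.2, 3.0, 5.0],
    [1.0, 1.0, 5.0],
    [1.2, 3.0, 5.0],
]
y = ["hungry", "hungry", "tired", "tired"]

scores = fisher_scores(X, y)
selected = select_top_k(X, y, 1)
```

In the setting described above, `X` would hold the 80 HoG features per spectrogram and `k` would be 37; the selected columns would then be passed to the GA-based second stage.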