Recent advances in Convolutional Neural Networks (CNNs) have produced promising results on difficult deep learning tasks. However, the success of a CNN depends on finding an architecture that fits the given problem. Hand-crafting an architecture is a challenging, time-consuming process that requires expert knowledge and effort because of the large number of architectural design choices. In this article, we present an efficient framework that automatically designs a high-performing CNN architecture for a given problem. In this framework, we introduce a new optimization objective function that combines the error rate and the information learnt by a set of feature maps using deconvolutional networks (deconvnet). The new objective function allows the hyperparameters of the CNN architecture to be optimized in a way that enhances performance by guiding the CNN through better visualization of learnt features via deconvnet. The actual optimization of the objective function is carried out via the Nelder-Mead Method (NMM). Further, our new objective function results in much faster convergence towards a better architecture. The proposed framework can explore a CNN architecture's numerous design choices efficiently and also allows effective, distributed execution and synchronization via web services. Empirically, we demonstrate that the CNN architecture designed with our approach outperforms several existing approaches in terms of its error rate. Our results are also competitive with state-of-the-art results on the MNIST dataset and perform reasonably against the state-of-the-art results on the CIFAR-10 and CIFAR-100 datasets. Our approach plays a significant role in increasing the depth, reducing the stride size, and constraining some convolutional layers so that they are not followed by pooling layers, in order to find a CNN architecture with high recognition performance.
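As a rough illustration of how a combined objective can be minimized with the Nelder-Mead method, the sketch below runs a simplified pure-Python Nelder-Mead search over two continuous hyperparameters. Everything specific here is an assumption made for illustration: the abstract does not give the form of the objective, so `error_rate`, `feature_quality`, the weight `eta`, and the optimum at depth 8 and stride 1 are hypothetical surrogates standing in for a real train-and-evaluate run and a deconvnet-based score.

```python
def error_rate(depth, stride):
    # Hypothetical surrogate for validation error (not from the paper).
    return 0.05 + 0.01 * (depth - 8.0) ** 2 + 0.02 * (stride - 1.0) ** 2


def feature_quality(depth, stride):
    # Hypothetical stand-in for a deconvnet-based feature score in (0, 1].
    return 1.0 / (1.0 + 0.05 * (depth - 8.0) ** 2 + 0.1 * (stride - 1.0) ** 2)


def objective(x, eta=0.5):
    # Assumed form: error plus a weighted penalty for poorly learnt features.
    depth, stride = x
    return error_rate(depth, stride) + eta * (1.0 - feature_quality(depth, stride))


def nelder_mead(f, x0, step=1.0, max_iter=500, tol=1e-12,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)

    for _ in range(max_iter):
        # In a real search each call to f is a full training run,
        # so the values should be cached rather than recomputed.
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
        f_reflected = f(reflected)
        if f(best) <= f_reflected < f(second_worst):
            simplex[-1] = reflected
        elif f_reflected < f(best):
            # Reflection found a new best point: try going further.
            expanded = [c + gamma * (r - c) for c, r in zip(centroid, reflected)]
            simplex[-1] = expanded if f(expanded) < f_reflected else reflected
        else:
            # Reflection did not help: contract toward the centroid.
            contracted = [c + rho * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Last resort: shrink every vertex toward the best one.
                simplex = [best] + [
                    [b + sigma * (vi - b) for b, vi in zip(best, v)]
                    for v in simplex[1:]
                ]
    return min(simplex, key=f)


best = nelder_mead(objective, [3.0, 3.0])
print(best, objective(best))
```

Nelder-Mead needs only function values, not gradients, which suits hyperparameter search where the objective is the outcome of a training run. Real hyperparameters such as depth are integers, so a practical version would have to round or otherwise constrain the simplex vertices; that step is left out of this sketch.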
Although deep learning algorithms have achieved significant progress in a variety of domains, they require costly annotations on huge datasets. Self-supervised learning (SSL) using unlabeled data has emerged as an alternative, as it eliminates manual annotation. To do this, SSL constructs feature representations using pretext tasks that operate without manual annotation, which allows models trained on these tasks to extract useful latent representations that later improve downstream tasks such as object classification and detection. The early methods of SSL are based on auxiliary pretext tasks as a way to learn representations using pseudo-labels, that is, labels created automatically from the dataset’s attributes. Contrastive learning has also performed well in learning representations via SSL. It works by pulling positive samples closer together and pushing negative ones further apart in the latent space. This paper provides a comprehensive literature review of the top-performing SSL methods using auxiliary pretext and contrastive learning techniques. It details the motivation for this research, a general pipeline of SSL, and the terminology of the field, and it examines pretext tasks and self-supervised methods. It also examines how self-supervised methods compare to supervised ones, and then discusses both further considerations and ongoing challenges faced by SSL.
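The idea of pulling positives together and pushing negatives apart can be made concrete with an InfoNCE-style loss, which is one common contrastive objective; the abstract does not name a specific loss, so this choice, the function names, and the temperature value are assumptions made for illustration. The sketch uses plain Python lists as embeddings.

```python
import math


def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    # Loss is small when the anchor is similar to the positive
    # and dissimilar to every negative.
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]
# Positive close to the anchor, negatives far away: low loss.
aligned = info_nce(anchor, [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
# Positive far from the anchor, one negative close to it: high loss.
misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned, misaligned)
```

In practice the positive is usually a second augmented view of the same image and the negatives are the other images in the batch; minimizing the loss over the encoder's parameters is what shapes the latent space.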
Weather conditions have a significant effect on people's daily lives and on production, ranging from clothing choices to travel, outdoor sports, and solar energy systems. Recent advances in computer vision based on deep learning methods have shown notable progress in both scene awareness and image processing problems. These results have highlighted network depth as a critical factor, as deeper networks achieve better outcomes. This paper proposes a deep learning model based on DenseNet-121 to effectively recognize weather conditions from images. DenseNet performs significantly better than previous models while using less processing power and memory, which further increases its efficiency. Since this field currently lacks adequate labeled images for training in weather image recognition, transfer learning and data augmentation techniques were applied. Models pre-trained on the ImageNet dataset were fine-tuned to speed up training and achieve better end results. Because DenseNet-121 requires sufficient data and is architecturally complex, expanding the data via geometric augmentation (such as rotation, translation, flipping, and scaling) was critical in decreasing overfitting and increasing the effectiveness of fine-tuning. These experiments were conducted on the RFS dataset, and the results demonstrate both the efficiency and advantages of the proposed method, which achieved an accuracy rate of 95.9%.
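To show what geometric augmentation does to a training set, the sketch below applies flipping, rotation, and translation to a tiny image stored as a nested list. It is a toy, not the paper's pipeline: the function names are hypothetical, rotation is limited to 90 degrees, and scaling is left out for brevity, whereas a real setup would normally use an image library with arbitrary angles and interpolation.

```python
def hflip(img):
    # Mirror each row left to right.
    return [row[::-1] for row in img]


def rotate90(img):
    # Rotate 90 degrees clockwise: reverse the rows, then transpose.
    return [list(col) for col in zip(*img[::-1])]


def translate(img, dx, dy, fill=0):
    # Shift pixels by (dx, dy); uncovered pixels take the fill value.
    height, width = len(img), len(img[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = img[y][x]
    return out


def augment(img):
    # One labeled image becomes four training samples with the same label.
    return [img, hflip(img), rotate90(img), translate(img, 1, 0)]


image = [[1, 2],
         [3, 4]]
for variant in augment(image):
    print(variant)
```

Because each transform preserves the weather label, the augmented copies enlarge the dataset without new annotation, which is how augmentation helps reduce overfitting when fine-tuning a large network on limited data.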
Addressing sign language problems and producing high-quality solutions has attracted the attention of researchers and practitioners because of the considerable prevalence of hearing disabilities around the world. The literature shows that Arabic Sign Language (ArSL) is one of the most popular sign languages due to its rate of use. ArSL is categorized into two groups. The first group is ArSL, where words are represented by signs, i.e., pictures. The second group is ArSL alphabetic (ArSLA), where each Arabic letter is represented by a sign. This paper introduces a real-time ArSLA recognition model using a deep learning architecture. As a methodology, the following steps were followed. First, a trusted scientific ArSLA dataset was located. Second, the best deep learning architectures were chosen by investigating related works. Third, an experiment was conducted to test the previously selected deep learning architectures. Fourth, the deep learning architecture was selected based on the extracted results. Finally, a real-time recognition system was developed. The results of the experiment show that the AlexNet architecture is the best due to its high accuracy rate. The model was developed based on the AlexNet architecture and successfully tested in real time with a 94.81% accuracy rate.
Robot navigation in indoor environments has become an essential task for several applications, including situations in which a mobile robot needs to travel independently and safely to a certain location using the shortest path possible. However, indoor robot navigation faces challenges, such as obstacles and a dynamic environment. This paper addresses the problem of social robot navigation in dynamic indoor environments by developing an efficient SLAM-based localization and navigation system for service robots using the Pepper robot platform. In addition, this paper discusses how to develop this system so that the robot can navigate freely in complex indoor environments and interact efficiently with humans. The developed Pepper-based navigation system has been validated using the Robot Operating System (ROS), an efficient robot platform architecture, in two different indoor environments. The obtained results show an efficient navigation system with an average localization error of 0.51 m and a user acceptability level of 86.1%.
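An average localization error is commonly computed as the mean Euclidean distance between estimated and ground-truth positions. The abstract does not state how its figure was obtained, so the definition below is an assumption, the function name is hypothetical, and the coordinates are made-up illustrative values rather than data from the paper.

```python
import math


def mean_localization_error(estimated, ground_truth):
    # Mean Euclidean distance (in metres) between paired positions.
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("need two non-empty position lists of equal length")
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    return sum(errors) / len(errors)


# Made-up (x, y) positions in metres, for illustration only.
estimated = [(0.0, 0.0), (3.0, 4.0)]
ground_truth = [(0.0, 0.0), (0.0, 0.0)]
error = mean_localization_error(estimated, ground_truth)
print(error)
```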