Automatic speech recognition (ASR) is a technology that allows computers and mobile devices to recognize spoken language and transcribe it into text. ASR systems often achieve poor accuracy on noisy speech signals. This research therefore proposes an ensemble technique that does not rely on a single filter for perfect noise reduction but instead combines information from multiple noise reduction filters to improve the final ASR accuracy. The core of this technique is the generation of K copies of the speech signal using three noise reduction filters. The speech features of these copies differ slightly, so the ASR system extracts slightly different texts from them, and the best of these texts can then be selected as the final ASR output. The ensemble technique was compared with three related current noise reduction techniques in terms of character error rate (CER) and word error rate (WER). The test results were encouraging, showing relative decreases of 16.61% in CER and 11.54% in WER compared with the best current technique. This contribution will benefit the ASR field by increasing the recognition accuracy of human speech in the presence of background noise.
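The abstract does not specify how the best transcript is elected from the K candidates, so the sketch below assumes a simple consensus rule: pick the transcript with the smallest average word-level edit distance to all the others. Both the rule and the function names are illustrative assumptions, not the paper's method.

```python
def edit_distance(a, b):
    # Word-level Levenshtein distance between two transcripts.
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution
        prev = cur
    return prev[-1]

def select_transcript(candidates):
    # Consensus rule (an assumption): elect the candidate that is
    # closest, on average, to all the other ASR hypotheses.
    def avg_dist(c):
        return sum(edit_distance(c, o) for o in candidates if o is not c)
    return min(candidates, key=avg_dist)

# Example: three hypotheses from three differently filtered copies.
hypotheses = ["turn on the light", "turn on the lights", "turn of the light"]
best = select_transcript(hypotheses)  # -> "turn on the light"
```

Averaging distances rewards the hypothesis that the filtered copies agree on most, which is one common way to exploit diversity in an ensemble without a reference transcript.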
Different technologies are employed to detect and count people in various situations, but a crowd counting system based on computer vision is one of the best choices thanks to several advantages: accuracy, flexibility, cost, and the ability to acquire information about how people are distributed. Such a system can use closed-circuit television (CCTV) cameras, which have already become ubiquitous and whose use is still increasing. This paper aims to develop a crowd counting system that can be incorporated with existing CCTV cameras. Low-level features extracted in a frame-to-frame analysis are processed with a regression technique to estimate the number of people. Two complex scenes and environments are used to evaluate the performance of the proposed system. The results show that the proposed system achieves good performance in terms of mean absolute error (MAE) and mean squared error (MSE).
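As a minimal sketch of the regression step, a single low-level feature per frame (say, foreground pixel area) can be mapped to a head count with ordinary least squares, and the fit evaluated with the same MAE and MSE metrics the paper reports. The feature choice and the linear model are illustrative assumptions; the paper's actual features and regressor are not specified here.

```python
def fit_linear(xs, ys):
    # Ordinary least squares for y ~ a*x + b, where x is a low-level
    # frame feature (assumed: foreground pixel area) and y the true count.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def mae(pred, true):
    # Mean absolute error over a sequence of frames.
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def mse(pred, true):
    # Mean squared error over a sequence of frames.
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

# Toy calibration: feature values and ground-truth counts per frame.
a, b = fit_linear([100, 200, 300], [10, 20, 30])
estimate = a * 250 + b  # predicted count for a new frame
```

A linear fit on one feature is deliberately simple; real systems typically regress from a vector of segment, edge, and texture features, but the calibration-then-evaluation loop is the same shape.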
The Google N-gram dataset contains millions of books in 22 categories and is therefore widely used as a language resource in natural language processing. The Arabic language has several resources, but most of them are small. As with any research, larger training resources give the results greater validity. Building a new language resource manually takes years, whereas automatic methods can finish in a few days. The Google N-gram dataset does not yet support Arabic, which poses a challenge for researchers who want to extract Arabic text from it. In this paper, an extraction method that can produce a lexicon and an N-gram corpus from the English Google N-gram dataset is presented. The steps taken to build these resources and to overcome the challenges of the Google N-gram dataset are explained. The experimental results show that the proposed method succeeds in constructing the resources automatically in a short time. This method will be useful to researchers in several research fields.
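The core of building such resources, counting n-grams and collecting a lexicon from a token stream, can be sketched in a few lines. The tokenization, filtering, and Arabic-specific handling the paper describes are omitted, so this is only an illustrative assumption about the counting step, not the paper's pipeline.

```python
from collections import Counter

def build_ngrams(tokens, n):
    # Frequency table of all n-grams in a token stream; for the
    # Google-style corpus, these counts would be aggregated per book.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def build_lexicon(tokens):
    # The lexicon is simply the sorted set of unique word types.
    return sorted(set(tokens))

# Toy token stream standing in for text extracted from the dataset.
tokens = "the cat sat the cat".split()
bigrams = build_ngrams(tokens, 2)   # e.g. ("the", "cat") occurs twice
lexicon = build_lexicon(tokens)     # ["cat", "sat", "the"]
```

In practice the counts from many shards would be merged (another `Counter` addition), which is why an automatic method can assemble a large corpus in days rather than years.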