Existing objective evaluation metrics for voice conversion (VC) are not always correlated with human perception. Therefore, training VC models with such criteria may not effectively improve the naturalness and similarity of converted speech. In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech. We adopt convolutional and recurrent neural network models to build a mean opinion score (MOS) predictor, termed MOSNet. The proposed models are tested on the large-scale listening test results of the Voice Conversion Challenge (VCC) 2018. Experimental results show that the scores predicted by the proposed MOSNet are highly correlated with human MOS ratings at the system level and fairly correlated with human MOS ratings at the utterance level. We have also modified MOSNet to predict similarity scores, and preliminary results show that the predicted scores are also fairly correlated with human ratings. These results confirm that the proposed models could be used as a computational evaluator to measure the MOS of VC systems, reducing the need for expensive human rating.
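The distinction between utterance-level and system-level correlation can be illustrated with a minimal sketch. It assumes Pearson correlation as the measure and uses made-up toy ratings; the record layout `(system_id, human_mos, predicted_mos)` and the function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def utterance_level(records):
    """Correlate human and predicted scores over individual utterances."""
    humans = [human for _, human, _ in records]
    predicted = [pred for _, _, pred in records]
    return pearson(humans, predicted)


def system_level(records):
    """Average scores per system first, then correlate the per-system means."""
    by_system = defaultdict(list)
    for system_id, human, pred in records:
        by_system[system_id].append((human, pred))
    human_means = [sum(h for h, _ in pairs) / len(pairs) for pairs in by_system.values()]
    pred_means = [sum(p for _, p in pairs) / len(pairs) for pairs in by_system.values()]
    return pearson(human_means, pred_means)


# Toy data (illustrative only): per-utterance errors cancel within each system.
records = [
    ("A", 2.0, 3.0), ("A", 3.0, 2.0),
    ("B", 3.0, 4.0), ("B", 4.0, 3.0),
    ("C", 4.0, 5.0), ("C", 5.0, 4.0),
]
print(utterance_level(records), system_level(records))
```

In this toy case the per-utterance errors average out within each system, so the system-level correlation is higher than the utterance-level one, which mirrors the pattern the abstract reports.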
Cracking is a common pavement distress that can cause more severe problems if not repaired in time, so it is important to accurately extract pavement crack information through detection and segmentation. Automated pavement crack detection and segmentation using deep learning are more efficient and accurate than conventional methods, but can still be improved. While many existing studies have applied deep learning to pavement crack segmentation, which separates cracks from non-crack regions, few have addressed pavement crack detection itself, which distinguishes cracks from other objects in the images. This paper proposes a two-step pavement crack detection and segmentation method based on convolutional neural networks. In the first step, an automated pavement crack detection algorithm was developed using a modified You Only Look Once version 3 (YOLOv3). In the second step, the proposed crack segmentation method was based on a modified U-Net, whose encoder was replaced with a pre-trained ResNet-34 and whose upsampling path was augmented with spatial and channel squeeze and excitation (SCSE) modules. The proposed method combines pavement crack detection and segmentation, so that the cracks detected in the first step are segmented in the second step to improve accuracy. A dataset of pavement crack images captured under different circumstances was also built for the study. The F1 scores of the proposed crack detection and segmentation methods are 90.58% and 95.75%, respectively, which are higher than those of other state-of-the-art methods. Compared with existing one-step pavement crack detection or segmentation methods, the proposed two-step method showed advantages in accuracy.
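For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes a strict pixel-wise F1 between a predicted and a ground-truth binary mask. This is an assumption about the evaluation protocol made for illustration: crack segmentation studies often allow a small pixel tolerance, and detection is typically scored on bounding boxes, neither of which is modelled here. The function name and toy masks are hypothetical.

```python
def pixel_f1(pred_mask, true_mask):
    """Strict pixel-wise F1 between two binary masks given as nested lists."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred, true in zip(pred_row, true_row):
            if pred and true:
                tp += 1      # crack pixel correctly predicted
            elif pred and not true:
                fp += 1      # predicted crack where there is none
            elif true and not pred:
                fn += 1      # missed crack pixel
    if tp == 0:
        return 0.0           # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 2x3 masks (1 = crack pixel), illustrative only.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(pixel_f1(pred, truth))
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3, so the F1 score is also 2/3.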
An ideal network window electrode for photovoltaic applications should provide optimal surface coverage, a uniform current density into and/or from a substrate, and minimal overall resistance for a given shading ratio. Here we show that metallic networks with a quasi-fractal structure provide a near-perfect practical realization of such an ideal electrode. We find that a leaf venation network, which possesses key characteristics of the optimal structure, indeed outperforms other networks. We further show that elements of hierarchical topology, rather than details of the branching geometry, are of primary importance in optimizing the networks, and demonstrate this experimentally on five model artificial hierarchical networks of varied levels of complexity. In addition to these structural effects, networks containing nanowires are shown to acquire transparency exceeding the geometric constraint due to plasmonic refraction.
Automated pavement crack image segmentation is challenging because of inherent irregular patterns, lighting conditions, and noise in images. Conventional approaches require a substantial amount of feature engineering to differentiate crack regions from non-affected regions. In this paper, we propose a deep learning technique based on a convolutional neural network to perform segmentation tasks on pavement crack images. Our approach requires minimal feature engineering compared to other machine learning techniques. We propose a U-Net-based network architecture in which we replace the encoder with a pretrained ResNet-34 neural network. We use a "one-cycle" training schedule based on cyclical learning rates to speed up the convergence. Our method achieves an F1 score of 96% on the CFD dataset and 73% on the Crack500 dataset, outperforming other algorithms tested on these datasets. We perform ablation studies on various techniques that helped us get marginal performance boosts, i.e., the addition of spatial and channel squeeze and excitation (SCSE) modules, training with gradually increasing image sizes, and training various neural network layers with different learning rates.
Index Terms: Convolutional neural network, deep learning, fully convolutional network, pavement crack segmentation, U-Net.
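The idea behind a "one-cycle" schedule can be sketched as follows: the learning rate rises from a small value to a peak and then falls back over a single cycle spanning the whole training run. This is a simplified, piecewise-linear illustration under assumed values; the function name, the default rates, and the 30% warm-up fraction are hypothetical and are not the authors' settings. Practical implementations often use cosine annealing and cycle the momentum as well.

```python
def one_cycle_lr(step, total_steps, max_lr=1e-2, min_lr=1e-4, warmup_frac=0.3):
    """Learning rate at a given step of a simplified linear one-cycle schedule."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    # Step index at which the learning rate peaks (at least 1 to avoid dividing by zero).
    peak_step = max(1, int(total_steps * warmup_frac))
    if step <= peak_step:
        # Warm-up phase: linear increase from min_lr to max_lr.
        progress = step / peak_step
        return min_lr + progress * (max_lr - min_lr)
    # Annealing phase: linear decrease from max_lr back to min_lr at the last step.
    progress = (step - peak_step) / (total_steps - 1 - peak_step)
    return max_lr - progress * (max_lr - min_lr)


# Learning rate at every step of an 11-step toy run.
schedule = [one_cycle_lr(step, 11) for step in range(11)]
for step, lr in enumerate(schedule):
    print(step, round(lr, 5))
```

In this toy run the rate starts at `min_lr`, peaks at step 3, and returns to `min_lr` at the final step.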
Image captioning generates a semantic description of an image. It draws on image understanding and text mining, which have made great progress in recent years. However, bridging the “semantic gap” between low-level features and high-level semantics in remote sensing images remains a great challenge, in spite of improvements in image resolution. In this paper, we present a new model with an attribute attention mechanism for the description generation of remote sensing images, and we explore the impact of the attributes extracted from remote sensing images on the attention mechanism. The results of our experiments demonstrate the validity of our proposed model. Compared against several state-of-the-art techniques, the proposed method obtains six higher scores and one slightly lower score on the Sydney Dataset and the Remote Sensing Image Caption Dataset (RSICD), and obtains higher scores on all seven on the UCM Dataset for remote sensing image captioning, indicating that the proposed framework achieves robust performance for semantic description in high-resolution remote sensing images.
In this paper, 2D borophene is synthesized through liquid‐phase exfoliation. The morphology and structure of the as‐prepared borophene are systematically analyzed, and the Z‐scan technique is used to measure its nonlinear optical properties. It is found that the saturable absorber (SA) properties of borophene make it an excellent broadband optical switch, which can be used effectively for mode‐locking in near‐ and mid‐infrared laser systems. Ultrastable pulses with durations as short as 792 and 693 fs are successfully delivered at the central wavelengths of 1063 and 1560 nm, respectively. Furthermore, stable pulses at a wavelength of 1878 nm are demonstrated from a thulium mode‐locked fiber laser based on the same borophene SA. This research reveals the significant potential of borophene in lasers, helping to extend the frontiers of photonic technologies.
Deep learning has made major breakthroughs and progress in many fields, owing to its powerful automatic representation capabilities. It has been shown that the design of the network architecture is crucial to the feature representation of data and to the final performance. To obtain a good feature representation of data, researchers have designed various complex network architectures. However, the design of the network architecture relies heavily on the researchers' prior knowledge and experience. Due to the limitations of inherent human knowledge, it is difficult for people to step outside their original thinking paradigm and design an optimal model. Therefore, a natural idea is to reduce human intervention as much as possible and let the algorithm design the network architecture automatically, a further step toward strong intelligence. In recent years, a large number of related algorithms for Neural Architecture Search (NAS) have emerged. They have made various improvements to the NAS algorithm, and the related research is complex and rich. To lower the barrier for beginners conducting NAS-related research, a comprehensive and systematic survey of NAS is essential. Previous related surveys classified existing work mainly by the basic components of NAS: search space, search strategy, and evaluation strategy. This classification is intuitive, but it makes it difficult for readers to grasp the challenges and the landmark works along the way. Therefore, in this survey, we provide a new perspective: starting with an overview of the characteristics of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then presenting the solutions offered by subsequent research. In addition, we conduct a detailed and comprehensive analysis, comparison, and summary of these works. Finally, we give possible future research directions.
CCS Concepts: • Computing methodologies → Machine learning algorithms.
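The three basic components named above (search space, search strategy, evaluation strategy) can be made concrete with a toy sketch. It assumes random sampling as the search strategy and a made-up proxy score as the evaluation strategy. Every name, option, and scoring rule here is hypothetical; a real NAS system would train and validate each candidate network, and the survey covers far more sophisticated strategies.

```python
import random

# Search space: the set of architectural choices the algorithm may combine.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "width": [16, 32, 64],
    "activation": ["relu", "tanh"],
}


def sample_architecture(rng):
    """Search strategy (random search): pick one option per choice."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def evaluate(arch):
    """Evaluation strategy: a stand-in proxy score, NOT real training.

    A real system would train the candidate and return validation accuracy.
    """
    score = arch["num_layers"] * 0.5 + arch["width"] / 32
    if arch["activation"] == "relu":
        score += 0.25
    return score


def random_search(num_trials, seed=0):
    """Sample candidates, score each one, and keep the best seen so far."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = random_search(num_trials=50)
print(best_arch, best_score)
```

Swapping `sample_architecture` for a learned controller or an evolutionary step, or `evaluate` for a cheaper performance estimate, changes one component while leaving the overall loop intact.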
Graphene and its derivatives have drawn interest across many disciplines due to their remarkable properties. We investigated the influence of graphite oxide (GO), aluminum oxide (Al2O3), and cerium oxide (CeO2) nanoparticles at 0.1% and 0.01% dosing concentrations on the combustion characteristics of diesel fuel using single-droplet combustion experiments. When GO nanoparticles are dosed in diesel fuel, we observe a shortened ignition delay (ID) (by up to 46.5%), an increased burn-rate constant (by up to 29.4%), a reduced peak temperature (by up to 13.8%), and a shortened burnout time (by up to 19.2%). These remarkable features may substantially improve the combustion efficiency and reduce harmful emissions in diesel engine applications.