The quality of text-to-speech (TTS) voices built from noisy speech is compromised. Enhancing the speech data before training has been shown to improve quality, but voices built with clean speech are still preferred. In this paper we investigate two different approaches to speech enhancement for training TTS systems. In both approaches we train a recurrent neural network (RNN) to map acoustic features extracted from noisy speech to features describing clean speech. The enhanced data is then used to train the TTS acoustic model. In one approach we use the features conventionally employed to train TTS acoustic models, i.e., Mel-cepstral (MCEP) coefficients, aperiodicity values, and fundamental frequency (F0). In the other approach, following conventional speech enhancement methods, we train an RNN using only the MCEP coefficients extracted from the magnitude spectrum. The enhanced MCEP features and the phase extracted from the noisy speech are combined to reconstruct the waveform, which is then used to extract acoustic features to train the TTS system. We show that the second approach results in larger MCEP distortion but smaller F0 errors. Subjective evaluation shows that synthetic voices trained on data enhanced with this method were rated higher, with scores similar to those of voices trained on clean speech.
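The core idea of the first enhancement approach, learning a mapping from noisy acoustic features to clean ones, can be illustrated with a minimal sketch. Here a per-frame linear regression fit by least squares stands in for the paper's RNN, and the feature matrices are synthetic stand-ins rather than real MCEP data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 1000 frames of 25-dim MCEP-like features.
clean = rng.normal(size=(1000, 25))
noisy = clean + 0.3 * rng.normal(size=(1000, 25))  # additive "noise"

# Per-frame linear mapping noisy -> clean, fit by least squares.
# (The paper trains an RNN for this mapping instead.)
W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)
enhanced = noisy @ W

# Enhancement should reduce the distortion to the clean features.
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((enhanced - clean) ** 2)
assert err_after < err_before
```

The enhanced features would then replace the noisy ones when training the TTS acoustic model; in the second approach they would instead be combined with the noisy-speech phase to resynthesize a waveform first.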
Deep learning has made substantial breakthroughs in many fields due to its powerful automatic representation capabilities. It has been shown that neural architecture design is crucial to the feature representation of data and to final performance. However, architecture design relies heavily on researchers' prior knowledge and experience, and because of the limits of that knowledge, it is difficult for people to step outside their original thinking paradigms and design an optimal model. An intuitive idea, therefore, is to reduce human intervention as much as possible and let an algorithm design the neural architecture automatically.
Neural Architecture Search (NAS) is just such an algorithm, and the related research is rich and complex, so a comprehensive and systematic survey of NAS is essential. Previous surveys have classified existing work mainly by the key components of NAS: search space, search strategy, and evaluation strategy. While this classification is intuitive, it makes it difficult for readers to grasp the challenges involved and the landmark work that addressed them. In this survey we therefore take a new perspective: we begin with an overview of the characteristics of the earliest NAS algorithms, summarize the problems in these early algorithms, and then present the solutions proposed in subsequent work. In addition, we conduct a detailed and comprehensive analysis, comparison, and summary of these works. Finally, we suggest possible directions for future research.
The development of multifunctional and efficient electromagnetic wave absorbing materials is a challenging research hotspot. Here, magnetized Ni flower/MXene hybrids are successfully assembled on the surface of melamine foam (MF) through electrostatic self-assembly and a dip-coating adsorption process, integrating microwave absorption, infrared stealth, and flame retardancy. Remarkably, the Ni/MXene-MF achieves a minimum reflection loss (RLmin) of −62.7 dB with a corresponding effective absorption bandwidth (EAB) of 6.24 GHz at 2 mm, and an EAB of 6.88 GHz at 1.8 mm. The strong electromagnetic wave absorption is attributed to the three-dimensional magnetic/conductive networks, which provide excellent impedance matching, dielectric loss, magnetic loss, interfacial polarization, and multiple attenuation. In addition, the Ni/MXene-MF offers low density, excellent heat insulation, infrared stealth, and flame-retardant functions. This work provides a new strategy for the design of multifunctional and efficient electromagnetic wave absorbing materials.
Due to structural problems, traditional neural network models are prone to gradient explosion and over-fitting, while deep GRU neural network models suffer from low update efficiency and poor information processing across multiple hidden layers. To address this, this paper proposes an optimized gated recurrent unit (OGRU) neural network. The proposed OGRU model improves information processing capability and learning efficiency by optimizing the unit structure and learning mechanism of the GRU, and prevents the update gate from being interfered with by the current forgetting information. Experiments use the TensorFlow framework to build prediction models for LSTM, GRU, and OGRU neural networks and compare their prediction accuracy. The results show that the OGRU model achieves the highest learning efficiency and the best prediction accuracy.
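For reference, the standard GRU cell that the OGRU modifies can be sketched in a few lines. This is the conventional formulation (update gate, reset gate, candidate state) with NumPy and randomly initialized weights, not the paper's optimized variant:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One step of a standard GRU cell (not the OGRU variant)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde             # new hidden state

# Example with input dim 4 and hidden dim 3, random weights.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))

h = np.zeros(d_h)
for _ in range(5):                       # run a short input sequence
    h = gru_step(rng.normal(size=d_in), h, params)
```

The OGRU's changes concern how the update gate interacts with the forgotten information; the standard cell above is only the baseline structure being optimized.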