It has been shown that Chinese poems can be successfully generated by sequence-to-sequence neural models, particularly with the attention mechanism. A potential problem of this approach, however, is that neural models can only learn abstract rules, while poem generation is a highly creative process that involves not only rules but also innovations for which pure statistical models are not appropriate in principle. This work proposes a memory-augmented neural model for Chinese poem generation, where the neural model and the augmented memory work together to balance the requirements of linguistic accordance and aesthetic innovation, leading to innovative generations that are still rule-compliant. In addition, it is found that the memory mechanism provides interesting flexibility that can be used to generate poems with different styles.
Neural machine translation (NMT) has achieved notable success in recent years; however, it is widely recognized that this approach has limitations in handling infrequent words and word pairs. This paper presents a novel memory-augmented NMT (M-NMT) architecture, which stores knowledge about how words (usually infrequently encountered ones) should be translated in a memory and then uses this knowledge to assist the neural model. We use this memory mechanism to combine the knowledge learned from a conventional statistical machine translation system with the rules learned by an NMT system, and also propose a solution for out-of-vocabulary (OOV) words based on this framework. Our experiments on two Chinese-English translation tasks demonstrated that the M-NMT architecture outperformed the NMT baseline by 9.0 and 2.7 BLEU points on the two tasks, respectively. Additionally, we found that this architecture resulted in a much more effective OOV treatment compared to competitive methods.
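As an illustration of how a translation memory might assist a neural decoder, the sketch below attends over a small memory of (key vector, target word) entries and interpolates the resulting distribution with the NMT posterior. This is a minimal sketch under stated assumptions, not the architecture from the paper: the dot-product attention, the fixed interpolation weight `beta`, and all function names are hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_distribution(query, memory, vocab_size):
    """Turn attention over memory entries into a distribution over target words.

    memory is a list of (key_vector, target_word_id) pairs (hypothetical layout).
    Entries that point to the same target word pool their attention weight.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in memory]
    weights = softmax(scores)
    dist = [0.0] * vocab_size
    for weight, (_, target_id) in zip(weights, memory):
        dist[target_id] += weight
    return dist


def combine(p_nmt, p_mem, beta):
    """Linear interpolation; beta is an assumed, hand-set mixing weight in [0, 1]."""
    return [(1.0 - beta) * a + beta * b for a, b in zip(p_nmt, p_mem)]


# Toy example: vocabulary of 3 target words, memory with 2 entries.
p_nmt = [0.7, 0.2, 0.1]
memory = [([1.0, 0.0], 2), ([0.0, 1.0], 1)]
query = [2.0, 0.0]  # stands in for the decoder state
p_mem = memory_distribution(query, memory, vocab_size=3)
p_final = combine(p_nmt, p_mem, beta=0.5)
```

In this toy setting the memory shifts probability mass towards word 2, which the neural model alone ranks last; this is the kind of correction one would hope for with infrequent words.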
Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.
Biometrics has been applied successfully to security applications for some time. However, extending the use of biometric information to other potential applications is a very recent development. This paper summarizes the field of biometrics and investigates the potential of utilizing biometrics beyond the presently limited field of security applications. Some synergies can be established within security-related applications, and these can also be relevant in other fields such as health and ambient intelligence. This paper describes these synergies. Overall, it highlights some interesting and exciting research areas, as well as possible synergies between different applications using biometric information.
This paper proposes a solution for signal read-out in MEMS cochlea sensors that have very small sensing capacitance and do not have differential sensing structures. The key challenge in such sensors is the significant signal degradation caused by the parasitic capacitance at the MEMS-CMOS interface. Therefore, a novel capacitive read-out circuit with a parasitic-cancellation mechanism is developed; the equivalent input capacitance of the circuit is negative and can be adjusted to cancel the parasitic capacitance. Chip results show that the use of parasitic cancellation increases the sensor sensitivity by 35 dB without consuming any extra power. In general, the circuit follows a low-degradation, low-amplification approach, which is more power-efficient than the traditional high-degradation, high-amplification approach; it employs parasitic cancellation to reduce the signal degradation, and therefore a lower gain is required in the amplification stage. In addition, the chopper-stabilization technique is employed to effectively reduce the low-frequency circuit noise and DC offsets. As a result of these design considerations, the prototype chip demonstrates the capability of converting a 7.5 fF capacitance change of a 1-Volt-biased 0.5 pF capacitive sensor pair into a 0.745 V signal-conditioned output at the cost of only 165.2 μW power consumption.
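To see why cancelling the parasitic capacitance helps, consider a simplified first-order charge-sharing model in which the sensed signal is attenuated by the ratio of the sensing capacitance to the total capacitance at the interface node. The sketch below is an assumption made for illustration, not the circuit analysis from the paper; the function names and the 10 pF parasitic value are hypothetical, and only the 0.5 pF sensing capacitance comes from the abstract.

```python
import math


def divider_ratio(c_sense, c_parasitic, c_input=0.0):
    """Fraction of the signal that survives charge sharing at the interface node.

    Assumed model: ratio = C_s / (C_s + C_p + C_in), where a negative c_input
    represents the read-out circuit's negative equivalent input capacitance.
    """
    total = c_sense + c_parasitic + c_input
    if total <= 0.0:
        raise ValueError("total node capacitance must stay positive")
    return c_sense / total


def sensitivity_gain_db(c_sense, c_parasitic, c_input):
    """Sensitivity improvement in dB relative to no cancellation (c_input = 0)."""
    before = divider_ratio(c_sense, c_parasitic, 0.0)
    after = divider_ratio(c_sense, c_parasitic, c_input)
    return 20.0 * math.log10(after / before)


C_SENSE = 0.5e-12       # 0.5 pF sensing capacitance, from the abstract
C_PARASITIC = 10e-12    # hypothetical parasitic capacitance
for c_in in (0.0, -5e-12, -9e-12, -9.9e-12):
    gain = sensitivity_gain_db(C_SENSE, C_PARASITIC, c_in)
    print(f"C_in = {c_in * 1e12:6.1f} pF -> gain = {gain:5.1f} dB")
```

In this model the gain grows steeply as the negative input capacitance approaches the parasitic value, which is consistent with the abstract's point that less amplification is needed once the degradation is reduced.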
The concept of using visual information as part of audio speech processing has been of significant recent interest. This paper presents a data-driven approach that estimates audio speech acoustics using only temporal visual information, without considering linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are used to identify optimal results, showing that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
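The mapping from a sequence of prior visual frames to a single audio frame can be set up as a supervised regression problem. The sketch below shows one way the training pairs might be assembled before being passed to an MLP; it assumes the visual and audio feature streams are time-aligned at the same frame rate, and the function name and window handling are hypothetical rather than taken from the paper.

```python
def make_training_pairs(visual_frames, audio_frames, window):
    """Pair a window of visual frames ending at time t with the audio frame at t.

    visual_frames: list of per-frame feature vectors (e.g. 2D-DCT coefficients)
    audio_frames:  list of per-frame feature vectors (e.g. log filterbank values)
    window:        number of visual frames, including the current one, per input
    """
    if len(visual_frames) != len(audio_frames):
        raise ValueError("streams are assumed to be frame-aligned")
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for t in range(window - 1, len(audio_frames)):
        context = visual_frames[t - window + 1 : t + 1]
        # Flatten the window into one input vector for the MLP.
        flat = [value for frame in context for value in frame]
        pairs.append((flat, audio_frames[t]))
    return pairs


# Toy data: 4 frames, 2 visual features and 1 audio feature per frame.
visual = [[1, 2], [3, 4], [5, 6], [7, 8]]
audio = [[10], [20], [30], [40]]
pairs = make_training_pairs(visual, audio, window=3)
```

With a window of 3, the first two audio frames have no complete visual context and are skipped, so the four toy frames yield two training pairs.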