Semantic image inpainting is a challenging task where large missing regions have to be filled based on the available visual data. Existing methods which extract information from only a single image generally produce unsatisfactory results due to the lack of high-level context. In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditioning on the available data. Given a trained generative model, we search for the closest encoding of the corrupted image in the latent image manifold using our context and prior losses. This encoding is then passed through the generative model to infer the missing content. In our method, inference is possible irrespective of how the missing content is structured, while the state-of-the-art learning-based method requires specific information about the holes in the training phase. Experiments on three datasets show that our method successfully predicts information in large missing regions and achieves pixel-level photorealism, significantly outperforming the state-of-the-art methods.
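The latent-space search described above can be sketched minimally. This is a hedged illustration, not the paper's implementation: the "generator" here is a toy linear map `G(z) = W @ z`, the prior (discriminator) loss is omitted for brevity, and all names (`W`, `M`, `y`) are invented. It shows the core idea of minimizing a masked context loss over the latent code by gradient descent.

```python
import numpy as np

# Toy setup: a linear "generator", a ground-truth image, and a binary mask
# marking which pixels of the corrupted image are observed (1 = known).
rng = np.random.default_rng(0)
d_latent, d_pixels = 8, 32
W = rng.normal(size=(d_pixels, d_latent))        # stand-in generator weights
z_true = rng.normal(size=d_latent)
y = W @ z_true                                   # uncorrupted image
M = (rng.random(d_pixels) > 0.5).astype(float)   # observation mask

def context_loss(z):
    # L_context(z) = || M * (G(z) - y) ||^2, measured on known pixels only
    r = M * (W @ z - y)
    return float(r @ r)

# Gradient descent on the latent code z (analytic gradient for the toy G)
z = np.zeros(d_latent)
lr = 0.005
losses = [context_loss(z)]
for _ in range(200):
    grad = 2 * W.T @ (M * (W @ z - y))
    z -= lr * grad
    losses.append(context_loss(z))

# The generator output at the recovered encoding fills the masked pixels
x_inpainted = W @ z
```

In the paper's setting the generator is a trained GAN and the minimization also includes a prior loss that keeps `z` on the learned image manifold; the sketch keeps only the context term to stay short.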
Stuttering is a developmental speech disorder that occurs in 5% of children with spontaneous remission in approximately 70% of cases. Previous imaging studies in adults with persistent stuttering found left white matter deficiencies and reversed right-left asymmetries compared to fluent controls. We hypothesized that similar differences might be present indicating brain development differences in children at risk of stuttering. Optimized voxel-based morphometry compared gray matter volume (GMV) and diffusion tensor imaging measured fractional anisotropy (FA) in white matter tracts in 3 groups: children with persistent stuttering, children recovered from stuttering, and fluent peers. Both the persistent stuttering and recovered groups had reduced GMV relative to fluent peers in speech-relevant regions: the left inferior frontal gyrus and bilateral temporal regions. Reduced FA was found in the left white matter tracts underlying the motor regions for face and larynx in the persistent stuttering group. Contrary to previous findings in adults who stutter, no increases were found in the right hemisphere speech regions in stuttering or recovered children, and no differences were found in right-left asymmetries. Instead, a risk for childhood stuttering was associated with deficiencies in left gray matter volume, while reduced white matter integrity in the left hemisphere speech system was associated with persistent stuttering. Anatomical increases in right hemisphere structures previously found in adults who stutter may have resulted from a lifetime of stuttering. These findings point to the importance of considering the role of neuroplasticity during development when studying persistent forms of developmental disorders in adults.
Monaural source separation is useful for many real-world applications though it is a challenging problem. In this paper, we study deep learning for monaural speech separation. We propose the joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint. Moreover, we explore a discriminative training criterion for the neural networks to further enhance the separation performance. We evaluate our approaches using the TIMIT speech corpus for a monaural speech separation task. Our proposed models achieve about 3.8–4.9 dB SIR gain compared to NMF models, while maintaining better SDRs and SARs.
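The masking layer and its reconstruction constraint can be illustrated with a small sketch. This is a hedged, generic soft time-frequency masking example, not the paper's exact layer: the spectrograms below are random stand-ins, and the shapes and names are invented. The key property shown is that the two masked outputs sum exactly to the mixture.

```python
import numpy as np

# rows = time frames, columns = frequency bins (illustrative sizes)
rng = np.random.default_rng(1)
T, F = 10, 64
mixture = rng.random((T, F))   # mixture magnitude spectrogram
y1 = rng.random((T, F))        # network's estimate of source 1
y2 = rng.random((T, F))        # network's estimate of source 2

# Soft mask from the two estimates; eps avoids division by zero
eps = 1e-8
mask = np.abs(y1) / (np.abs(y1) + np.abs(y2) + eps)

# Applying the mask (and its complement) to the mixture guarantees the
# reconstruction constraint: the source estimates add up to the mixture.
s1_hat = mask * mixture
s2_hat = (1.0 - mask) * mixture
```

Because `mask + (1 - mask) = 1` at every time-frequency bin, `s1_hat + s2_hat` reproduces the mixture exactly, which is the constraint the extra layer enforces during joint training.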
The perception of prosodic prominence in spontaneous speech is investigated through an online task of prosody transcription using untrained listeners. Prominence is indexed through a probabilistic prominence score assigned to each word based on the proportion of transcribers who perceived the word as prominent. Correlation and regression analyses between perceived prominence, acoustic measures and measures of a word's information status are conducted to test three hypotheses: (i) prominence perception is signal-driven, influenced by acoustic factors reflecting speakers' productions; (ii) perception is expectation-driven, influenced by the listener's prior experience of word frequency and repetition; (iii) any observed influence of word frequency on perceived prominence is mediated through the acoustic signal. Results show correlates of perceived prominence in acoustic measures, in word log-frequency and in the repetition index of a word, consistent with both signal-driven and expectation-driven hypotheses of prominence perception. But the acoustic correlates of perceived prominence differ somewhat from the correlates of word frequency, suggesting an independent effect of frequency on prominence perception. A speech processing account is offered as a model of signal-driven and expectation-driven effects on prominence perception, where prominence ratings are a function of the ease of lexical processing, as measured through the activation levels of lexical and sub-lexical units.
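The probabilistic prominence score described above is simple to compute: each word's score is the proportion of transcribers who marked it prominent. A minimal sketch, with an invented annotation matrix for illustration (the real data come from the online transcription task):

```python
import numpy as np

# rows = transcribers, columns = words; 1 = word marked prominent
annotations = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
])

# Per-word prominence score: fraction of transcribers marking the word
prominence = annotations.mean(axis=0)
```

With the toy matrix above, the four words receive scores 0.75, 0.25, 0.75, and 0.0; these graded scores are what the correlation and regression analyses relate to the acoustic and information-status measures.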