2021
DOI: 10.1109/tpami.2021.3077397
Mutual Information Regularized Feature-level Frankenstein for Discriminative Recognition

Cited by 21 publications (9 citation statements)
References 64 publications
“…Our classifier C and feature extractor f play an asymmetric adversarial game that encourages f to eliminate the modality information. 27 Rather than maximizing the cross-entropy loss, f minimizes the KL-divergence between its softmax prediction and a uniform distribution. Specifically, we minimize the following loss: We note that the modality and position labels of the two sampled patches are known, which can be used for supervised training.…”
Section: Our Proposed Network
confidence: 99%
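The KL-to-uniform objective quoted above can be sketched in a few lines of NumPy. Note that minimizing KL(p || U) over K classes is equivalent to maximizing the entropy of p, since KL(p || U) = log K − H(p); when the loss reaches zero, the classifier's prediction carries no modality information. This is a minimal illustration under our own naming, not the cited paper's implementation:

```python
import numpy as np

def kl_to_uniform(logits):
    """KL divergence between softmax(logits) and the uniform distribution.

    Minimizing this w.r.t. the feature extractor pushes the classifier's
    prediction toward uniform, i.e., it erases the modality signal.
    """
    # numerically stable softmax
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    k = p.shape[-1]
    # KL(p || U) = log K - H(p); small epsilon guards log(0)
    return np.log(k) + (p * np.log(p + 1e-12)).sum(axis=-1)
```

A uniform prediction yields a loss of (approximately) zero, while a confident, peaked prediction yields a large positive value, which is what the feature extractor is trained to reduce.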
“…A possible solution to avoid the large-scale labeling of the mobile-captured video is to use unsupervised domain adaptation to transfer the knowledge from our dataset to the unlabeled mobile dataset [67], [68], [69]. In addition, it is promising to apply a face-pose-invariant or robust feature extractor as in [35], [70], [71], [72].…”
Section: Clinical Prospects
confidence: 99%
“…L_KL is only applied to the same-utterance pairs. In parallel, s_i is encouraged to inherit the subject-specific factors with an implicit complementary constraint [15,11]. By enforcing the information bottleneck, i.e., a compact or low-dimensional latent feature [11], s_i has to incorporate all the necessary complementary content (e.g., the subject-specific style of the articulation) other than u_i to achieve accurate reconstruction.…”
Section: Pair-wise Disentanglement Training
confidence: 99%
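The pair-wise constraint quoted above, i.e., applying the KL loss only to same-utterance pairs, amounts to masking the per-pair divergence before averaging. A minimal sketch, with names, shapes, and the masking helper all being our illustrative assumptions rather than the cited paper's code:

```python
import numpy as np

def masked_kl(p, q, same_utt):
    """Mean KL(p_i || q_i) over pairs flagged as the same utterance.

    p, q      : (n_pairs, dim) rows of probability distributions
    same_utt  : (n_pairs,) 0/1 mask; KL is zeroed for cross-utterance pairs
    """
    # per-pair KL divergence; epsilon guards log(0)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    # average only over same-utterance pairs (avoid division by zero)
    return (kl * same_utt).sum() / max(same_utt.sum(), 1)
```

Restricting the loss this way lets the remaining latent factors absorb the subject-specific content needed for reconstruction, consistent with the information-bottleneck argument in the statement.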