In medical imaging, it is common practice to acquire multi-modality image datasets, since experts prefer to combine multiple medical devices when diagnosing a disease. Each modality highlights different aspects of the target, which in our case is magnetic resonance imaging (MRI) brain tumor segmentation. For such tasks, researchers tend to concatenate all modalities into a single network input for feature extraction, neglecting the complex relationships among the modalities. It is no longer novel to use an encoder-decoder model with residual connections to transfer information from high-resolution feature maps to lower-resolution ones in medical segmentation tasks. In this work, we propose a multimodal fusion network with a bidirectional feature pyramid network (MM-BiFPN) that uses an individual encoder for each of the four modalities (FLAIR, T1-weighted, T1-c, and T2-weighted), focusing on exploiting the complex relationships among the modalities. In addition, the bidirectional feature pyramid network (Bi-FPN) layer aggregates the multiple modalities to study cross-modality relationships and multi-scale features. Our experiments were conducted on the brain tumor segmentation challenge datasets MICCAI BraTS2018 and MICCAI BraTS2020. We also performed two ablation studies: one comparing different cross-scale modality fusion networks, and one on different modality settings to measure the contribution each modality makes to detecting tumor content. Even with missing modalities, our method achieves comparable results, demonstrating that it is robust for brain tumor segmentation.
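The overall data flow described above (one encoder per modality, then bidirectional cross-scale fusion) can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the authors' implementation: the "encoders" are plain average-pooling pyramids, the BiFPN fusion weights are fixed to ones instead of being learned, and all shapes are illustrative assumptions.

```python
import numpy as np

def encode(modality, num_scales=3):
    """Toy per-modality 'encoder': build a pyramid by 2x average pooling."""
    feats, x = [], modality
    for _ in range(num_scales):
        feats.append(x)
        h, w = x.shape
        x = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # 2x downsample
    return feats  # ordered high -> low resolution

def fuse_weighted(maps):
    """BiFPN-style fast normalized fusion; learnable weights replaced by ones."""
    w = np.ones(len(maps))
    w = w / (w.sum() + 1e-4)
    return sum(wi * m for wi, m in zip(w, maps))

def upsample(x):
    return np.kron(x, np.ones((2, 2)))  # nearest-neighbour 2x upsample

def downsample(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def bifpn_layer(pyramid):
    """One bidirectional pass over a 3-level pyramid (high -> low resolution)."""
    p0, p1, p2 = pyramid
    # top-down: propagate coarse context into finer levels
    td1 = fuse_weighted([p1, upsample(p2)])
    td0 = fuse_weighted([p0, upsample(td1)])
    # bottom-up: propagate fine detail back to coarser levels
    bu1 = fuse_weighted([td1, downsample(td0)])
    bu2 = fuse_weighted([p2, downsample(bu1)])
    return [td0, bu1, bu2]

# Four MRI modalities, each passed through its own encoder
modalities = {m: np.random.rand(32, 32) for m in ("FLAIR", "T1", "T1c", "T2")}
pyramids = [encode(x) for x in modalities.values()]
# cross-modality aggregation: fuse same-scale features from all four modalities
fused = [fuse_weighted(level) for level in zip(*pyramids)]
out = bifpn_layer(fused)
print([f.shape for f in out])  # [(32, 32), (16, 16), (8, 8)]
```

The separate-encoder step is what lets each modality keep its own feature statistics before fusion; dropping one modality simply removes its pyramid from the `zip`, which is consistent with the missing-modality experiments mentioned above.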
Deep learning has been widely adopted for end-to-end time-series classification (TSC). However, its effectiveness relies heavily on large-scale data, so deep models are prone to overfitting when only a few labeled samples are available. Few-shot learning (FSL) addresses this issue by learning to generalize to new tasks from few training samples (e.g., one or five samples per class). In FSL, learning good representations is considered crucial for classifying accurately with discriminative features. In this study, we propose a framework for few-shot TSC that encodes a time series as different types of images (i.e., recurrence plot, Markov transition field, and Gramian angular summation/difference field) and trains the model on these images using the FSL procedure. The distinct characteristics of each image type enable the model to learn rich information. In addition, we propose temporal-context attention (TCA) and meta-feature fusion (MFF) to maximize the representational ability of these images. TCA incorporates the global context of the feature map and highlights pixels that carry informative relevance to other pixels. After feature extraction, MFF refines each feature using different kernels generated from cross-modality features and fuses the refined features. Finally, test samples are assigned to the nearest class prototype in the embedding space. All experiments are conducted on various N-way K-shot problems. Our framework outperforms state-of-the-art models on 28 standard datasets from the UCR (University of California, Riverside) archive, a widely used benchmark in time-series classification, by margins ranging from 0.34% to 29.4%.
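The three image encodings named above, and the nearest-prototype classification step, are all standard constructions that can be sketched directly. The following NumPy sketch uses textbook formulations (sign conventions for the Gramian fields and the recurrence threshold `eps` vary across implementations) and an identity embedding; it is an illustration of the pipeline, not the proposed framework itself.

```python
import numpy as np

def rescale(x):
    """Scale a series into [-1, 1], as the Gramian angular fields require."""
    return 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1

def gramian_fields(x):
    """Gramian angular summation/difference fields from polar-encoded values."""
    phi = np.arccos(np.clip(rescale(x), -1, 1))
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])  # sign convention varies
    return gasf, gadf

def recurrence_plot(x, eps=0.1):
    """Binary recurrence plot: R[i, j] = 1 when |x_i - x_j| < eps."""
    return (np.abs(x[:, None] - x[None, :]) < eps).astype(float)

def markov_transition_field(x, n_bins=4):
    """MTF: spread the quantile-bin transition matrix over all time pairs."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)                    # bin index per time step
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):              # count bin transitions
        W[a, b] += 1
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1)
    return W[q[:, None], q[None, :]]             # M[i, j] = P(bin_i -> bin_j)

def prototype_classify(query, support, labels):
    """Assign the query to the nearest class prototype (here: raw space)."""
    protos = {c: support[labels == c].mean(axis=0) for c in np.unique(labels)}
    return min(protos, key=lambda c: np.linalg.norm(query - protos[c]))

x = np.sin(np.linspace(0, 4 * np.pi, 64))
gasf, gadf = gramian_fields(x)
rp = recurrence_plot(x)
mtf = markov_transition_field(x)
print(gasf.shape, rp.shape, mtf.shape)  # (64, 64) (64, 64) (64, 64)

# 2-way 2-shot toy episode: class 0 is near x, class 1 is near -x
labels = np.array([0, 0, 1, 1])
support = np.stack([x, x + 0.01, -x, -x + 0.01])
pred = prototype_classify(x, support, labels)
print(pred)  # 0
```

In the actual framework these image channels would be embedded by a trained encoder (with TCA and MFF applied) before the prototype distances are computed; the sketch skips the learned embedding to stay self-contained.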
Fine-Grained Visual Classification (FGVC) has consistently been challenging in various domains, such as aviation and animal breeds, mainly because the classes differ only within a considerably small range or by subtle pattern differences. In deep convolutional neural networks, the covariance between feature maps positively affects the selection of features, allowing discriminative regions to be learned automatically. In this study, we propose a fine-grained classification model that inserts an attention module exploiting these covariance characteristics. Specifically, we introduce a feature map attention module (FCA) that extracts the feature maps between the convolution blocks constituting an existing classification model. The FCA module then applies the corresponding values of the covariance matrix to each channel to focus on the salient areas. We demonstrate the need for fine-grained classification in a hierarchical manner by focusing on diverse-scale representations. Additionally, we conducted two ablation studies to show how each proposed strategy affects classification performance. Our experiments are conducted on three datasets primarily used for fine-grained classification: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. Our method outperforms state-of-the-art models by margins of 0.4%, 1.1%, and 1.4% on these datasets, respectively.
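The core idea of weighting channels by their covariance statistics can be illustrated compactly. The abstract does not fully specify how the FCA module maps the covariance matrix to channel weights, so the sketch below makes one plausible assumption: it uses the covariance diagonal (each channel's spatial variance), normalized by a softmax, as the attention weights. Shapes and the softmax choice are illustrative, not the paper's design.

```python
import numpy as np

def covariance_channel_attention(feat):
    """Reweight channels of a (C, H, W) feature map using the diagonal of
    its channel covariance matrix -- an assumed stand-in for the FCA module."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    flat = flat - flat.mean(axis=1, keepdims=True)   # center per channel
    cov = flat @ flat.T / (h * w - 1)                # C x C channel covariance
    score = np.diag(cov)                             # per-channel variance
    attn = np.exp(score) / np.exp(score).sum()       # softmax over channels
    return feat * attn[:, None, None]                # channel-wise reweighting

# Example: an 8-channel feature map from some intermediate convolution block
feat = np.random.rand(8, 14, 14)
out = covariance_channel_attention(feat)
print(out.shape)  # (8, 14, 14)
```

Inserting such a module between convolution blocks, as described above, leaves the spatial resolution unchanged, which is what allows it to be dropped into an existing backbone at multiple scales.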