2021
DOI: 10.1007/s10489-021-02687-7
MMNet: Multi-modal multi-stage network for RGB-T image semantic segmentation

Cited by 30 publications (9 citation statements)
References 35 publications
“…The multilanguage machine translation system based on semantic language is composed of two parts, which is a multilanguage unified semantic unit base of high quality, complete, extensible, no discard, no repetition, no false ambiguity, and no normal ambiguity [10].…”
Section: Machine Translation Methods Based On Semantic Units
confidence: 99%
“…1) Comparison on the MFNet Dataset. On the MFNet dataset, we compare our LASNet with 14 state-of-the-art methods, including two RGB semantic segmentation methods (i.e., DANet [12] and HRNet [63]) and their modified RGB-T versions, four RGB-D semantic segmentation methods (i.e., FuseNet [55], D-CNN [59], ACNet [57], and SA-Gate [58]), and eight RGB-T semantic segmentation methods (i.e., MFNet [18], two versions of RTFNet [25], PSTNet [19], MLFNet [26], FuseSeg [28], ABMDRNet [30], MMNet [64], and EGFNet [34]).…”
Section: B Comparison With State-of-the-arts
confidence: 99%
“…Additionally, the means of these networks that are designed for achieving better performance can also be utilized in single-modality tasks. Modality-specific networks that can aggregate complementary information from different modalities are valuable for multi-modal networks ( Zhu et al, 2016 ; Lan et al, 2022 ). MMFNet uses three specific encoders to separately extract modality-specific features from corresponding modality images.…”
Section: Related Studies
confidence: 99%
“…Although some of the abovementioned works utilized modality-specific features to extract additional representative features, the discarded low-level multi-modal fusion features are also crucial in aggregating the complementary information of modalities ( Lan et al, 2022 ). To solve this problem, a multi-modal fusion network is deployed in MSMFF encoder (or decoder) blocks to fuse the modality-specific features and multi-modal fusion features of the former layers.…”
Section: Related Studies
confidence: 99%
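The last two citation statements describe a common RGB-T pattern: per-modality encoders extract modality-specific features, and each stage fuses them together with the fused features of the former layer rather than discarding them. A minimal NumPy sketch of that idea, under stated assumptions: all names (`conv_like`, `fuse_stage`, `forward`), shapes, and the element-wise-sum fusion are illustrative stand-ins, not the actual MMNet, MMFNet, or MSMFF implementations.

```python
import numpy as np

def conv_like(x, out_ch, seed):
    """Stand-in for a conv layer: a fixed random projection over channels."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[0], out_ch)) * 0.1
    # x: (C, H, W) -> (out_ch, H, W)
    return np.einsum('chw,co->ohw', x, w)

def fuse_stage(f_rgb, f_t, f_prev):
    """Fuse modality-specific features with the former layer's fused features."""
    fused = f_rgb + f_t          # aggregate complementary modality information
    if f_prev is not None:
        fused = fused + f_prev   # re-inject earlier multi-modal fusion features
    return fused

def forward(rgb, thermal, stages=(8, 16)):
    """Two modality-specific branches with stage-wise multi-modal fusion."""
    f_prev = None
    f_rgb, f_t = rgb, thermal
    for i, ch in enumerate(stages):
        f_rgb = conv_like(f_rgb, ch, seed=i)        # RGB-specific branch
        f_t = conv_like(f_t, ch, seed=100 + i)      # thermal-specific branch
        if f_prev is not None:
            f_prev = conv_like(f_prev, ch, seed=200 + i)  # match channel count
        f_prev = fuse_stage(f_rgb, f_t, f_prev)
    return f_prev

rgb = np.zeros((3, 4, 4))       # toy RGB input: 3 channels, 4x4 spatial
thermal = np.zeros((1, 4, 4))   # toy thermal input: 1 channel, 4x4 spatial
out = forward(rgb, thermal)
print(out.shape)  # (16, 4, 4)
```

The point of carrying `f_prev` through every stage is exactly the one the second statement makes: low-level fusion features from former layers are not discarded but re-fused alongside each stage's modality-specific features.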