Focal Modulation Networks

Yang, Jianwei; Li, Chunyuan; Gao, Jianfeng

doi:10.48550/arxiv.2203.11926

Cited by 10 publications

(11 citation statements)

References 77 publications

“…For the visual backbone, we adopt pretrained Swin-T/L [34] by default. We also use Focal-T [48] in our ablation studies following [60]. For the language backbone, we adopt the pretrained base model in UniCL [49].…”

Section: Methods (mentioning)

confidence: 99%

A Simple Framework for Open-Vocabulary Segmentation and Detection

Zhang, Li, Zou et al. 2023

Preprint

“…Inspired by the success of vision transformers, researchers have challenged the traditional small kernel design of CNNs [22,52] and suggested the use of large convolution kernels for visual tasks [11,17,18,38,40,46,73]. For example, ConvNeXt [40] suggest directly adopting a 7×7 depth-wise convolution, while the Visual Attention Network (VAN) [18] uses a kernel size of 21 × 21 and introduces an attention mechanism.…”

Section: Large Kernel Design in CNNs (mentioning)

confidence: 99%

Long Range Pooling for 3D Large-Scale Scene Understanding

Li, Guo, Mu et al. 2023

Preprint


Inspired by the success of recent vision transformers and large kernel design in convolutional neural networks (CNNs), in this paper, we analyze and explore essential reasons for their success. We claim two factors that are critical for 3D large-scale scene understanding: a larger receptive field and operations with greater non-linearity. The former is responsible for providing long range contexts and the latter can enhance the capacity of the network. To achieve the above properties, we propose a simple yet effective long range pooling (LRP) module using dilation max pooling, which provides a network with a large adaptive receptive field. LRP has few parameters, and can be readily added to current CNNs. Also, based on LRP, we present an entire network architecture, LRPNet, for 3D understanding. Ablation studies are presented to support our claims, and show that the LRP module achieves better results than large kernel convolution yet with reduced computation, due to its nonlinearity. We also demonstrate the superiority of LRPNet on various benchmarks: LRPNet performs the best on ScanNet and surpasses other CNN-based methods on S3DIS and Matterport3D. Code will be made publicly available.


“…The inference pathway used to segment a new case follows the same feed forward path shown by the black arrows plus additional re-locating and segmentation steps downstream of the blue arrows. The proposed model is built via a cascade network, which is composed of three subnetworks, that is, a focal modulation,²¹ a hierarchical block²² and a topological²³ fully convolutional network (FCN). We name the proposed cascade network as a topological modulated network.…”

Section: Overview (mentioning)

confidence: 99%

Automatic segmentation of neurovascular bundle on MRI using deep learning based topological modulated network

et al. 2023


Purpose: Radiation damage to neurovascular bundles (NVBs) may be the cause of sexual dysfunction after radiotherapy for prostate cancer. However, it is challenging to delineate NVBs as organs-at-risk from planning CTs during radiotherapy. Recently, the integration of MR into radiotherapy has made NVB contour delineation possible. In this study, we aim to develop an MRI-based deep learning method for automatic NVB segmentation.

Methods: The proposed method, named topological modulated network, consists of three subnetworks, that is, a focal modulation, a hierarchical block and a topological fully convolutional network (FCN). The focal modulation is used to derive the location and bounds of the left and right NVBs, namely the candidate volumes-of-interest (VOIs). The hierarchical block aims to highlight the NVB boundary information on the derived feature map. The topological FCN then segments the NVBs inside the VOIs by considering the topological consistency of the vascular delineation. Based on the location information of the candidate VOIs, the segmentations of the NVBs can then be brought back to the input MRI's coordinate system.

Results: A five-fold cross-validation study was performed on 60 patient cases to evaluate the performance of the proposed method. The segmented results were compared with manual contours. The Dice similarity coefficient (DSC) and 95th percentile Hausdorff distance (HD95) are (left NVB) 0.81 ± 0.10, 1.49 ± 0.88 mm, and (right NVB) 0.80 ± 0.15, 1.54 ± 1.22 mm, respectively.

Conclusion: We proposed a novel deep learning-based segmentation method for NVBs on pelvic MR images. The good segmentation agreement of our method with the manually drawn ground truth contours supports the feasibility of the proposed method, which can potentially be used to spare NVBs during proton and photon radiotherapy and thereby improve the quality of life for prostate cancer patients.
