Fei Yu scite author profile

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate inference and reduce the memory consumption of deep neural networks, which is crucial for model deployment on resource-limited devices like mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often suffer from an unstable training process and severe performance degradation. To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between full-precision and low-bit networks. DSQ can automatically evolve during training to gradually approximate standard quantization. Owing to its differentiable property, DSQ can help pursue accurate gradients in backward propagation, and reduce the quantization loss in the forward process with an appropriate clipping range. Extensive experiments over several popular network structures show that training low-bit neural networks with DSQ can consistently outperform state-of-the-art quantization methods. Besides, our first efficient implementation for deploying 2- to 4-bit DSQ on devices with ARM architecture achieves up to 1.7× speedup compared with the open-source 8-bit high-performance inference framework NCNN [31].
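For orientation, the "standard quantization" that the abstract says DSQ gradually approximates can be sketched as plain uniform quantization over a clipping range. This is a minimal illustration, not the DSQ method itself: the function name `uniform_quantize` and its parameters are hypothetical, and the soft, differentiable approximation described in the paper is not reproduced here.

```python
def uniform_quantize(x, lower, upper, bits):
    """Clip x to [lower, upper], then snap it to the nearest of 2**bits evenly spaced levels.

    Illustrative sketch of hard uniform quantization; names are assumptions,
    not taken from the paper.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")

    intervals = 2 ** bits - 1                # 2**bits levels span this many intervals
    step = (upper - lower) / intervals       # distance between adjacent levels
    clipped = min(max(x, lower), upper)      # values outside the range saturate
    index = round((clipped - lower) / step)  # index of the nearest level
    return lower + index * step
```

The rounding step is piecewise constant, so its gradient is zero almost everywhere; that is the discreteness the abstract refers to as the source of unstable training.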

POI: Multiple Object Tracking with High Performance Detection and Appearance Feature

et al. 2016

Abstract. Detection and learning-based appearance features play the central role in data-association-based multiple object tracking (MOT), but most recent MOT works ignore them and focus only on hand-crafted features and association algorithms. In this paper, we explore high-performance detection and deep-learning-based appearance features, and show that they lead to significantly better MOT results in both online and offline settings. We make our detection and appearance feature publicly available.¹ In the following part, we first summarize the detection and appearance feature, and then introduce our tracker named Person of Interest (POI), which has both online and offline versions.²

Detection. In data-association-based MOT, the tracking performance is heavily affected by the detection results. We implement our detector based on Faster R-CNN [14]. In our implementation, the CNN model is fine-tuned from the VGG-16 on ImageNet. Given the definition of MOTA in MOT16 [12], the sum of false negatives (FN) and false positives (FP) has a large impact on the value of MOTA. In Table 1, we show that our detection optimization strategies lead to a significant decrease in the sum of FP and FN.³

¹ https://drive.google.com/open?id=0B5ACiy41McAHMjczS2p0dFg3emM
² We use POI to denote our online tracker and KDNT to denote our offline tracker in submission.
³ We use a detection score threshold of 0.3 for Faster R-CNN and -1 for DPMv5, label the ID of each detection box with an incremental integer, and evaluate FP and FN with the MOT16 devkit.
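The POI abstract's remark that FN and FP weigh heavily on MOTA follows from the usual CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, where IDSW is the number of identity switches and GT the number of ground-truth boxes. A minimal sketch, with a hypothetical function name and made-up counts used only for illustration:

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy under the standard CLEAR MOT definition.

    All arguments are counts summed over every frame of the sequence.
    The function name and signature are illustrative assumptions.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Illustrative counts (not from the paper): 10 FN, 5 FP, 1 ID switch, 100 GT boxes.
example_score = mota(10, 5, 1, 100)
```

Because identity switches are typically far fewer than missed or spurious detections, lowering FN + FP at the detector moves the score most, which matches the abstract's emphasis on detection quality.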

Forward and Backward Information Retention for Accurate Binary Neural Networks

et al. 2020

Model binarization is an effective method of compressing neural networks and accelerating their inference process, which enables state-of-the-art models to run on resource-limited devices. Recently, advanced binarization methods have been greatly improved by minimizing the quantization error directly in the forward process. However, a significant performance gap still exists between the 1-bit model and the 32-bit one. The empirical study shows that binarization causes a great loss of information in the forward and backward propagation, which harms the performance of binary neural networks (BNNs), and the limited information representation ability of binarized parameters is one of the bottlenecks of BNN performance. We present a novel Distribution-sensitive Information Retention Network (DIR-Net) to retain the information of the forward activations and backward gradients, which improves BNNs

Incorporating Convolution Designs into Visual Transformers

Yuan et al. 2021

Towards Unified INT8 Training for Convolutional Neural Network

Zhu, Gong et al. 2020

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Li, Liang, Zhao et al. 2021

Preprint

Recently, large-scale Contrastive Language-Image Pre-training (CLIP) (Radford et al., 2021) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks. However, CLIP is quite data-hungry and requires 400M image-text pairs for pre-training, thereby restricting its adoption. This work proposes a novel training paradigm, Data efficient CLIP (DeCLIP), to alleviate this limitation. We demonstrate that by carefully utilizing the widespread supervision among the image-text pairs, our DeCLIP can learn generic visual features more efficiently. Instead of using the single image-text contrastive supervision, we fully exploit data potential through the use of (1) self-supervision within each modality; (2) multi-view supervision across modalities; (3) nearest-neighbor supervision from other similar pairs. Benefiting from this intrinsic supervision, our DeCLIP-ResNet50 can achieve 60.4% zero-shot top-1 accuracy on ImageNet, which is 0.8% above the CLIP-ResNet50 while using 7.1× less data. Our DeCLIP-ResNet50 outperforms its counterpart on 8 out of 11 visual datasets when transferred to downstream tasks. Moreover, scaling up the model and computing also works well in our framework. Our code, dataset and models are released at: https://github.com/Sense-GVT/

* The first three authors contribute equally. The order is determined by dice rolling.

Incorporating Convolution Designs into Visual Transformers

Yuan, Guo, Liu et al. 2021

Preprint

Motivated by the success of Transformers in natural language processing (NLP) tasks, there have emerged some attempts (e.g., ViT and DeiT) to apply Transformers to the vision domain. However, pure Transformer architectures often require a large amount of training data or extra supervision to obtain comparable performance with convolutional neural networks (CNNs). To overcome these limitations, we analyze the potential drawbacks when directly borrowing Transformer architectures from NLP. Then we propose a new Convolution-enhanced image Transformer (CeiT) which combines the advantages of CNNs in extracting low-level features and strengthening locality with the advantages of Transformers in establishing long-range dependencies. Three modifications are made to the original Transformer: 1) instead of the straightforward tokenization from raw input images, we design an Image-to-Tokens (I2T) module that extracts patches from generated low-level features; 2) the feed-forward network in each encoder block is replaced with a Locally-enhanced Feed-Forward (LeFF) layer that promotes the correlation among neighboring tokens in the spatial dimension; 3) a Layer-wise Class token Attention (LCA) is attached at the top of the Transformer that utilizes the multi-level representations. Experimental results on ImageNet and seven downstream tasks show the effectiveness and generalization ability of CeiT compared with previous Transformers and state-of-the-art CNNs, without requiring a large amount of training data and extra CNN teachers. Besides, CeiT models also demonstrate better convergence with 3× fewer training iterations, which can reduce the training cost significantly.

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

Li, Gong, Tan et al. 2021

Preprint

We study the challenging task of neural network quantization without end-to-end retraining, called Post-training Quantization (PTQ). PTQ usually requires a small subset of training data but produces less powerful quantized models than Quantization-Aware Training (QAT). In this work, we propose a novel PTQ framework, dubbed BRECQ, which pushes the limits of bitwidth in PTQ down to INT2 for the first time. BRECQ leverages the basic building blocks in neural networks and reconstructs them one by one. In a comprehensive theoretical study of the second-order error, we show that BRECQ achieves a good balance between cross-layer dependency and generalization error. To further employ the power of quantization, the mixed precision technique is incorporated in our framework by approximating the inter-layer and intra-layer sensitivity. Extensive experiments on various handcrafted and searched neural architectures are conducted for both image classification and object detection tasks. For the first time we prove that, without bells and whistles, PTQ can attain 4-bit ResNet and MobileNetV2 comparable with QAT and enjoy 240× faster production of quantized models. Code is available at https://github.com/yhhhli/BRECQ.

Fei Yu

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
