Abstract: Model quantization is a promising approach to compress deep neural networks and accelerate inference, making it possible to be deployed on mobile and edge devices. To retain the high performance of full-precision models, most existing quantization methods focus on fine-tuning quantized model by assuming training datasets are accessible. However, this assumption sometimes is not satisfied in real situations due to data privacy and security issues, thereby making these quantization methods not applicable. To ach…
“…Note that nwma means n-bit quantization for weights and m-bit quantization for activations. As baselines, we selected ZeroQ [15], ZAQ [14], and GDFQ [13], which are important previous works on generative data-free quantization. In addition, we implemented Mixup [50] and Cutmix [51], data augmentation schemes that mix input images, on top of GDFQ.…”
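To make the nwma notation concrete, the following is a minimal PyTorch sketch of a symmetric per-tensor uniform (fake) quantizer. The exact quantization scheme used by the compared methods may differ; the function below is purely illustrative.

```python
import torch

def uniform_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Symmetric per-tensor uniform (fake) quantization to num_bits bits (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1                   # e.g. 7 for 4-bit signed values
    scale = x.abs().max().clamp(min=1e-8) / qmax     # per-tensor scale factor
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

# "4w4a" then means weights are quantized with num_bits=4 and activations with num_bits=4.
w = torch.randn(64, 128)
a = torch.relu(torch.randn(32, 128))
w_q, a_q = uniform_quantize(w, 4), uniform_quantize(a, 4)
```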
“…where the first term $L_{CE}$ guides the generator to output clearly classifiable samples, and the second term $L_{BNS}$ aligns the batch-normalization statistics of the synthetic samples with those of the batch-normalization layers in the full-precision model. In another previous work, ZAQ [14],…”
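As a rough illustration of how such a generator objective can be assembled, here is a PyTorch-style sketch that combines a cross-entropy term on the full-precision model's predictions with a batch-normalization statistics alignment term collected via forward hooks. The hook mechanism and the bns_weight coefficient are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def generator_loss(fp_model: nn.Module, fake_images: torch.Tensor,
                   target_labels: torch.Tensor, bns_weight: float = 0.1) -> torch.Tensor:
    """L = L_CE + bns_weight * L_BNS for a batch of synthetic images (illustrative)."""
    bns_terms, hooks = [], []

    def make_hook(bn: nn.BatchNorm2d):
        def hook(module, inputs, output):
            x = inputs[0]
            batch_mean = x.mean(dim=(0, 2, 3))
            batch_var = x.var(dim=(0, 2, 3), unbiased=False)
            # L_BNS: pull the batch statistics of the synthetic samples toward the
            # running statistics stored in the full-precision model's BN layers.
            bns_terms.append(F.mse_loss(batch_mean, bn.running_mean) +
                             F.mse_loss(batch_var, bn.running_var))
        return hook

    for m in fp_model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(make_hook(m)))

    logits = fp_model(fake_images)          # forward pass fills bns_terms via the hooks
    for h in hooks:
        h.remove()

    loss_ce = F.cross_entropy(logits, target_labels)   # samples should be clearly classifiable
    loss_bns = torch.stack(bns_terms).sum()            # BN-statistics alignment
    return loss_ce + bns_weight * loss_bns
```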
“…In addition, DSG [28] further suggests relaxing the batch-normalization statistics alignment to generate more diverse samples. ZAQ [14] adopted adversarial training of the generator on the quantization problem and introduced intermediate feature matching between the full-precision and quantized models. However, none of these works aimed to synthesize boundary supporting samples of the full-precision model.…”
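For reference, intermediate feature matching of the kind described for ZAQ can be sketched as a simple distance between corresponding activations of the two models; the choice of L1 distance below is an assumption made for illustration, not ZAQ's exact formulation.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(fp_feats: list, q_feats: list) -> torch.Tensor:
    """Distance between intermediate activations of the full-precision and quantized models.
    The quantized model minimizes this term; in ZAQ-style adversarial training the generator
    instead tries to maximize the discrepancy to expose samples on which the models disagree."""
    return sum(F.l1_loss(q, p.detach()) for p, q in zip(fp_feats, q_feats))
```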
Section: Data-free Compression
“…Recent generative data-free quantization schemes [13,14] employ a GAN-like generator to create synthetic samples. In the absence of the original training samples, the generator G attempts to generate synthetic samples so that the quantized model Q can mimic the behavior of the full-precision model P.…”
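A minimal sketch of the distillation half of this setup is shown below, assuming a conditional generator and a KL-divergence mimicry loss; the generator's own adversarial update and any additional loss terms are omitted, and the function signature is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(generator: nn.Module, fp_model: nn.Module, q_model: nn.Module,
                 opt_q: torch.optim.Optimizer, batch_size: int = 64,
                 latent_dim: int = 100, num_classes: int = 10) -> float:
    """One update of the quantized model Q against the full-precision model P,
    using only synthetic samples from the generator G (no real data)."""
    z = torch.randn(batch_size, latent_dim)            # random noise input to G
    y = torch.randint(0, num_classes, (batch_size,))   # target labels for a conditional G
    fake_images = generator(z, y).detach()             # G is not updated in this step

    with torch.no_grad():
        teacher_logits = fp_model(fake_images)         # P acts as the fixed teacher
    student_logits = q_model(fake_images)              # Q is the quantized student

    # Q mimics P's behavior: KL divergence between the two output distributions.
    loss = F.kl_div(F.log_softmax(student_logits, dim=1),
                    F.softmax(teacher_logits, dim=1),
                    reduction="batchmean")
    opt_q.zero_grad()
    loss.backward()
    opt_q.step()
    return loss.item()
```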
“…Therefore, data-free quantization is a natural direction to achieve a highly accurate quantized model without accessing any training data. Among many excellent prior studies [9,10,11,12], generative methods [13,14,15] have recently been drawing much attention due to their superior performance. Generative methods successfully generate synthetic samples that resemble the distribution of the original dataset and achieve high accuracy using information from the pretrained full-precision network, such as batch-normalization statistics [15,13] or intermediate features [14].…”
Model quantization is known as a promising method to compress deep neural networks, especially for inferences on lightweight mobile or edge devices. However, model quantization usually requires access to the original training data to maintain the accuracy of the full-precision models, which is often infeasible in real-world scenarios for security and privacy issues. A popular approach to perform quantization without access to the original data is to use synthetically generated samples, based on batch-normalization statistics or adversarial learning. However, the drawback of such approaches is that they primarily rely on random noise input to the generator to attain diversity of the synthetic samples. We find that this is often insufficient to capture the distribution of the original data, especially around the decision boundaries. To this end, we propose Qimera, a method that uses superposed latent embeddings to generate synthetic boundary supporting samples. For the superposed embeddings to better reflect the original distribution, we also propose using an additional disentanglement mapping layer and extracting information from the full-precision model. The experimental results show that Qimera achieves state-of-the-art performances for various settings on data-free quantization. Code is available at https://github.com/iamkanghyunchoi/qimera.
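The superposition idea from the abstract can be sketched as feeding the generator a convex combination of two class embeddings instead of a single one. The module below is only an illustration with assumed names and dimensions; it omits the disentanglement mapping layer and the information extracted from the full-precision model (see the linked repository for the authors' implementation).

```python
import torch
import torch.nn as nn

class SuperposedEmbedding(nn.Module):
    """Illustrative sketch: a convex combination of two class embeddings is fed to the
    generator instead of a single class embedding, pushing the synthesized sample toward
    the decision boundary between the two classes."""
    def __init__(self, num_classes: int = 10, embed_dim: int = 100):
        super().__init__()
        self.embedding = nn.Embedding(num_classes, embed_dim)

    def forward(self, y_a: torch.Tensor, y_b: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
        e_a, e_b = self.embedding(y_a), self.embedding(y_b)
        # lam in [0, 1]; values near 0.5 superpose the two classes most strongly.
        lam = lam.unsqueeze(1)
        return lam * e_a + (1.0 - lam) * e_b

# Usage: the superposed embedding replaces (or augments) the usual noise input of the generator.
embed = SuperposedEmbedding()
y_a = torch.randint(0, 10, (32,))
y_b = torch.randint(0, 10, (32,))
lam = torch.rand(32)
z = embed(y_a, y_b, lam)   # shape (32, 100), consumed by the generator G
```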