We present first empirical results from our ongoing investigation of distribution shifts in image data used for various computer vision tasks. Instead of analyzing the original training and test data, we propose to study shifts in the learned weights of trained models. In this work, we focus on the properties of the distributions of the dominantly used 3 × 3 convolution filter kernels. We collected and publicly provide a data set with over half a billion filters from hundreds of trained CNNs, covering a wide range of data sets, architectures, and vision tasks. Our analysis shows interesting distribution shifts (or the lack thereof) between trained filters along different axes of meta-parameters, like data type, task, architecture, or layer depth. We argue that the observed properties are a valuable source for further investigation towards a better understanding of the impact of shifts in the input data on the generalization abilities of CNN models, and towards novel methods for more robust transfer learning in this domain. Data available at: https://github.com/paulgavrikov/CNN-Filter-DB/.
Introduction
Despite their overwhelming success in the application to various vision tasks, the practical deployment of convolutional neural networks (CNNs) still suffers from several inherent drawbacks. Two prominent examples are I) the dependence on very large amounts of annotated training data [1], which is not available for all target domains and is expensive to generate; and II) the still widely unsolved problems with the robustness and generalization abilities of CNNs [2] towards shifts of the input data distributions. One can argue that both problems are strongly related, since a common practical solution to I) is the fine-tuning [3] of pre-trained models with small data sets from the actual target domain. This results in the challenge of finding suitable pre-trained models based on data distributions that are "as close as possible" to the target distributions. Hence, both cases (I+II) imply the need to model and observe distribution shifts in the context of CNNs. In this paper, we propose not to investigate these shifts in the input (image) domain, but rather in the weight distributions of the CNNs themselves. We argue that, e.g., the distributions of trained convolutional filters in a CNN, which implicitly reflect the sub-distributions of the input image data that are actually utilized by a specific model, are more suitable and more easily accessible representations for this task.
Methods
Data. We collected a total of 391 publicly available CNN models pre-trained for various visual tasks, recorded meta-data for each model, and manually categorized the training data into visually distinctive groups (data type), such as natural scenes, medical ct, seismic, or astronomy. All models were trained with full 32-bit precision but may have been trained with inputs at various scales. The dominant subset is formed by image classification models trained on ImageNet1k [4] (264 models). We extracted all trained convolution filters to get a heterogeneous a...
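The filter-extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the dictionary layout (layer name to weight array of shape `(out_channels, in_channels, kH, kW)`) and the layer names are hypothetical stand-ins for weights loaded from a real framework checkpoint.

```python
import numpy as np

def extract_3x3_filters(state_dict):
    """Collect all 3x3 convolution kernels from a dict of weight arrays.

    Convolution weights are assumed to have shape
    (out_channels, in_channels, kH, kW); each 3x3 kernel is returned
    flattened to a row of length 9.
    """
    kernels = []
    for name, w in state_dict.items():
        if w.ndim == 4 and w.shape[-2:] == (3, 3):
            kernels.append(w.reshape(-1, 9))
    if not kernels:
        return np.empty((0, 9))
    return np.concatenate(kernels, axis=0)

# Toy example with hypothetical layer names and random weights.
rng = np.random.default_rng(0)
toy_model = {
    "conv1.weight": rng.normal(size=(16, 3, 3, 3)),   # 16*3 = 48 kernels
    "conv2.weight": rng.normal(size=(32, 16, 3, 3)),  # 32*16 = 512 kernels
    "fc.weight": rng.normal(size=(10, 128)),          # ignored: not a conv weight
}
filters = extract_3x3_filters(toy_model)
print(filters.shape)  # (560, 9)
```

Flattening each kernel to a 9-dimensional vector makes the collection directly amenable to distributional analyses across models, layers, and meta-parameters.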