We survey knowledge distillation (KD) strategies for simple classification tasks and implement a set of techniques that claim state-of-the-art accuracy. Our experiments using standardized model architectures, fixed compute budgets, and consistent training schedules indicate that many of these distillation results are hard to reproduce. This is especially apparent with methods that use some form of feature distillation. Further examination reveals a lack of generalizability: these techniques may only succeed for specific architectures and training settings. We observe that appropriately tuned classical distillation, combined with a data-augmentation training scheme, gives an orthogonal improvement over other techniques. We validate this approach and open-source our code.
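For reference, the "classical distillation" baseline referred to above is commonly implemented as the Hinton-style softened-logit loss. The following is a minimal PyTorch sketch of that loss, not the authors' released code; the temperature `T` and mixing weight `alpha` shown are placeholder values (the abstract notes that this baseline needs appropriate tuning per setting):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classical (Hinton-style) distillation loss.

    T and alpha are placeholder values; they typically require tuning
    per architecture and training setting.
    """
    # Soft targets: KL divergence between temperature-softened
    # student and teacher distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    # Hard targets: standard cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```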
Recent advances in network function virtualization have prompted the research community to consider data-center-scale deployments. However, existing tools, such as E2 and SOL, limit VNF chain allocation to rack scale and provide limited support for managing allocated chains. We define a narrow API to let data-center tenants and operators allocate and manage arbitrary VNF chain topologies, and we introduce NetPack, a new stochastic placement algorithm, to implement this API at data-center scale. We prototyped the resulting system, dubbed Daisy, using the Sonata platform. In data-center-scale simulations on realistic scenarios and topologies that are orders of magnitude larger than prior work, Daisy achieves in all cases an allocation density within 96% of a recently introduced, theoretically complete, constraint-solver-based placement engine, while being 82× faster on average. In detailed emulation with real packet traces, Daisy performs each of our six API calls with at most one second of throughput drop.
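The abstract does not describe NetPack's internals, so the sketch below is only a generic illustration of stochastic chain placement: sample random server assignments for a VNF chain and accept the first one that respects every server's capacity. All names here (`place_chain`, the demand/capacity encoding, the retry budget) are hypothetical and are not Daisy's actual API:

```python
import random

def place_chain(chain_demands, capacities, tries=1000, seed=0):
    """Illustrative stochastic placement (not NetPack itself): randomly
    assign each VNF in a chain to a server and accept the first
    assignment that fits within every server's capacity.

    chain_demands: CPU units per VNF, e.g. [2, 4, 1]
    capacities:    CPU units per server, e.g. {"s1": 8, "s2": 4}
    """
    rng = random.Random(seed)
    servers = list(capacities)
    for _ in range(tries):
        assignment = [rng.choice(servers) for _ in chain_demands]
        used = {s: 0 for s in servers}
        for demand, server in zip(chain_demands, assignment):
            used[server] += demand
        if all(used[s] <= capacities[s] for s in servers):
            return assignment  # feasible placement found
    return None  # give up after the retry budget is exhausted
```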
Deep-learning (DL) compilers such as TVM and TensorRT are increasingly used to optimize deep neural network (DNN) models to meet performance, resource-utilization, and other requirements. Bugs in these compilers can produce optimized models whose semantics differ from those of the original models, yielding incorrect results that compromise the correctness of downstream applications. However, finding bugs in these compilers is challenging due to their complexity. In this work, we propose a new fuzz-testing approach for finding bugs in deep-learning compilers. Our core approach uses (i) lightweight operator specifications to generate diverse yet valid DNN models, allowing us to exercise a large part of the compiler's transformation logic; (ii) a gradient-based search process for finding model inputs that avoid any floating-point exceptional values during model execution, reducing the chance of missed bugs or false alarms; and (iii) differential testing to identify bugs. We implemented this approach in NNSmith, which has found 65 new bugs over the last seven months in TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 52 have been confirmed and 44 have been fixed by the project maintainers.
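Ingredient (ii) above, the gradient-based search for inputs free of floating-point exceptional values, can be illustrated with a short PyTorch sketch. The penalty below is a simplified stand-in for NNSmith's actual objective, and the names (`search_valid_input`, `bound`) are hypothetical:

```python
import torch

def search_valid_input(model, shape, steps=200, lr=0.1, bound=1e4):
    """Sketch of a gradient-based input search in the spirit of the
    paper's ingredient (ii): nudge a random input until the model's
    outputs are finite and moderately sized, so differential testing
    does not raise false alarms on Inf/NaN.
    """
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        y = model(x)
        if torch.isfinite(y).all() and y.abs().max() < bound:
            return x.detach()  # no exceptional values: input accepted
        # Hinge penalty pushing output magnitudes below the bound.
        # Non-finite entries are clamped so the loss stays usable.
        y = torch.nan_to_num(y, nan=bound, posinf=bound, neginf=-bound)
        loss = torch.relu(y.abs() - 0.5 * bound).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return None  # give up: discard this model/input pair
```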
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.