Dynamic Architecture and Frequency Scaling (DAFS) is shown to realize superlinear power scaling in high-speed analog-to-digital converters (ADCs). To achieve both high-speed operation and low power consumption, the ADC architecture is reconfigured between binary search and flash every clock cycle, depending on the conversion delay. The proposed binary search/flash architecture reconfigurable ADC can be implemented with only a small modification to conventional binary search ADCs. By reconfiguring on the fly, the flash operation is performed adaptively whenever an excess delay is detected. DAFS not only significantly improves the power scaling but also compensates for transistor speed shifts due to process, voltage, and temperature (PVT) variations, and can therefore be used to improve the design margin of high-speed ADCs. A prototype subranging ADC fabricated in 65 nm CMOS technology operates up to 1220 MS/s and achieves an SNDR of 36.2 dB with a Nyquist input frequency. DAFS is active between 820 and 1220 MS/s and achieves a peak power reduction of 30% compared with the power scaling obtained when DAFS is disabled. A peak FoM of 85 fJ/conv. is obtained at 820 MS/s, nearly a twofold improvement over previously reported subranging ADCs.
Index Terms-Binary search ADC, dynamic architecture and frequency scaling (DAFS), flash ADC, high-speed, power scaling, subranging ADC.
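To make the per-cycle reconfiguration concrete, the following is a minimal behavioral sketch (in Python) of the DAFS decision logic: a conversion proceeds as a binary search, and once the accumulated comparator delay would exceed the cycle budget, the remaining bits are resolved in a single flash-style step. The function name, timing model, and unit full-scale range are illustrative assumptions, not the authors' circuit implementation.

def convert(sample, n_bits, cycle_budget, step_delay):
    """Resolve sample in [0, 1) to n_bits; switch to a flash step on excess delay."""
    lo, hi = 0.0, 1.0
    code, elapsed = 0, 0.0
    for bit in range(n_bits):
        if elapsed + step_delay > cycle_budget:
            # Excess delay detected: resolve all remaining bits at once (flash step).
            remaining = n_bits - bit
            frac = (sample - lo) / (hi - lo)
            flash_code = min(int(frac * (1 << remaining)), (1 << remaining) - 1)
            return (code << remaining) | flash_code
        mid = 0.5 * (lo + hi)          # binary-search comparison against the midpoint
        if sample >= mid:
            code = (code << 1) | 1
            lo = mid
        else:
            code = code << 1
            hi = mid
        elapsed += step_delay          # each binary-search step consumes comparator delay
    return code

For example, convert(0.3141, n_bits=7, cycle_budget=1.0, step_delay=0.18) completes five binary-search steps and then resolves the last two bits in one flash step, mimicking how slower devices (e.g., under PVT variation) would push more of the conversion into flash mode.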
Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need for fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf™ is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications driven by the MLCommons™ Association. We present the results from the first submission round, which includes a diverse set of some of the world's largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems, such as staging and on-node loading of data, compute-unit utilization, and communication scheduling, enabling overall >10× (end-to-end) performance improvements through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system's memory hierarchy, and training convergence that underlines the importance of near-compute storage. To overcome the data-parallel scalability challenge at large batch sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behaviour to parameterize extended roofline performance models in future rounds.
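As an illustration of the hybrid data-and-model parallelism mentioned above, the Python sketch below splits MPI ranks into model-parallel groups (ranks that jointly hold one model replica) and data-parallel groups (ranks that hold the same shard across replicas). The group size and the mpi4py-based layout are assumptions for illustration only; they are not taken from the MLPerf HPC reference implementations.

from mpi4py import MPI

world = MPI.COMM_WORLD
rank = world.Get_rank()
size = world.Get_size()

MODEL_PARALLEL_SIZE = 4  # assumed number of ranks that share one model replica

# Consecutive ranks form a model-parallel group (one model replica);
# ranks at the same position within their replica form a data-parallel group.
model_comm = world.Split(color=rank // MODEL_PARALLEL_SIZE, key=rank)
data_comm = world.Split(color=rank % MODEL_PARALLEL_SIZE, key=rank)

# During training, activations and parameter shards are exchanged within
# model_comm, while gradients for each shard are all-reduced across data_comm,
# so the effective global batch size scales with data_comm.Get_size().

This layout keeps the per-replica batch size fixed while the number of replicas (and hence the global batch size) grows with the system, which is where the large-batch learning techniques discussed in the paper become relevant.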