This work demonstrates a RISC-V vector microprocessor implemented in 28nm FDSOI with fully-integrated non-interleaved switched-capacitor DCDC (SC-DCDC) converters and adaptive clocking that generates four on-chip voltages between 0.5V and 1V using only 1.0V core and 1.8V I/O voltage inputs. The design pushes the capabilities of dynamic voltage scaling by enabling fast transitions (20ns), simple packaging (no off-chip passives), low area overhead (16%), high conversion efficiency (80-86%), and high energy efficiency (26.2 DP GFLOPS/W) for mobile devices.

Introduction

Optimal energy efficiency requires tight integration of the power supply control with the microprocessor. Alternatives to high-latency off-chip regulators are integrated low-drop-out (LDO) regulators [1], buck converters with off-chip inductors [2], and SC-DCDC converters [3]. Traditional interleaved SC-DCDC converters stabilize the output voltage to minimize frequency margining for supply variation, but in principle, efficiency could be increased by using a non-interleaved design that avoids charge sharing [4]. In non-interleaved operation, SC-DCDC unit cells switch simultaneously to avoid charge-sharing losses, and an adaptive clock translates a higher instantaneous voltage into a higher frequency to exploit the rippling supply voltage.

Integrated System Implementation

Figure 1 shows the chip architecture. The 64-bit single-issue in-order scalar core implements the open-source RISC-V instruction set [5]. The scalar core has a memory management unit that supports page-based virtual memory, an IEEE 754-2008-compliant floating-point unit, a high-performance 64-bit vector accelerator with vector gather and scatter support, and L1 caches. The processor boots Linux and executes both compiled scalar and vector code with single- and double-precision floating-point operations, including fused multiply-add. Two voltages, a 1.0V core and 1.8V I/O, are supplied to the on-chip converters.
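
The adaptive-clocking argument from the Introduction can be illustrated with a minimal Python sketch. It compares the cycles completed over one ripple period by a clock that tracks the instantaneous supply voltage against a fixed clock margined for the lowest voltage in the ripple. This is not a model of the chip's clock generator: the alpha-power-law frequency estimate, the threshold voltage `v_t`, the exponent `alpha`, the linear sawtooth ripple shape, and the 0.85V to 0.95V ripple range are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def relative_frequency(v, v_t=0.35, alpha=1.5):
    """Alpha-power-law estimate of achievable clock frequency (arbitrary units).

    v_t and alpha are assumed illustrative values, not measured parameters.
    """
    if v <= v_t:
        return 0.0
    return (v - v_t) ** alpha / v


def sawtooth_supply(v_max, v_min, samples=1000):
    """One ripple period, assumed to decay linearly from v_max to v_min."""
    step = (v_max - v_min) / (samples - 1)
    return [v_max - i * step for i in range(samples)]


def adaptive_clock_gain(v_max, v_min):
    """Cycles per ripple period: adaptive clock relative to a fixed clock.

    The adaptive clock runs at the frequency allowed by each voltage sample,
    so its cycle count is the time average of relative_frequency. The fixed
    clock must be margined for the worst case and runs at f(v_min) throughout.
    """
    ripple = sawtooth_supply(v_max, v_min)
    adaptive = sum(relative_frequency(v) for v in ripple) / len(ripple)
    fixed = relative_frequency(v_min)
    return adaptive / fixed


# Hypothetical ripple around a 0.9V average output.
gain = adaptive_clock_gain(v_max=0.95, v_min=0.85)
print(f"adaptive/fixed throughput ratio: {gain:.3f}")
```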

The SC-DCDC converter is partitioned into twenty-four 90µm × 90µm unit cells surrounding the core (16% area overhead), and generates four dynamically reconfigurable average output voltages of 1.0V, 0.9V, 0.67V, and 0.5V. An adaptive clock generator adjusts the clock period each cycle based on the instantaneous converter output voltage. Level shifters and asynchronous FIFOs separate the core and uncore voltage domains.

Large random variations in SRAM memory cells typically limit voltage scaling, so custom SRAMs were implemented to enable voltage scaling down to 0.45V. Each 4KB SRAM uses 8T cells and has 512 words of 72 bits with 2:1 interleaving.

The adaptive clocking scheme, shown in Figure 3, ensures that the system operates at the maximum instantaneous frequency. The rippling supply voltage powers a tunable replica circuit (TRC), adjustable from 4 to 124 FO1 inverter delays, to mimic the critical path delay at the instantaneous voltage level. When the TRC generates a pulse, the controller selects one of the sixteen DLL phases to send to the core as a clock edge. Figure ...