Online unsupervised spike sorting, or clustering, is an integral component of implantable closed-loop brain-computer-interface systems. Clustering performance that is robust to nonidealities such as poor initialization and the order of arrival of inputs is desirable, while still meeting the minimal area and power budgets required for implants. We explore an online, unsupervised spike-sorting algorithm that uses a low-overhead feature screening process to improve feature discriminability when suboptimal features are used to reduce hardware complexity. Based on this algorithm, an accelerator architecture that performs feature screening and clustering is devised and implemented in a 65-nm high-V_TH CMOS process, largely improving clustering accuracy even with poor clustering initialization. In post-layout static timing and power simulations, the power consumption and area of the accelerator are found to be 2.17 μW/ch and 0.052 μm²/ch, respectively, which are 53% and 25% smaller than those of previous designs, while achieving the required throughput of 420 sorting/s at a supply voltage of 300 mV.