This paper describes a multi-functional deep in-memory processor for inference applications. Deep in-memory processing is achieved by embedding pitch-matched, low-SNR analog processing into a standard 6T 16 KB SRAM array in 65 nm CMOS. Four applications are demonstrated. Compared with a conventional architecture, the prototype achieves up to 5.6× energy savings (9.7× estimated for a multi-bank scenario) with negligible (≤1%) accuracy degradation in all four applications.
Emerging inference applications require processing of huge data volumes [1]. A conventional inference architecture (Fig. 1) implements memory access, data transfer from memory to processor, data aggregation, and slicing. In such architectures, memory access energy dominates: in 65 nm CMOS, for example, an 8-b SRAM read access consumes 5 pJ, whereas an 8-b MAC consumes 1 pJ. Additionally, the memory-processor interface presents a severe throughput bottleneck. The deep in-memory signal processing concept was proposed in [2] to overcome these challenges by embedding mixed-signal processing in the periphery of the SRAM bit-cell array (BCA). However, an IC implementation must address a host of new challenges, including meeting the stringent row and column pitch-matching requirements imposed by the BCA without altering its storage density or its read/write functionality, and enabling multiple functions with mixed-signal circuitry. Recently, a single-function, 5×1-b in-memory classifier IC was demonstrated [3].

The proposed deep in-memory inference architecture has four stages (Fig. 1): 1) multi-row functional read (MR-FR), 2) bit-line (BL) processing (BLP), 3) cross-BL processing (CBLP), and 4) ADC and slicing. The MR-FR accesses multiple rows in one precharge cycle using pulse-width-modulated word-line (PWM-WL) signals to generate a BL voltage drop proportional to a weighted sum of the bits stored in multiple rows of a column; it also performs word-level addition and subtraction. The BLP implements reconfigurable, column-pitch-matched mixed-signal circuits that execute computations such as multiplication, absolute value, and comparison on the BL voltages in a massively column-parallel fashion. The CBLP aggregates the BLP outputs into a scalar, which is sliced to obtain the final decision. The BLP and CBLP can be reconfigured to operate the architecture in either a dot product (DP) mode or a Manhattan distance (MD) mode. These reconfigurable stages enable multiple functions (Fig. 1 table), including normal read/write. The chip architecture (Fig. 2) includes a digital controller (CTRL) and a CORE. The normal …

Chip summary:
Technology: 65 nm CMOS
Die size: 1.2 mm × 1.2 mm
CTRL operating frequency: 1 GHz
SRAM capacity: 16 KB (1 bank of 512 × 256-b)
Bitcell dimension: 2.11 × 0.92 µm²
Supply voltage: CORE 1.0 V, CTRL 0.85 V
SVM: 963.1 pJ per decision, 1.7M decisions/s
Matched filter: 481.5 pJ per decision, 3.4M decisions/s
KNN: 33.6K pJ per decision, 54.3K decisions/s
Template matching: 33.6K pJ per decision, 54.3K decisions/s
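The four-stage flow described above can be pictured with a small behavioral model. This is a minimal sketch under loudly stated assumptions, not the chip's circuit behavior: it assumes ideal, noise-free, perfectly linear BL discharge, binary-weighted word-line pulse widths, and no ADC quantization. The constants `V_PRE`, `DV_LSB`, and `N_BITS` and all function names are hypothetical. The text says the MR-FR also performs word-level add/subtract; for brevity, the sketch folds the MD-mode subtraction into the BLP step.

```python
from typing import Iterable, List, Sequence

# Illustrative constants for an ideal, noise-free model; NOT values from the chip.
V_PRE = 1.0      # hypothetical bit-line precharge voltage (V)
DV_LSB = 0.005   # hypothetical BL drop contributed by the LSB row (V)
N_BITS = 4       # hypothetical word width: one bit per row within a column


def to_bits(word: int) -> List[int]:
    """Split an unsigned N_BITS-bit word into bits, LSB first (one bit per row)."""
    if not 0 <= word < 2 ** N_BITS:
        raise ValueError("word out of range")
    return [(word >> i) & 1 for i in range(N_BITS)]


def mr_fr(bits: Sequence[int]) -> float:
    """Multi-row functional read of one column.

    Assumes WL pulse widths proportional to 2**i and a BL discharge that is
    linear in pulse width, so the drop is DV_LSB * sum(2**i * b_i).
    """
    dv = DV_LSB * sum((2 ** i) * b for i, b in enumerate(bits))
    if dv >= V_PRE:
        raise ValueError("BL would fully discharge; linear model invalid")
    return dv


def blp(dv: float, x: int, mode: str) -> float:
    """Bit-line processing on one column's BL drop (ideal arithmetic)."""
    if mode == "DP":
        return dv * x                  # multiply by the query element
    if mode == "MD":
        return abs(dv - DV_LSB * x)    # |w - x| on the same voltage scale
    raise ValueError("mode must be 'DP' or 'MD'")


def cblp(values: Iterable[float]) -> float:
    """Cross-BL processing: aggregate all column outputs into one scalar."""
    return sum(values)


def score(weights: Sequence[int], query: Sequence[int], mode: str) -> float:
    """Stages 1-3 applied column-parallel across the stored weight vector."""
    if len(weights) != len(query):
        raise ValueError("weights and query must have equal length")
    return cblp(blp(mr_fr(to_bits(w)), x, mode) for w, x in zip(weights, query))


def decide(s: float, threshold: float) -> int:
    """Stage 4: ADC and slicing collapsed into one ideal comparator."""
    return int(s > threshold)
```

For weights `[3, 5, 0, 15]` and query `[1, 2, 3, 1]`, `score(..., "DP")` is about 0.14 V (`DV_LSB` × 28, the dot product), and `score(..., "MD")` is about 0.11 V (`DV_LSB` × 22, the Manhattan distance). Only the BLP branch changes between modes, which mirrors the reconfigurability the text describes.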
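The energy figures above lend themselves to quick arithmetic. The sketch below computes the share of per-word energy spent on the memory read, assuming one 8-b read feeds one 8-b MAC. It also computes the average power implied by each application's energy per decision and throughput, assuming both figures refer to the same operating point (the summary does not say so). The names `SUMMARY`, `memory_share`, and `implied_power_mw` are hypothetical.

```python
# Energy figures quoted in the text (65 nm CMOS, per 8-b operation).
E_READ_PJ = 5.0   # 8-b SRAM read access
E_MAC_PJ = 1.0    # 8-b MAC

# Chip summary: (energy per decision in pJ, decisions per second).
SUMMARY = {
    "SVM": (963.1, 1.7e6),
    "Matched filter": (481.5, 3.4e6),
    "KNN": (33.6e3, 54.3e3),
    "Template matching": (33.6e3, 54.3e3),
}


def memory_share(e_read: float, e_mac: float) -> float:
    """Fraction of per-word energy spent on the memory read."""
    return e_read / (e_read + e_mac)


def implied_power_mw(energy_pj: float, throughput: float) -> float:
    """Average power = energy per decision x decisions per second, in mW."""
    return energy_pj * 1e-12 * throughput * 1e3


if __name__ == "__main__":
    print(f"memory share: {memory_share(E_READ_PJ, E_MAC_PJ):.1%}")
    for app, (e_pj, tput) in SUMMARY.items():
        print(f"{app}: {implied_power_mw(e_pj, tput):.2f} mW")
```

With these numbers, the read accounts for about 83.3% of the per-word energy. The implied power is roughly 1.64 mW for SVM and matched filter and 1.82 mW for KNN and template matching. The tabulated values are rounded, so these figures are only approximate.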
Structural changes in the choroid, a layer located between the retina and the sclera, could indicate various vision impairments. Consequently, ophthalmologists inspect optical coherence tomography (OCT) scans of the posterior section of the eye when making a diagnosis. To assist diagnosis, we propose an automated technique for segmenting the choroid layer. Specifically, we detect the upper and lower boundaries of the choroid using structural similarity and adaptive Hessian analysis, and then detect choroid vessels within those boundaries using a level set method. Experimental results are presented on spectral-domain (SD) OCT images.
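The abstract names structural similarity but does not say how it locates the boundaries. For reference, the sketch below computes only the standard single-window SSIM index between two intensity patches. It is not the authors' boundary-detection method. The constants K1 = 0.01 and K2 = 0.03 are the common defaults from the SSIM literature and are assumed here, and the function name `ssim` and its `dynamic_range` parameter are hypothetical.

```python
from typing import Sequence


def ssim(x: Sequence[float], y: Sequence[float], dynamic_range: float = 255.0) -> float:
    """Single-window SSIM of two equal-length intensity patches.

    Uses the standard luminance/contrast/structure form with stabilizing
    constants c1 = (0.01 * L)**2 and c2 = (0.03 * L)**2 (assumed defaults).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("patches must have equal length >= 2")
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)           # sample variance of x
    vy = sum((b - my) ** 2 for b in y) / (n - 1)           # sample variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den
```

Identical patches score 1, and structurally dissimilar patches score lower. A full-image SSIM would slide this window over the image and average the local values.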