Complex graphs are at the heart of today's big data challenges, such as recommendation systems, customer behavior modeling, and incident detection. One recurring task in these fields is the extraction of network motifs: recurring, statistically significant subgraphs. In this work, we propose a tailored embedded architecture for computing similarities based on one particular network motif, the co-occurrence. It is built from efficient, scalable building blocks that exploit well-tuned algorithmic refinements and an optimized graph data representation. On chip, our solution features a customized cache design and a lightweight data path that together allow each chip to perform over 10,000 graph operations per cycle. We provide detailed area, energy, and timing results for a 28 nm ASIC process and DDR3 memory devices. Compared to an Intel cluster, our proposed solution uses 44x less memory and is 224x more energy efficient.
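
To make the co-occurrence motif concrete, the following is a minimal software sketch, not the proposed hardware architecture. It assumes, since the abstract does not spell it out, that the co-occurrence of two nodes is the number of neighbors they share in a bipartite graph. The function name `cooccurrence_counts` and the toy user/item data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(adjacency):
    """Count, for every pair of right-side nodes, how many left-side nodes link to both.

    `adjacency` maps each left-side node to an iterable of its right-side neighbors
    (an assumed input layout, not the paper's data representation).
    """
    counts = Counter()
    for neighbors in adjacency.values():
        # Deduplicate and sort so each unordered pair is always keyed the same way.
        for a, b in combinations(sorted(set(neighbors)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy graph: users (left side) and the items they interacted with (right side).
toy_graph = {
    "user1": ["A", "B", "C"],
    "user2": ["A", "B"],
    "user3": ["B", "C"],
}

result = cooccurrence_counts(toy_graph)
for pair, count in sorted(result.items()):
    print(pair, count)
```

Each left-side node contributes one increment to every pair of its neighbors, so the work grows quadratically with node degree. This is one reason a compact graph representation and a dedicated data path can matter at scale.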