Figure 1: Comparison of traditional text-based image retrieval, conventional SBIR, and the proposed fine-grained SBIR framework.

Introduction

Sketches capture object appearance and structure more intuitively and precisely than text alone. However, to date the main focus of sketch-based image retrieval (SBIR) has been on retrieving photos of the same category, overlooking an important property of sketches: they can capture fine-grained variations of objects such as pose (standing vs. sitting) and iconic pattern (textures on a cow's body). By further leveraging this descriptive power of sketches, in this paper we introduce fine-grained SBIR for the first time, i.e., the study of how sketches can be used to differentiate fine-grained variations of objects for retrieval, specifically pose variations. Figure 1 contrasts text-based image retrieval and conventional SBIR with our proposed fine-grained SBIR.

Methodology

Key to this problem is a mid-level sketch representation that not only captures object pose, but can also traverse the sketch and photo domains. Specifically, we learn deformable part-based models (DPMs) [3] to discover and encode the various poses and parts in the sketch and image domains independently, and employ graph matching [1] to establish the correspondence between DPMs from the two domains. The DPM is a two-layer structure composed of a root filter and part filters. We denote a DPM as M = (r, G), where r = (w, h, f) specifies the width w, height h and global appearance feature f of the root filter, and G = (V, E, A) is the star graph formed by the part filters, with node set V, edge set E and attribute set A. Our matching objective for DPMs accounts for both the appearance and the geometric information encoded in a DPM, across both layers of the representation, i.e., the root filter r and the part-filter star graph G.
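To make the two-layer structure M = (r, G) concrete, the following is a minimal sketch of the DPM representation as a data structure. All names, the feature dimension, and the HOG-cell interpretation of (w, h) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RootFilter:
    # r = (w, h, f): width, height, and global appearance feature
    w: int
    h: int
    f: np.ndarray  # feature dimension is an assumption

@dataclass
class StarGraph:
    # G = (V, E, A): nodes, edges, and per-node attributes
    V: list   # part-filter node indices
    E: list   # (root, part) pairs -- star topology
    A: dict   # node attributes, e.g. appearance feature + spatial offset

@dataclass
class DPM:
    r: RootFilter
    G: StarGraph

def make_star_dpm(n_parts: int, feat_dim: int = 31) -> DPM:
    """Build a toy DPM whose part filters form a star graph around the root."""
    root = RootFilter(w=8, h=8, f=np.zeros(feat_dim))
    V = list(range(n_parts))
    E = [(0, v) for v in V]  # every part connects directly to the root
    A = {v: {"feat": np.zeros(feat_dim), "offset": (0.0, 0.0)} for v in V}
    return DPM(r=root, G=StarGraph(V=V, E=E, A=A))

model = make_star_dpm(6)
print(len(model.G.V), len(model.G.E))  # 6 parts, 6 star edges
```

A matching objective over two such structures can then score root-level appearance (via r.f) and part-level geometry (via G.A offsets) separately, mirroring the two layers described above.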
Given two DPMs M^R and M^T, the similarity function is defined as: