Dynamic, pointer-based data structures are widely used in symbolic and irregular C codes. However, there is still a lack of compiler techniques for the automatic optimization of such codes. In this paper we take a first step towards that goal: the automatic identification of the data structures used in the code. More precisely, we describe the framework and the compiler we have implemented to capture complex data structures that are generated, traversed, and modified in C codes. Our method assigns a Reduced Set of Reference Shape Graphs (RSRSG) to each statement to approximate the shape of the data structure after that statement executes. Thanks to the properties and operations that define the behavior of the RSRSG, the method can accurately detect complex recursive data structures. The compiler performs a progressive analysis in which the level of detail is increased only when needed. Several experiments with complex data structures validate the capabilities of our compiler.
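To make the target of the analysis concrete, the following minimal sketch shows the kind of C code in which a recursive data structure is generated, traversed, and modified. It is an illustrative assumption and not taken from the experiments: the type `Node` and the functions `push_front`, `list_length`, `remove_after`, and `list_free` are hypothetical names. A shape analysis of this kind would be expected to conclude, after each statement, that `head` reaches an acyclic, unshared singly linked list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Recursive type: each node points to another node of the same type. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Generation: allocate a node and link it in front of the list. */
Node *push_front(Node *head, int value) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head; /* allocation failed: leave the list unchanged */
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Traversal: follow the next pointers until the end of the list. */
size_t list_length(const Node *head) {
    size_t count = 0;
    for (const Node *p = head; p != NULL; p = p->next) {
        count++;
    }
    return count;
}

/* Modification: unlink and free the node that follows prev, if any. */
void remove_after(Node *prev) {
    assert(prev != NULL);
    Node *victim = prev->next;
    if (victim == NULL) {
        return;
    }
    prev->next = victim->next; /* the list stays acyclic and unshared */
    free(victim);
}

/* Release every node of the list. */
void list_free(Node *head) {
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```

Each pointer assignment above (`n->next = head`, `prev->next = victim->next`) changes the shape of the heap, which is why a shape approximation has to be attached to every statement rather than to the program as a whole.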