Spatial information plays an important role in analyzing the structure of multi-person social interactions. Previous studies, which analyze the spatial structure of social interactions using one or more third-person cameras, suffer from occlusion. With the increasing popularity of wearable computing devices, we can now obtain natural first-person observations with limited occlusion. However, such observations have a limited field of view and capture only a portion of the interaction. To overcome this limitation, we propose a search-based structure recovery method for small-group conversational scenarios that reconstructs the spatial structure of a social interaction from multiple first-person views, each contributing to a multifaceted understanding of the interaction. We first map each first-person view to a local coordinate system, then extract a set of constraints and spatial relationships from these local coordinate systems. Finally, we search for the human spatial configuration that, under the constraints, best matches the extracted relationships. The proposed method is much simpler than full 3D reconstruction, yet suffices to capture the spatial structure of the social interaction. Experiments on both simulated and real-world data demonstrate the efficacy of the proposed method.
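To illustrate the flavor of the search step, the following is a minimal, hypothetical Python sketch, not the authors' implementation. It assumes a simplified setting: participants sit on a unit circle facing its center, each first-person view contributes observed bearings (local viewing angles) to the participants it sees, and an exhaustive search over discretized seat angles selects the configuration that minimizes the angular mismatch. The circular-seating assumption, the bearing-based relationship representation, and all function names are illustrative.

```python
import itertools
import math

def bearing(viewer_pos, viewer_heading, target_pos):
    """Bearing of target in the viewer's local frame, wrapped to [-pi, pi)."""
    dx = target_pos[0] - viewer_pos[0]
    dy = target_pos[1] - viewer_pos[1]
    rel = math.atan2(dy, dx) - viewer_heading
    return math.atan2(math.sin(rel), math.cos(rel))

def score(config, observations):
    """Sum of squared angular errors between predicted and observed bearings.

    config: seat angle on a unit circle per person (everyone faces the center).
    observations: dict mapping (viewer, target) -> observed local bearing.
    """
    pos = {p: (math.cos(a), math.sin(a)) for p, a in config.items()}
    heading = {p: a + math.pi for p, a in config.items()}  # face the center
    err = 0.0
    for (i, j), obs in observations.items():
        pred = bearing(pos[i], heading[i], pos[j])
        d = math.atan2(math.sin(pred - obs), math.cos(pred - obs))  # wrapped error
        err += d * d
    return err

def search(people, observations, n_bins=24):
    """Exhaustive search over discretized seat angles; person 0 anchors the frame."""
    slots = [2 * math.pi * k / n_bins for k in range(n_bins)]
    best, best_err = None, float("inf")
    for angles in itertools.permutations(slots[1:], len(people) - 1):
        config = dict(zip(people, (0.0,) + angles))
        e = score(config, observations)
        if e < best_err:
            best, best_err = config, e
    return best, best_err
```

A small usage example under the same assumptions: simulate noiseless bearings from a known three-person configuration and verify that the search recovers it (the anchored person fixes the global rotation).

```python
people = ["A", "B", "C"]
truth = {"A": 0.0, "B": 2 * math.pi / 3, "C": 4 * math.pi / 3}
pos = {p: (math.cos(a), math.sin(a)) for p, a in truth.items()}
obs = {(i, j): bearing(pos[i], truth[i] + math.pi, pos[j])
       for i in people for j in people if i != j}
config, err = search(people, obs, n_bins=12)  # recovers truth; err ~ 0
```

The sketch also shows why such a search can be much cheaper than full 3D reconstruction: the configuration space is a low-dimensional discrete set of seat angles rather than a dense geometric model.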