Over the last few years, applications based on immersive environments, where physical and digital objects coexist and interact, have gained widespread attention. Thanks to the development of new visualization devices, including low-cost ones, and increasingly effective rendering and processing techniques, these applications are reaching a growing number of users. While the adoption of digital information makes it possible to provide immersive experiences in many different applications, several aspects remain unexplored. This work takes a preliminary step toward understanding the impact of scene content on human perception of virtual 3D elements in mixed reality. To this aim, a subjective test was designed and implemented to collect the reaction times of a set of users in a mixed reality application. In this test, each user was asked to wear an augmented reality headset and to catch virtual objects appearing randomly in the subject's field of view. We first estimated detection accuracy from omitted, anticipated, and completed responses; we then related stimulus location and scene content to the estimated accuracy. For this purpose, the area of stimulus presentation was divided into upper, lower, right, left, inner, and outer regions, to identify in which regions, relative to the central point of view, responses were omitted or anticipated. Experimental results show that, in addition to the saliency of the real scene, natural body gesture technology and the limited field of view influenced human reaction time.