For cross-media retrieval, this study experimentally clarifies how the harmony between the audio clip and the video clip of an audiovisual material affects the estimation of the material's scores. The experiment uses four patterns of audiovisual materials: statically matched, statically mismatched, dynamically matched, and dynamically mismatched audiovisual clips. The results clarify how much the harmony of the audiovisual clips contributes to their scores. They also show that the degree of total harmony of an audiovisual material could be estimated from its degree of static harmony and its degree of dynamic harmony.