The vocabularies of natural languages harbour many instances of iconicity, where words show a perceived resemblance between aspects of form and meaning. An open challenge in this domain is how to reconcile different operationalizations of iconicity and link them to an empirically grounded theory. Here we combine three ways of looking at iconicity using a set of 239 iconic words from 5 spoken languages (Japanese, Korean, Semai, Siwu and Ewe). Data on guessing accuracy serve as a baseline measure of probable iconicity and provide variation that we seek to explain and predict using structure-mapping theory and iconicity ratings. We systematically trace a range of cross-linguistically attested form-meaning correspondences in the dataset, yielding a word-level measure of cumulative iconicity that we find to be highly predictive of guessing accuracy. In a rating study, we collect iconicity judgments for all 239 words from 80 participants. The ratings are well predicted by our measure of cumulative iconicity and also correlate strongly with guessing accuracy, showing that rating tasks offer a scalable method to measure iconicity. Triangulating the measures reveals how structure-mapping can help open the black box of experimental measures of iconicity. We explore reasons for convergence and divergence, and we bring to light possible experimental confounds in guessing experiments as well as subjective factors in coding and rating approaches. While none of the methods is perfect, taken together they provide a well-rounded way to approach the meaning and measurement of iconicity in natural language vocabulary.