Sixteen animal-based indicators of sheep welfare, previously selected by a stakeholder panel and based on the Farm Animal Welfare Council (FAWC) Five Freedoms, were assessed for the level of inter-observer agreement achieved during on-farm testing. Eight observers independently tested the 16 indicators on 1158 sheep from 38 farms in England and Wales. Overall inter-observer agreement was evaluated using Fleiss’s kappa (κ), and pair-wise agreement between each observer and a ‘test standard’ observer (TSO) was also assessed. Inter-observer assessments of the welfare indicators dental abnormality, cleanliness score (ventral abdomen), mastitis, tail length, skin lesions, body condition scoring and lameness produced ‘fair to good’ levels of agreement (0.40 < κ < 0.75), and assessments of joint swellings produced ‘excellent’ levels of agreement (κ ≥ 0.75). The very low apparent prevalence (<0.8%) of sheep with specific outcomes such as pruritus, wool loss, myiasis, thin body condition, and diffuse or severe skin lesions limited kappa analysis for these indicators. Overall, the findings suggest that observers of differing experience, training and occupation were reliable in assessing key animal-based indicators of sheep health and welfare.
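
For readers unfamiliar with the statistic, the following is a minimal sketch of how Fleiss’s kappa can be computed from a table of rating counts. It is illustrative only and is not the analysis code used in the study; the function name `fleiss_kappa` and the toy data are assumptions introduced here, and the sketch assumes every animal is scored by the same number of observers.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of observers who assigned subject i
    (e.g. one sheep) to category j (e.g. 'lame' / 'not lame').
    Every row must sum to the same number of observers.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two observers are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fall in each category.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement: for each subject, the proportion of observer
    # pairs that agree, averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Agreement expected by chance from the category shares alone.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        # All ratings fall in one category, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 4 sheep, 8 observers, 2 categories.
toy_counts = [[8, 0], [0, 8], [7, 1], [2, 6]]
kappa = fleiss_kappa(toy_counts)
```

When nearly every rating falls in a single category, the chance-agreement term approaches 1 and the denominator approaches 0, which is one reason kappa becomes uninformative for outcomes with very low prevalence.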