White matter hyperintensities of presumed vascular origin (WMH) are frequently found in MRIs of patients with various neurological and vascular disorders, but also in healthy elderly subjects. Although automated methods have been developed to replace the challenging task of manually segmenting WMH, there is still no consensus on which validated algorithm(s) should be used. In this study, we validated and compared three freely available methods for WMH extraction using a standardized protocol: FreeSurfer, UBO Detector, and the Brain Intensity AbNormality Classification Algorithm (BIANCA), the latter with two thresholding options, global thresholding and LOCally Adaptive Threshold Estimation (LOCATE). We applied the algorithms to longitudinal MRI data (2D FLAIR, 3D FLAIR, T1w sMRI) of cognitively healthy older people (baseline N = 231, age range 64 - 87 years) with a relatively low WMH load. As a reference for the segmentation accuracy of the algorithms, fully manually segmented gold standards were used, one for each MR image modality. To validate the algorithms, we correlated the automatically extracted WMH volumes with the Fazekas scores and with chronological age, and we correlated the volumes between time points. In addition, we analyzed conspicuous percentage increases and decreases in WMH volume between two measurement points in the longitudinal data to verify the segmentation reliability of the algorithms. All algorithms showed a moderate correlation with chronological age, except BIANCA with 2D FLAIR input, which showed only a weak correlation. FreeSurfer systematically underestimated the WMH volume in comparison with the gold standard as well as with the other algorithms, and it cannot be considered an accurate substitute for manual segmentation, as it also achieved the lowest Dice similarity coefficient (DSC) of all algorithms.
However, its WMH volumes correlated strongly with the Fazekas scores and showed no conspicuous increases or decreases between measurement points in the longitudinal data. BIANCA performed well with respect to the accuracy metrics, especially the DSC, H95, and DER. However, its WMH volumes correlated more weakly with the Fazekas scores than those of the other algorithms. Further, we identified a substantial number of outlier WMH volumes in the within-person change trajectories with BIANCA. UBO Detector achieved the best cost-benefit ratio in our study. Although there is room for optimization with respect to segmentation accuracy (especially for the metrics DSC, H95, and DER), it achieved the highest correlations with the Fazekas scores and the highest ICCs. Its performance was high for both input modalities, although it relies on a built-in single-modality training dataset, and it showed reliable WMH volume estimations across measurement points.
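
To make two of the quantities mentioned above concrete, the following is a minimal Python sketch, not the implementation used in the study. It assumes the standard definition of the DSC, 2|A ∩ B| / (|A| + |B|), applied to flattened binary masks, and it assumes that the percentage volume change is taken relative to the earlier measurement point. The function names `dice_similarity` and `percent_volume_change` and the toy masks are hypothetical; the text does not state the threshold that makes a change "conspicuous", so none is encoded here.

```python
from typing import Sequence


def dice_similarity(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as WMH in both the automated and the manual mask.
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    # Sum of the WMH voxel counts of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def percent_volume_change(vol_t1: float, vol_t2: float) -> float:
    """Percentage change in WMH volume from the earlier to the later time point."""
    if vol_t1 <= 0:
        raise ValueError("baseline volume must be positive")
    return 100.0 * (vol_t2 - vol_t1) / vol_t1


# Toy data (hypothetical): 1 marks a WMH voxel, 0 marks background.
automated = [0, 1, 1, 1, 0, 0]
manual = [0, 1, 1, 0, 0, 1]

print(dice_similarity(automated, manual))
print(percent_volume_change(2.0, 2.5))
```

In practice the masks would be 3D image arrays loaded from the segmentation outputs; flat lists are used here only to keep the sketch free of dependencies.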