“…Unfortunately, to the best of our knowledge, the predictive performance of existing methods on loop regions has not been systematically evaluated, and no large, diverse benchmark dataset has been made available thus far. Existing datasets suffer from three key shortcomings: (1) they require updating [66], with most test datasets proposed over a decade ago [47, 67, 68]; (2) longer loops, especially those exceeding 15 residues, are often ignored [69, 70]; and (3) data coverage and volume are limited, with only ~100 samples and a few protein types [42, 52, 71]. Therefore, evaluations based on these datasets may not adequately reflect actual model performance.…”