“…Deep learning (DL) has become effective for AD modeling because of its capability to capture complex structures, learn features automatically in an end-to-end manner, and scale to large data sets [1, 2]. Several DL models have been proposed in the literature for diverse data types, such as structural [1], time-series [7, 8, 9, 12, 13, 16, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38], image [10, 26], graph network [14, 15, 24, 25, 39], and spatio-temporal [10, 14, 15, 17, 18, 19, 20, 21, 22, 24, 25, 39] data. Spatio-temporal (ST) data are commonly collected in diverse domains, such as visual streaming data [17, 18, 19, 20, 21, 22, 23], transportation traffic flows [24, 25], sensor networks [14, 15, 39], geoscience […”