“…The initial design multiplies the two features to obtain an interaction matrix, sum-pools that matrix into a feature vector, and then uses this vector for classification, but it usually suffers from high computational complexity. In recent years, effective attention‐based fusion methods have been developed by extending the Transformer [22–27]. The self‐attention mechanism in the Transformer can be regarded as information fusion on a fully‐connected graph, which offers a more general way to model the input data.…”
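The two fusion strategies described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any specific method from the cited works: the dimensions, the equal per-modality feature size, and the single-head attention without learned projections are all simplifying assumptions. The first part shows why bilinear (multiplicative) fusion is expensive: the fused vector grows quadratically with the feature dimension. The second part shows self-attention mixing all tokens from both modalities, i.e. fusion over a fully connected graph.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 16, 32  # locations/tokens per modality and feature dim (assumed values)

X = rng.standard_normal((L, d))  # features from modality A (hypothetical)
Y = rng.standard_normal((L, d))  # features from modality B (hypothetical)

# --- Bilinear fusion sketch ---
# Outer product of the two features at each location, sum-pooled over
# locations; equivalent to X.T @ Y, giving a d x d interaction matrix.
M = X.T @ Y

# Flattening the matrix yields the fused feature vector fed to a classifier.
# Its length is d * d, so cost and memory are quadratic in the feature
# dimension: the "high computational complexity" noted in the text.
fused = M.reshape(-1)

# --- Attention-based fusion sketch (single head, no learned projections) ---
# Concatenate tokens from both modalities; every token attends to every
# other token, i.e. information fusion on a fully connected graph.
tokens = np.concatenate([X, Y], axis=0)           # (2L, d)
scores = tokens @ tokens.T / np.sqrt(d)           # pairwise affinities
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
fused_tokens = weights @ tokens                   # each token mixes all others
```

Note that attention is still quadratic in the number of tokens (the `2L x 2L` score matrix), but unlike the bilinear design it keeps the output dimension fixed at `d` per token and learns which cross-modal interactions matter.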