Brain functional connectivity under the naturalistic paradigm has been shown to predict individual behaviors better than connectivity measured in other brain states, such as rest and task. Nevertheless, state-of-the-art methods struggle to achieve desirable results from functional connectivity induced by movie-watching paradigm fMRI (mfMRI), especially when data are limited. Incorporating other physical measurements into the prediction method may enhance accuracy. Eye tracking, which is becoming popular owing to its portability and lower cost, can provide abundant behavioral features related to the output of human cognition, and thus might supplement mfMRI in observing subjects’ subconscious behaviors. However, very few works address how to integrate this multimodal information effectively within a unified framework to improve performance. To this end, this article proposes a fusion approach for mfMRI and eye tracking based on Convolution with Edge-Node Switching in Graph Neural Networks (CensNet): subjects are taken as nodes, mfMRI-derived functional connectivity serves as the node features, and different eye-tracking features are used to compute the similarity between subjects and thereby construct heterogeneous graph edges. Taking the multiple graphs as different channels, we introduce a squeeze-and-excitation attention module into CensNet (A-CensNet) to integrate the graph embeddings from multiple channels into one. The experiments demonstrate that the proposed model outperforms models using a single modality or a single channel, as well as state-of-the-art methods. The results suggest that brain functional activities and eye behaviors might complement each other in interpreting trait-like phenotypes.
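
As a rough illustration of the pipeline described above (subjects as nodes, one graph per eye-tracking feature, and squeeze-and-excitation attention over graph channels), the following minimal NumPy sketch may help. It is not the authors' implementation. The Gaussian similarity kernel, the weighted-sum fusion, the random stand-in data, and all function and variable names are assumptions made for illustration, and a simple neighbor-averaging step stands in for the CensNet edge-node switching convolution.

```python
import numpy as np


def similarity_graph(eye_feature, sigma=1.0):
    """Adjacency between subjects from one eye-tracking feature.

    Assumption: a Gaussian kernel on pairwise distances; the source only
    says that eye-tracking features are used to compute subject similarity.
    """
    x = np.asarray(eye_feature, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    adj = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(adj, 0.0)  # no self-loops in the raw graph
    return adj


def propagate(adj, node_features):
    """Stand-in for a graph convolution: row-normalized neighbor averaging.

    This is NOT CensNet; it only produces one embedding per graph channel.
    """
    a_hat = adj + np.eye(adj.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    return a_hat @ node_features


def se_fuse(channel_embeddings, w1, w2):
    """Squeeze-and-excitation over graph channels.

    channel_embeddings: array of shape (C, N, D), one embedding per graph.
    Returns the fused (N, D) embedding and the per-channel weights (C,).
    """
    squeezed = channel_embeddings.mean(axis=(1, 2))      # squeeze: (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)              # excitation, ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))       # sigmoid gate: (C,)
    # Assumption: channels are merged by an attention-weighted sum.
    fused = np.tensordot(weights, channel_embeddings, axes=(0, 0))
    return fused, weights


rng = np.random.default_rng(0)
n_subjects, n_fc_features = 6, 10

# Random stand-ins for mfMRI functional connectivity (node features) and
# two hypothetical eye-tracking features (one graph channel each).
node_features = rng.normal(size=(n_subjects, n_fc_features))
eye_features = [rng.normal(size=n_subjects), rng.normal(size=(n_subjects, 3))]

graphs = [similarity_graph(f) for f in eye_features]
embeddings = np.stack([propagate(a, node_features) for a in graphs])

n_channels, n_hidden = embeddings.shape[0], 2
w1 = rng.normal(size=(n_hidden, n_channels))  # untrained, illustrative weights
w2 = rng.normal(size=(n_channels, n_hidden))
fused, weights = se_fuse(embeddings, w1, w2)
```

In a trained model the excitation weights would be learned jointly with the graph convolution, so that the more informative eye-tracking graphs receive larger channel weights.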