In recent years, deep learning techniques, especially convolutional neural networks (CNNs), have made significant progress in facial expression recognition (FER). However, most prior studies are susceptible to the “fuzzy phenomena” in FER datasets, which are common in real-world scenarios. In addition, traditional CNNs are usually sensitive to rotation and pose variations in images, a common challenge in FER because expressions can occur under different head poses. To address these problems, we propose a Spatial-channel Capsule Aggregated Network with a Dynamic data Cleansing Module (SCAN-DCM). Specifically, we incorporate spatial and channel attention mechanisms into the vanilla capsule network to better capture the relative positional relationships and orientation features among the facial Action Units (AUs), enabling the model to adaptively perceive the AUs with high feature information density. Furthermore, the Dynamic data Cleansing Module (DCM) is proposed to address the “fuzzy phenomena” by enhancing the feature expressions of samples with high Contribution Coefficients (CC), suppressing those with low CC, and correcting the wrongly labeled samples among those with low CC. Extensive experiments are conducted on two in-the-wild FER datasets, RAF-DB and FERPlus. SCAN-DCM achieves an accuracy of 82.83% on FERPlus and 86.28% on RAF-DB, demonstrating the effectiveness of our approach.
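
The three actions attributed to the DCM (enhance high-CC samples, suppress low-CC samples, correct suspect labels in the low-CC group) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the CC is computed, so the sketch takes per-sample CC values as given, and the function name `cleanse_batch`, the split ratio `high_ratio`, the factors `boost` and `damp`, and the relabeling rule based on `relabel_margin` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def cleanse_batch(
    probs: List[List[float]],   # per-sample predicted class probabilities
    labels: List[int],          # per-sample annotated class index
    cc: List[float],            # per-sample Contribution Coefficient (assumed given)
    high_ratio: float = 0.7,    # assumed fraction of the batch treated as high-CC
    boost: float = 1.5,         # assumed factor that enhances high-CC samples
    damp: float = 0.5,          # assumed factor that suppresses low-CC samples
    relabel_margin: float = 0.2,  # assumed confidence gap required to relabel
) -> Tuple[List[float], List[int]]:
    """Return per-sample loss weights and (possibly corrected) labels."""
    n = len(labels)
    # Rank sample indices from highest to lowest CC and split into two groups.
    order = sorted(range(n), key=lambda i: cc[i], reverse=True)
    n_high = int(round(high_ratio * n))
    high = set(order[:n_high])

    weights: List[float] = []
    new_labels = list(labels)
    for i in range(n):
        if i in high:
            # Enhance samples with high CC.
            weights.append(cc[i] * boost)
        else:
            # Suppress samples with low CC.
            weights.append(cc[i] * damp)
            # Relabel a low-CC sample only when the model prefers another
            # class over the annotated one by more than the margin.
            pred = max(range(len(probs[i])), key=lambda k: probs[i][k])
            if probs[i][pred] - probs[i][labels[i]] > relabel_margin:
                new_labels[i] = pred
    return weights, new_labels
```

In a training loop, the returned weights would scale each sample's loss term and the returned labels would replace the annotations for the next update. Restricting relabeling to the low-CC group keeps the samples the model already trusts unchanged.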