Deep learning-based multi-task approaches usually rely on factorizing representation layers up to a certain point, where the network splits into several heads, each one addressing a specific task. Depending on the inter-task correlation, such a naive model may or may not allow the tasks to benefit from each other. In this paper, we propose a novel Semantic Orthogonality Spaces (SOS) method for multi-task problems, where each task is predicted using information from a common subspace that factorizes information among all tasks, as well as a task-specific subspace. We enforce orthogonality between these subspaces by applying soft orthogonality constraints, as well as adversarially-learned semantic orthogonality objectives that ensure that predicting one task requires the specific information related to that task. We demonstrate the effectiveness of SOS on synthetic data, as well as on large-scale facial attribute prediction. In particular, we use SOS to craft a lightweight architecture that provides high accuracy on the CelebA database.
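The abstract does not spell out the soft orthogonality constraint, but such penalties are commonly implemented as the squared Frobenius norm of the cross-product between the two projection matrices. Below is a minimal illustrative sketch of that idea in plain Python; the function name and the column-vector representation are our own assumptions, not the paper's actual implementation.

```python
def soft_orthogonality_penalty(common_cols, task_cols):
    """Illustrative soft orthogonality penalty between two subspaces
    (an assumption about the form of the constraint, not the paper's code).

    Each subspace is given as a list of column vectors (lists of floats).
    Returns ||W_common^T @ W_task||_F^2, i.e. the sum of squared dot
    products between every common column and every task-specific column;
    the penalty is 0 exactly when the two subspaces are orthogonal.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(dot(u, v) ** 2 for u in common_cols for v in task_cols)
```

Minimizing this term alongside the task losses softly pushes the common and task-specific representations apart, without imposing hard orthogonality at every training step.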
Face-based affective computing consists of detecting emotions from face images. It enables better automatic comprehension of human behaviour and could pave the way toward improved human-machine interactions. However, it comes with the challenging task of designing a computational representation of emotions. So far, emotions have been represented either continuously, in the 2D Valence/Arousal (VA) space, or discretely, with Ekman's 7 basic emotions (FER). Alternatively, Ekman's Facial Action Unit (AU) system has also been used to characterize emotions using a codebook of unitary muscular activations. The ABAW3 and ABAW4 Multi-Task Challenges are the first to provide a large-scale database annotated with all three types of labels. In this paper, we present a transformer-based multi-task method for jointly learning to predict valence/arousal, action units, and basic emotions. From an architectural standpoint, our method uses a task-wise token approach to efficiently model the similarities between the tasks. From a learning point of view, we use an uncertainty-weighted loss to model the difference in stochasticity between the three tasks' annotations.
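The uncertainty-weighted loss mentioned above is commonly formulated as in Kendall et al. (2018), where each task loss is scaled by a learned homoscedastic uncertainty term. The sketch below shows that standard formulation in plain Python; it is an assumption about the loss family being referenced, and the function name is illustrative rather than taken from the paper.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with learned uncertainty weights,
    following the Kendall et al. (2018) formulation (an assumed
    stand-in for the paper's exact loss):

        L = sum_i  exp(-s_i) * L_i + s_i,   with s_i = log(sigma_i^2)

    A noisier task (larger s_i) is automatically down-weighted, while
    the + s_i term keeps the weights from collapsing to zero.
    """
    return sum(math.exp(-s) * loss + s
               for loss, s in zip(task_losses, log_vars))
```

In practice the `log_vars` would be trainable parameters optimized jointly with the network, one per task (VA regression, AU detection, and FER classification).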