The aim of this work is to present an automated, real-time method for human activity recognition based on acceleration and first-person camera data. A Long Short-Term Memory (LSTM) model is built to recognize locomotive activities (i.e. walking, sitting, standing, going upstairs, going downstairs) from acceleration data, while a ResNet model is employed to recognize stationary activities (i.e. eating, reading, writing, watching TV, working on a PC). The outputs of the two models are fused to reach the final decision on the performed activity. For the training, testing and evaluation of the proposed models, a publicly available dataset and an "in-house" dataset are utilized. The overall accuracy of the proposed algorithmic pipeline reaches 87.8%.
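The fusion of the two model outputs can be sketched as a weighted late-fusion over the joint label space. Note that the class names, weights, and the specific fusion scheme below are illustrative assumptions for this sketch, not details taken from the paper:

```python
import math

# Hypothetical late-fusion sketch: combine class probabilities from an
# LSTM (locomotion) branch and a ResNet (stationary-activity) branch.
# The fusion weights and scheme are assumptions, not the paper's method.

LOCOMOTIVE = ["walking", "sitting", "standing", "upstairs", "downstairs"]
STATIONARY = ["eating", "reading", "writing", "watching_tv", "working_on_pc"]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse(lstm_logits, resnet_logits, w_lstm=0.5, w_resnet=0.5):
    """Pick the activity with the highest weighted probability
    across both branches (late fusion)."""
    p_lstm = softmax(lstm_logits)      # probabilities over LOCOMOTIVE
    p_resnet = softmax(resnet_logits)  # probabilities over STATIONARY
    labels = LOCOMOTIVE + STATIONARY
    scores = [w_lstm * p for p in p_lstm] + [w_resnet * p for p in p_resnet]
    return labels[scores.index(max(scores))]

# The LSTM branch is confident about "walking"; the ResNet branch is flat.
print(fuse([2.0, 0.1, 0.1, 0.0, 0.0], [0.5, 0.2, 0.1, 0.1, 0.1]))
```

In this weighted scheme, a confident prediction from one branch dominates an uncertain one from the other, which is one common way to arbitrate between two single-modality classifiers.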