The increasing prevalence of depression underscores the critical need for improved monitoring and personalized treatment options. While traditional assessment methods exhibit limitations, smartphone-based ecological momentary assessment has demonstrated validity and reliability in capturing real-time experiences; however, challenges such as time consumption and perceived invasiveness remain. To address these limitations, our research explored the potential of deep learning and wearable sensors to classify self-reported affective symptoms (valence, arousal, and sleepiness) from objective, continuously collected passive data. Our study spanned 35 days and included a diverse cohort of 26 participants (14 female; age $29 \pm 9.0$ years), comprising 16 depressed patients and 10 healthy controls. We applied deep learning techniques to high-quality physiological data collected from a wearable patch, combining electrocardiogram signals, raw accelerometer data, and respiration rates to classify each affective dimension into three levels. We identified optimal time windows for prediction (24 hours for valence, 12 hours for arousal and sleepiness) and showed that combining longitudinal heart rate and heart rate variability metrics with physical activity improved predictive performance for affective state classification compared to individual modalities. Our models achieved notable classification metrics for the three-level affective states, with balanced accuracies of 0.65 for valence, 0.56 for arousal, and 0.53 for sleepiness, demonstrating performance competitive with previous work. This research contributes to advancing mental health monitoring practices, providing valuable insights into the relationships between affective and physiological states.
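For readers unfamiliar with the reported metric, balanced accuracy is the mean of per-class recall, which makes it robust to the class imbalance typical of self-reported affect labels. The sketch below is purely illustrative (it is not the study's code, and the labels are made-up toy data, not study results); it shows how the metric is computed for a three-level classification task.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: each class contributes equally,
    regardless of how many samples it has."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # indices of samples whose true label is class c
        true_c = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in true_c if y_pred[i] == c)
        recalls.append(correct / len(true_c))
    return sum(recalls) / len(recalls)

# Toy three-level labels (0 = low, 1 = medium, 2 = high) -- hypothetical data
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 0, 2]
print(round(balanced_accuracy(y_true, y_pred), 3))  # → 0.656
```

Note that on imbalanced data a majority-class predictor can score high on plain accuracy but only 1/3 balanced accuracy for three classes, which is why the abstract's 0.53–0.65 range is reported against that 0.33 chance level.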