In big data processing, coping with high-dimensional datasets remains a central challenge. This paper presents a dimensionality-reduction approach based on a Variational Auto-Encoder (VAE) implemented within the Spark distributed framework. The "TLC" dataset of New York City taxi trips, chosen for its inherent high dimensionality, demonstrates the practicality of the approach. The VAE achieves a 95.12% reduction ratio with 89.26% accuracy, showing that it can discard superfluous dimensions while retaining the essential information, striking a balance between reduction and accuracy. Building on prior results in which Spark outperformed Hadoop, the adoption of VAE supports the broader goal of improving big data processing, and Spark's consistent advantage as a distributed framework reaffirms its reliability across diverse machine learning algorithms. This paper thus contributes to the application of machine learning in big data processing and illustrates the adaptability and consistent performance of the approach across methodologies and frameworks. The success of VAE in reducing dimensionality, combined with Spark's advantages, positions this work as a useful contribution to distributed big data processing.
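To make the core idea concrete, the sketch below shows a VAE encoder's bottleneck and the reduction-ratio computation. This is a minimal illustration, not the paper's implementation: the input and latent sizes (100 and 5), the random placeholder weights, and the synthetic batch are all assumptions, since the abstract does not state the exact dimensions. The reduction ratio follows the usual definition, one minus the fraction of dimensions retained.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

d_in, d_z = 100, 5                    # hypothetical sizes; the paper does not list exact dims
x = rng.normal(size=(8, d_in))        # a small batch of synthetic inputs

# Encoder: linear layers producing the mean and log-variance of the
# approximate posterior q(z|x). Weights here are random placeholders;
# in practice they are learned by maximizing the ELBO.
W_mu = rng.normal(size=(d_in, d_z))
W_logvar = rng.normal(size=(d_in, d_z))
mu, logvar = x @ W_mu, x @ W_logvar

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# which keeps the sampling step differentiable for gradient training.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * logvar) * eps

# Reduction ratio: fraction of dimensions removed by the bottleneck.
reduction_ratio = 1 - d_z / d_in
print(z.shape)                        # (8, 5): each input compressed to 5 latent features
print(f"{reduction_ratio:.2%}")       # 95.00% with these illustrative sizes
```

With these illustrative sizes the bottleneck yields a 95% reduction; the 95.12% reported in the paper would correspond to the actual input and latent dimensions used there.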