Topology optimization (TO) is the process of finding the optimal arrangement of material within a design domain by minimizing a cost function subject to performance constraints. Robust topology optimization (RTO), a class of TO problems, additionally incorporates the effect of input uncertainties, such as uncertainty in loading, boundary conditions, and material properties, and produces a design that achieves the best average structural performance while reducing the sensitivity of the response to input uncertainties. Carrying out RTO with finite element solvers is computationally expensive: the high-dimensional design space necessitates many optimization iterations, and each evaluation of the probabilistic response requires numerous samples. In this work, we use neural network surrogates to enable a faster, surrogate-based optimization approach. In particular, we build a variational autoencoder (VAE) that transforms the high-dimensional design space into a low-dimensional one, making design space exploration more efficient. Furthermore, we replace the finite element solver with a neural network surrogate that predicts the probabilistic objective function. To further facilitate the design exploration, we limit our search to a subspace consisting of designs that are solutions to deterministic topology optimization problems under different realizations of the input uncertainties. With these neural network approximations, we form a gradient-based optimization approach that minimizes the predicted objective function over the low-dimensional design subspace. We demonstrate the effectiveness of the proposed approach on two compliance minimization problems: (1) the design of an L-bracket structure with uncertainty in the loading angle, and (2) the design of a heat sink with varying load magnitude at the heat sources. Through these examples, we show that the VAE learns the features of the designs from minimal training data, that compressing the design space into a very low-dimensional latent space makes the problem computationally efficient, and that the resulting gradient-based optimization algorithm produces optimal designs with lower robust compliance than those observed in the training set.
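The core computational loop described above, gradient-based minimization of a surrogate-predicted objective over a VAE latent space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the network architectures, dimensions (`LATENT_DIM`, `DESIGN_DIM`), optimizer settings, and the fact that the networks here are untrained stand-ins are all assumptions made for the sake of a self-contained example.

```python
# Hypothetical sketch of latent-space surrogate optimization.
# The decoder and surrogate below are untrained placeholders; in the actual
# method they would be a trained VAE decoder and a surrogate trained to
# predict the probabilistic (robust) compliance.
import torch
import torch.nn as nn

torch.manual_seed(0)

LATENT_DIM = 8        # assumed size of the low-dimensional latent space
DESIGN_DIM = 64 * 64  # assumed 64x64 material density field

# Stand-in for the trained VAE decoder: latent vector -> density field.
decoder = nn.Sequential(
    nn.Linear(LATENT_DIM, 128), nn.ReLU(),
    nn.Linear(128, DESIGN_DIM), nn.Sigmoid(),  # densities in [0, 1]
)

# Stand-in for the trained surrogate: density field -> robust compliance.
surrogate = nn.Sequential(
    nn.Linear(DESIGN_DIM, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

# Gradient-based search over the low-dimensional latent space: only the
# latent vector z is optimized, while both networks stay frozen.
z = torch.zeros(LATENT_DIM, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)

losses = []
for _ in range(100):
    opt.zero_grad()
    compliance = surrogate(decoder(z)).squeeze()  # predicted objective
    compliance.backward()                         # gradient w.r.t. z only
    opt.step()
    losses.append(compliance.item())

# Decode the optimized latent vector back into a candidate design.
optimal_design = decoder(z).detach().reshape(64, 64)
```

Because the search happens in an 8-dimensional latent space rather than over thousands of element densities, each iteration is cheap, and the decoder keeps every candidate on the learned manifold of plausible designs.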