In this paper, we propose GT-GDA, a distributed optimization method to solve saddle point problems of the form: minx maxy F (x, y) := G(x) + y, P x − H(y) , where the functions G(•), H(•), and the the coupling matrix P are distributed over a strongly connected network of nodes. GT-GDA is a first-order method that uses gradient tracking to eliminate the dissimilarity caused by heterogeneous data distribution among the nodes. In the most general form, GT-GDA includes a consensus over the local coupling matrices to achieve the optimal (unique) saddle point, however, at the expense of increased communication.To avoid this, we propose a more efficient variant GT-GDA-Lite that does not incur the additional communication and analyze its convergence in various scenarios. We show that GT-GDA converges linearly to the unique saddle point solution when G is smooth and convex, H is smooth and strongly convex, and the global coupling matrix P has full column rank. We further characterize the regime under which GT-GDA exhibits a network topology-independent convergence behavior. We next show the linear convergence of GT-GDA-Lite to an error around the unique saddle point, which goes to zero when the coupling cost y, P x is common to all nodes, or when G and H are quadratic. Numerical experiments illustrate the convergence properties and importance of GT-GDA and GT-GDA-Lite for several applications.