Among the many available solutions, distributed stochastic gradient descent (DSGD) is one of the most popular algorithms for parallelizing matrix decomposition. In parallel computation, however, the computing speed of the nodes differs greatly because of imbalance among them. This article reduces the data skew across all computing nodes during distributed execution to solve the resulting lock-waiting problem. The improved algorithm, named D-DSGD, reduces the running time of DSGD and improves node utilization. In addition, a dynamic step-size adjustment strategy is applied to improve the convergence rate of the algorithm. To ensure that the matrix decomposition is non-negative, a non-negativity control is added to D-DSGD, and the resulting algorithm is named D-NMF. Compared with existing methods, the proposed algorithm markedly reduces latency and speeds up convergence.
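The three ingredients named above (skew reduction, dynamic step size, non-negative control) can be illustrated with a minimal single-process sketch. It is not the article's implementation: the abstract does not specify the actual rules, so every concrete choice here is an assumption. The function names `balance_rows` and `nonneg_sgd` are hypothetical, a greedy heaviest-first assignment stands in for the skew reduction, a "bold driver" heuristic (grow the step after an improving epoch, halve it otherwise) stands in for the dynamic step-size strategy, and clipping each factor entry at zero stands in for the non-negative control.

```python
import random


def balance_rows(row_counts, num_nodes):
    """Assumed skew-reduction stand-in: assign rows to nodes greedily.

    row_counts[r] is the number of observed entries in row r. The heaviest
    remaining row always goes to the currently lightest node.
    """
    loads = [0] * num_nodes
    assignment = [[] for _ in range(num_nodes)]
    for row in sorted(range(len(row_counts)), key=lambda r: -row_counts[r]):
        node = loads.index(min(loads))
        assignment[node].append(row)
        loads[node] += row_counts[row]
    return assignment, loads


def squared_error(ratings, W, H):
    """Sum of squared residuals over the observed (row, col, value) entries."""
    return sum(
        (v - sum(w * h for w, h in zip(W[i], H[j]))) ** 2 for i, j, v in ratings
    )


def nonneg_sgd(ratings, num_rows, num_cols, rank=2, step=0.05, epochs=200, seed=0):
    """Projected SGD for V ~ W H^T with non-negative factors.

    Returns W, H and the loss history (initial loss first).
    """
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(rank)] for _ in range(num_rows)]
    H = [[rng.random() for _ in range(rank)] for _ in range(num_cols)]
    history = [squared_error(ratings, W, H)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for i, j, v in order:
            err = v - sum(w * h for w, h in zip(W[i], H[j]))
            for k in range(rank):
                w, h = W[i][k], H[j][k]
                # Gradient step, then clip at zero (assumed non-negative control).
                W[i][k] = max(0.0, w + step * err * h)
                H[j][k] = max(0.0, h + step * err * w)
        loss = squared_error(ratings, W, H)
        # Bold-driver rule (assumed dynamic step size): grow on improvement,
        # halve when the loss did not decrease.
        step = step * 1.05 if loss < history[-1] else step * 0.5
        history.append(loss)
    return W, H, history


if __name__ == "__main__":
    _, loads = balance_rows([5, 1, 1, 1, 4, 2], num_nodes=2)
    print("per-node load:", loads)
    ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0)]
    W, H, history = nonneg_sgd(ratings, num_rows=2, num_cols=2)
    print("initial loss:", history[0], "final loss:", history[-1])
```

In a distributed setting, each node would run the inner update loop on its own block of the matrix. Balancing the number of observed entries per node is one plausible way to keep fast nodes from idling while they wait for slow ones, which is the lock-waiting problem the abstract refers to.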