Abstract: Algorithms that use the gradient, i.e. the direction of maximum improvement, to search for the optimum of a single objective function have been around for decades. They are widely regarded as important and have been applied to single-objective optimization problems in many fields. For multi-objective optimization problems, much less is known about the gradient and its algorithmic use. In this article we aim to contribute to the understanding of gradients for numerical, i.e. real-valued, multi-objective optimization. Specifically, we provide an analytical parametric description of the set of all nondominated, i.e. most promising, directions in which a solution can be moved such that its objective values either improve or remain the same. This result completes previous work, in which this set was described only for one particular case, namely when some of the nondominated directions have positive, i.e. non-improving, components and the final set can be computed by taking the subset of directions that are all non-positive. In addition, we use our result to assess the utility of gradient information for multi-objective optimization where the goal is to obtain a Pareto set of solutions that approximates the optimal Pareto set. To this end, we design and consider various multi-objective gradient-based optimization algorithms. One of these algorithms uses the description of the multi-objective gradient provided here. We also hybridize an existing multi-objective evolutionary algorithm (MOEA) with the various multi-objective gradient-based optimization algorithms. During optimization, the performance of the gradient-based algorithms is monitored and the available computational resources are redistributed so that the (currently) most effective algorithm receives the most resources. We perform an experimental analysis on a few well-known benchmark problems to compare the performance of the different optimization methods.
The results underline that the use of a population of solutions, which is characteristic of MOEAs, is a powerful concept if the goal is to obtain a good Pareto set rather than only a single solution. This makes it hard to increase convergence speed in the initial phase by using gradient information to improve any single solution. In the longer run, however, the use of gradient information ultimately allows for better fine-tuning of the results and thus better overall convergence.
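To make the central notion concrete, the following is a minimal sketch of what a nondominated (improving-or-equal) direction means for a minimization problem: a direction d is acceptable at a point x only if the directional derivative of every objective along d is non-positive. The helper names (`improves_all`, `combined_direction`) and the negated-sum-of-normalized-gradients heuristic are illustrative assumptions for this sketch, not the parametric description derived in the article.

```python
import numpy as np

def improves_all(d, grads, tol=1e-12):
    """True if direction d does not worsen any objective (minimization):
    the directional derivative d . grad f_i must be <= 0 for every i."""
    return all(float(np.dot(d, g)) <= tol for g in grads)

def combined_direction(grads):
    """One simple candidate descent direction: the negated sum of the
    normalized objective gradients (a common heuristic, assumed here for
    illustration only)."""
    return -sum(g / np.linalg.norm(g) for g in grads)

# Toy example with two objectives of x = (x1, x2):
#   f1(x) = x1^2 + x2^2        -> grad f1 = 2 x
#   f2(x) = (x1 - 1)^2 + x2^2  -> grad f2 = 2 (x - (1, 0))
x = np.array([0.5, 0.5])
g1 = 2 * x
g2 = 2 * (x - np.array([1.0, 0.0]))
d = combined_direction([g1, g2])
print(improves_all(d, [g1, g2]))   # d improves (or preserves) both objectives
```

Moving x along d decreases both f1 and f2 simultaneously, whereas a direction such as g1 itself would worsen f1; the article's contribution is an analytical description of the full set of such nondominated directions rather than a single heuristic candidate.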