Preprocessing is the most basic and most important step in applying mass spectrometry data. To address the high dimensionality of serum mass spectrometry data, we propose an efficient and practical preprocessing method that handles inconsistent dimensions across groups, non-normalized values, and non-unique attribute columns. First, each Mass value is rounded down to determine the attribute column name. Second, because rounding down can map several Mass values to the same term, four strategies are used to reduce each set of duplicates to a single value. Third, missing values are filled in four ways, and the whole dataset is then transposed. Since four treatments are applied to both duplicate Mass terms and missing values, there are 4 × 4 = 16 combinations, yielding 16 new datasets. Finally, three machine learning methods, SVM (linear), Random Forest, and Logistic Regression, are used to compare the preprocessing approaches on these datasets. The numerical results show that the three classifiers reach classification accuracies of 0.8296, 0.8906, and 0.8088, respectively, which indicates that the proposed preprocessing algorithms can efficiently convert raw mass spectrometry data into valid, normalized data.
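
The preprocessing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation. The abstract does not name the four duplicate-handling strategies or the four missing-value strategies, so the rules used here (mean, max, min, first; zero, mean, median, min) are placeholders. The function names `bin_spectrum` and `build_dataset`, the per-Mass filling of missing values, and the toy peak lists are likewise hypothetical.

```python
import math
from itertools import product
from statistics import mean, median

# ASSUMPTION: the abstract says "four ways" but does not name them.
# These rules are placeholders chosen for illustration only.
DUPLICATE_RULES = {
    "mean": mean,
    "max": max,
    "min": min,
    "first": lambda values: values[0],
}
FILL_RULES = {
    "zero": lambda observed: 0.0,
    "mean": mean,
    "median": median,
    "min": min,
}


def bin_spectrum(peaks, resolve):
    """Round each Mass value down and collapse peaks that share a rounded Mass."""
    bins = {}
    for mass, intensity in peaks:
        bins.setdefault(math.floor(mass), []).append(intensity)
    return {mass: resolve(values) for mass, values in bins.items()}


def build_dataset(spectra, dup_rule, fill_rule):
    """Return (columns, rows) with one row per sample and one column per integer Mass."""
    resolve = DUPLICATE_RULES[dup_rule]
    fill = FILL_RULES[fill_rule]
    binned = [bin_spectrum(peaks, resolve) for peaks in spectra]
    columns = sorted(set().union(*binned))

    # Mass-by-sample table: one row per integer Mass.
    # ASSUMPTION: a missing value is filled from the values observed
    # for the same Mass in the other samples.
    mass_rows = []
    for mass in columns:
        observed = [sample[mass] for sample in binned if mass in sample]
        fill_value = fill(observed)
        mass_rows.append([sample.get(mass, fill_value) for sample in binned])

    # Transpose so that each sample becomes a row and each Mass a column.
    rows = [list(row) for row in zip(*mass_rows)]
    return columns, rows


# Toy input: two samples, each a list of (Mass, intensity) peaks.
spectra = [
    [(1000.2, 3.0), (1000.7, 5.0), (1002.1, 2.0)],
    [(1000.4, 4.0), (1001.9, 6.0)],
]

# 4 duplicate rules x 4 fill rules = 16 preprocessed datasets.
datasets = {
    (dup_rule, fill_rule): build_dataset(spectra, dup_rule, fill_rule)
    for dup_rule, fill_rule in product(DUPLICATE_RULES, FILL_RULES)
}

columns, rows = datasets[("mean", "zero")]
print(columns)
print(rows)
```

In the toy input, the first sample has two peaks that both round down to Mass 1000, so the duplicate rule decides which single intensity is kept. Each of the 16 resulting tables has the same columns for every sample, which is the form a classifier such as a linear SVM, Random Forest, or Logistic Regression expects.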