The STC algorithm is a linear-time algorithm that clusters documents based on the phrases they share. To address shortcomings of the existing STC algorithm, namely the quality of its clustering results and the selection of cluster labels, this paper improves the algorithm in three respects: the selection of base clusters, the similarity formula used to merge base clusters, and the scoring function for cluster labels. Entropy is used as the evaluation criterion for the clustering results. Experiments show that, compared with the original algorithm, the improved algorithm produces better clustering results and cluster labels that are more readable, descriptive, and distinguishable.
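
As an illustration of the evaluation criterion, the following is a minimal sketch of a commonly used entropy measure for clustering results. It is not necessarily the exact definition used in this paper. It assumes that each document carries a known reference class label, that each cluster is given as the list of those labels, and that a document belonging to several clusters (which STC permits) is counted once in each of them. The function names `cluster_entropy` and `clustering_entropy` are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Entropy (in bits) of the reference-class distribution inside one cluster."""
    n = len(labels)
    counts = Counter(labels)
    # Sum of p * log2(1/p) over the classes present in the cluster.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def clustering_entropy(clusters):
    """Size-weighted average of the per-cluster entropies.

    `clusters` is a list of clusters; each cluster is a list of the
    reference class labels of its documents (an assumed input format).
    """
    total = sum(len(c) for c in clusters)
    # Empty clusters carry no weight and are skipped.
    return sum(len(c) / total * cluster_entropy(c) for c in clusters if c)


# Hypothetical toy data: two clusterings of documents from classes "a" and "b".
pure = [["a", "a"], ["b", "b", "b"]]
mixed = [["a", "b"], ["a", "a"]]
```

Under this definition a lower value indicates a better clustering: `clustering_entropy(pure)` is 0 because every cluster contains a single class, while `clustering_entropy(mixed)` is 0.5 because half of the documents sit in a cluster split evenly between two classes.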