Background: Knowledge graphs (KGs) can introduce domain knowledge into traditional Chinese medicine (TCM) intelligent syndrome differentiation models. However, the construction quality of current KGs in the TCM field is uneven, partly because of a lack of knowledge graph completion (KGC) and evaluation methods.
Objective: To explore KGC methods and evaluation methods suitable for TCM domain knowledge.
Methods: In the KGC phase, according to the characteristics of TCM domain knowledge, we propose a three-step "entity-ontology-path" completion plan that uses path reasoning, ontology rule reasoning, and association rules. In the KGC quality evaluation phase, we propose a three-dimensional evaluation system covering completeness, accuracy, and usability, using quantitative indicators derived from complex network analysis, ontology reasoning, and graph representation. Furthermore, we discuss the influence of different graph representation models on KG usability.
Results: In the KGC phase, 52, 107, 27, and 479 triples were added by outlier analysis, rule-based reasoning, association rules, and path-based reasoning, respectively. In addition, rule-based reasoning identified 14 contradictory triples. In the KGC quality evaluation phase, in terms of completeness, the KG after completion had higher density and lower sparsity, and it contained no contradictory rules. In terms of accuracy, the KG after completion was more consistent with prior knowledge. In terms of usability, the MRR, MR, and Hits@N of the TransE, RotatE, DistMult, and ComplEx graph representation models all improved after KG completion. Among them, the RotatE model achieved the best representation.
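For readers unfamiliar with the usability indicators named above, the following is a minimal sketch of how MR, MRR, and Hits@N are commonly computed from link-prediction ranks. It is not the authors' evaluation code: the function name `ranking_metrics`, the cutoffs in `ks`, and the example ranks are all illustrative assumptions, and the numbers are not results from this study.

```python
from typing import Dict, List, Sequence


def ranking_metrics(ranks: List[int], ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Compute MR, MRR, and Hits@k from 1-based ranks of the true entities.

    Each rank is the position of the correct entity among all candidates
    scored by an embedding model for one test triple (hypothetical input).
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")

    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank: lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank: higher is better
    }
    for k in ks:
        # Fraction of test triples whose true entity is ranked in the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Illustrative ranks only (assumed, not taken from the paper).
example = ranking_metrics([1, 2, 4, 10])
print(example)
```

Note that improvement means a lower MR but a higher MRR and higher Hits@N, so the three indicators move in different directions when a model ranks the correct entities closer to the top.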
Conclusions: The three-step completion plan can effectively improve the completeness, accuracy, and usability of KGs, and the three-dimensional evaluation system can be used for comprehensive KGC evaluation. In the TCM field, the RotatE model performed best in KG representation.