The Semantic MEDLINE Database (SemMedDB) has limited performance in identifying entities and relations, and it also neglects variations in argument quality, especially differences in persuasive strength across sentences. The present study uses large language models (LLMs) to evaluate the contextual argument quality of triples in SemMedDB in order to improve the understanding of disease mechanisms. Drawing on argument mining methods, we first design a quality evaluation framework across four major dimensions, namely triple accuracy, triple-sentence correlation, research object, and evidence cogency, to assess the argument quality of each triple-based claim against its contextual sentence. We then select a sample of 66 triple-sentence pairs for repeated annotation and framework optimization. GPT-3.5 and GPT-4 achieve strong performance, reaching an accuracy of up to 0.90 on the complex cogency evaluation task. A tentative case study on whether gestational diabetes is associated with periodontitis also yields accurate predictions (GPT-4, accuracy 0.88). LLM-enabled argument quality evaluation is therefore promising for evidence integration in understanding disease mechanisms, especially for tracing how evidence on two stances, with varying levels of cogency, evolves over time.
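As a minimal sketch of how such an LLM-based evaluation could be queried, the example below sends one triple-sentence pair to the OpenAI chat API and asks for a judgment on the four framework dimensions. The prompt wording, the evaluate_pair helper, and the example sentence are illustrative assumptions, not the prompts or data used in the study.

```python
# Hypothetical sketch: scoring one SemMedDB triple against its source sentence
# with an LLM. Prompt wording and helper names are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DIMENSIONS = ["triple accuracy", "triple-sentence correlation",
              "research object", "evidence cogency"]

def evaluate_pair(triple: str, sentence: str, model: str = "gpt-4") -> str:
    """Ask the model to rate one triple-sentence pair on the four dimensions."""
    prompt = (
        f"Triple (claim): {triple}\n"
        f"Source sentence: {sentence}\n"
        f"For each dimension in {DIMENSIONS}, give a label and a one-line reason."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You evaluate the argument quality of biomedical claims."},
            {"role": "user", "content": prompt},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Illustrative call using the gestational diabetes / periodontitis case
print(evaluate_pair(
    "Gestational Diabetes ASSOCIATED_WITH Periodontitis",
    "Women with gestational diabetes showed a higher prevalence of periodontitis.",
))
```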