Background:
Triple-negative breast cancer (TNBC), a breast cancer subtype proposed at the beginning of this century, remains the most challenging subtype because of its aggressive behavior, including early relapse, metastatic spread, and poor survival. This study uses machine learning methods to examine the current status and shortcomings of TNBC research from a macro perspective, based on its publications.
Methods:
All publications indexed under the MeSH term "Triple Negative Breast Neoplasms" in PubMed were retrieved as of December 2020. R and Python were used to extract MeSH terms, geographic information, and abstracts from the publication metadata. The Latent Dirichlet Allocation (LDA) algorithm was applied to identify specific research topics. The Louvain algorithm was used to build a topic network and identify the relationships among the topics.
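As a rough illustration of the topic-modeling step, the following minimal sketch fits LDA by collapsed Gibbs sampling on a toy corpus and then derives weighted topic-pair edges that could be passed to a community detection method such as Louvain. It is not the authors' pipeline: the toy documents, the hyperparameters (`alpha`, `beta`, `n_iter`), the helper names `lda_gibbs` and `topic_edges`, and the co-occurrence rule controlled by `min_share` are all assumptions made for illustration, and the Louvain step itself is not implemented here.

```python
import random
from collections import Counter
from itertools import combinations


def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (toy scale)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # token counts per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # token counts per (topic, word)
    topic_total = [0] * n_topics                       # token counts per topic
    assign = []                                        # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token from the counts, then resample its topic.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def topic_edges(doc_topic, min_share=0.2):
    """Assumed linking rule: weight a topic pair by the number of documents
    in which both topics hold at least `min_share` of the tokens."""
    edges = Counter()
    for counts in doc_topic:
        total = sum(counts)
        if total == 0:
            continue
        active = [k for k, c in enumerate(counts) if c / total >= min_share]
        edges.update(combinations(active, 2))
    return edges


# Hypothetical toy corpus standing in for tokenized abstracts.
docs = [
    "chemotherapy response survival trial".split(),
    "trial chemotherapy survival outcome".split(),
    "biomarker expression gene pathway".split(),
    "gene pathway expression metastasis".split(),
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
edges = topic_edges(doc_topic)
```

The resulting `edges` mapping is a weighted edge list over topic indices. In practice a tested LDA implementation and a graph library with a Louvain routine would be preferable to this hand-rolled sampler.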
Results:
A total of 5,097 publications were identified, with an average annual growth rate of 33.5%. Only 88 countries and regions participated in TNBC research. About 25% of the publications were related to clinical trials. This rapid clinical progress may serve as an example for diagnosing and treating other tumors, suggesting that TNBC is a forerunner in cancer research. Pathogenic mechanisms and drugs were the most studied subjects. Using the topic model, we found that the publications focused mainly on three areas: treatment plans, new biomarkers, and the regulatory mechanisms underlying the aggressive behavior of TNBC.
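The abstract does not state how the average annual growth rate was calculated. One common definition is the mean of the year-over-year relative changes in publication counts, sketched below under that assumption. The function name `average_annual_growth_rate` and the counts in `toy_counts` are hypothetical and are not data from this study.

```python
def average_annual_growth_rate(counts_by_year):
    """Mean of year-over-year relative changes.

    Assumes the years are consecutive and every count is non-zero.
    """
    years = sorted(counts_by_year)
    rates = [
        (counts_by_year[curr] - counts_by_year[prev]) / counts_by_year[prev]
        for prev, curr in zip(years, years[1:])
    ]
    return sum(rates) / len(rates)


# Hypothetical counts for illustration only.
toy_counts = {2016: 100, 2017: 150, 2018: 180}
rate = average_annual_growth_rate(toy_counts)  # (0.5 + 0.2) / 2
```

A compound annual growth rate would be another reasonable definition and can give a different figure from the same counts.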
Conclusion:
This study quantitatively summarizes the current status of TNBC research from a macro perspective and will aid in redirecting basic and clinical research toward better outcomes for TNBC.