Keyword extraction by term frequency-inverse document frequency (TF-IDF) is used for text information retrieval and mining in many domains, such as news text, social contact text, and medical text. However, keyword extraction in specialized domains still needs to be improved and optimized, particularly in the scientific research field. The traditional TF-IDF algorithm considers only word frequency in documents and ignores domain characteristics. Therefore, we propose the Scientific Research Project TF-IDF (SRP-TF-IDF) model, which combines TF-IDF with a weight balance algorithm designed to re-score candidate keywords. We implemented the SRP-TF-IDF model and verified that it achieves better precision, recall, and F1 score than the traditional TF-IDF and TextRank methods. In addition, we investigated the parameter of the weight balance algorithm to find an optimal value for keyword extraction from scientific research projects.
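To make the comparison above concrete, the following is a minimal sketch of traditional TF-IDF keyword scoring together with the precision, recall, and F1 measures used for evaluation. The abstract does not specify the weight balance algorithm, so the `rebalance` function, its `domain_weight` table, and the `alpha` parameter are purely hypothetical stand-ins for a domain-aware re-scoring step and should not be read as the SRP-TF-IDF method itself.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def rebalance(scores, domain_weight, alpha=0.5):
    """Hypothetical re-scoring step (an assumption, not the paper's algorithm).

    Each TF-IDF score is scaled by (1 + alpha * w), where w is an assumed
    per-term domain weight; terms missing from the table keep their score.
    """
    return {
        term: score * (1.0 + alpha * domain_weight.get(term, 0.0))
        for term, score in scores.items()
    }


def top_keywords(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def precision_recall_f1(predicted, gold):
    """Compare extracted keywords against a gold-standard keyword list."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy corpus; tokens and weights are illustrative only.
    corpus = [
        ["gene", "protein", "gene", "the"],
        ["protein", "the", "cell"],
        ["the", "network", "network"],
    ]
    base = tf_idf(corpus)[0]
    adjusted = rebalance(base, {"protein": 10.0}, alpha=1.0)
    print(top_keywords(base, 2), top_keywords(adjusted, 2))
```

In this sketch `alpha` plays the role of a tunable balance parameter: at `alpha=0` the ranking reduces to plain TF-IDF, and larger values let the assumed domain weights dominate.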
Streptococcus parasuis (S. parasuis) is a close relative of Streptococcus suis (S. suis), composed of former members of S. suis serotypes 20, 22 and 26. S. parasuis can infect pigs and cows, and human infection cases have recently been reported, making S. parasuis a potential opportunistic zoonotic pathogen. In this study, we analysed the genomic characteristics of S. parasuis using pan-genome analysis, compared phenotypic determinants such as capsular polysaccharide, integrative conjugative elements, CRISPR-Cas systems and pili, and predicted potential virulence genes by association analysis between the clinical condition of the source animals and the genotypes of the isolates. Furthermore, to examine the relationship with S. suis, we compared these characteristics of S. parasuis with those of S. suis. We found that the characteristics of S. parasuis are similar to those of S. suis: both have an “open” pan-genome, their antimicrobial resistance gene profiles are similar, and a srtF pilus cluster of S. suis was identified in the S. parasuis genome. However, S. parasuis also has unique characteristics: two novel pilus clusters and three different types of CRISPR-Cas systems were found. This study therefore provides novel insights into the interspecific and intraspecific genetic characteristics of S. parasuis, which can be useful for further study of this opportunistic pathogen, such as serotyping, diagnostics, vaccine development, and study of the pathogenesis mechanism.