Topic modeling is a method for discovering abstract topics in a large collection of documents. It uncovers the mixture of hidden, or "latent", topics that varies from document to document in a given corpus. As unsupervised machine learning approaches, topic models are not easy to evaluate, since there is no labelled "ground truth" data to compare against. However, because topic modeling typically requires setting some parameters beforehand (first and foremost the number of topics k to be discovered), model evaluation is crucial for finding an "optimal" set of parameters for the given data. Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) are the two most popular topic modeling techniques: LDA takes a probabilistic approach, whereas NMF takes a matrix factorization approach. In this paper we assess which technique produces more coherent topics according to the c_v measure, using a COVID-19 citations corpus for our experiments.
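To illustrate the evaluation step described above, the following minimal sketch compares the coherence of two topics' top words over a toy corpus. The c_v measure used in the paper (available, e.g., via gensim's CoherenceModel) combines NPMI with cosine similarity over sliding windows; for a self-contained illustration this sketch instead computes the simpler UMass coherence from document co-occurrence counts. The corpus and the two top-word lists (standing in for one LDA topic and one NMF topic) are hypothetical.

```python
import math

# Toy corpus: each document is a list of tokens (hypothetical stand-in
# for a preprocessed COVID-19 citations corpus).
docs = [
    ["virus", "vaccine", "trial", "dose"],
    ["virus", "vaccine", "immunity", "dose"],
    ["matrix", "factorization", "topic", "model"],
    ["topic", "model", "coherence", "evaluation"],
]

def doc_freq(word, docs):
    """Number of documents containing `word`."""
    return sum(1 for d in docs if word in d)

def co_doc_freq(w1, w2, docs):
    """Number of documents containing both `w1` and `w2`."""
    return sum(1 for d in docs if w1 in d and w2 in d)

def umass_coherence(top_words, docs, eps=1.0):
    """Simplified UMass coherence: average of
    log((D(w_i, w_j) + eps) / D(w_j)) over ordered pairs of top words."""
    score, pairs = 0.0, 0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log(
                (co_doc_freq(top_words[i], top_words[j], docs) + eps)
                / doc_freq(top_words[j], docs)
            )
            pairs += 1
    return score / pairs

# Hypothetical top words produced by two models for one topic each.
lda_topic = ["virus", "vaccine", "dose"]
nmf_topic = ["virus", "topic", "dose"]

print(umass_coherence(lda_topic, docs))  # higher score = more coherent topic
print(umass_coherence(nmf_topic, docs))
```

In an actual experiment one would sweep the number of topics k for both models, score each run with c_v, and select the model and k with the highest coherence.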
The term "scientific publication" covers several types of scientific communications and digital broadcasts through which researchers present their work to their peers and to an audience of specialists. These publications describe in detail the studies or experiments carried out and the conclusions the authors draw from them. They undergo an examination of the value of the results and of the rigor of the scientific method used in the work. In this paper we evaluate the quality of a scientific article on a given subject (topic), based on its citations and on where it is cited. We build on topic modeling, applying the LDA algorithm to the NIPS corpus to detect the subjects of each paper and of its citations in a first step, and then on sentiment analysis of the citations of each article in the corpus, using a lexicon-based approach. We then create a CSV file linking each paper to the papers it cites (cited-citing relation), and finally generate a semantic graph between these publications.
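The last two steps above (the cited-citing CSV and the citation graph) can be sketched as follows. The paper identifiers and records are hypothetical, and the graph is kept as a plain adjacency list; a library such as networkx could be substituted for the final step.

```python
import csv
import io
from collections import defaultdict

# Hypothetical citation records: (citing paper, cited paper).
citations = [
    ("paper_A", "paper_B"),
    ("paper_A", "paper_C"),
    ("paper_B", "paper_C"),
]

# Step 1: write the cited-citing relation to CSV. An in-memory buffer
# is used here; replace io.StringIO() with
# open("links.csv", "w", newline="") to write to disk.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["citing", "cited"])
writer.writerows(citations)

# Step 2: build a directed citation graph as an adjacency list,
# mapping each citing paper to the list of papers it cites.
graph = defaultdict(list)
for citing, cited in citations:
    graph[citing].append(cited)

print(dict(graph))
```

Edges in this graph can then carry the topic labels and citation sentiment computed earlier, yielding the semantic graph between publications.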
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context in which an article is cited and indicate whether the citing work provides supporting or contrasting evidence. scite is used by students and researchers around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.