Given the growth of scientific literature on the web, particularly in materials science, acquiring data precisely from the literature has become more significant. Material information systems, or chemical information systems, play an essential role in discovering data, materials, or synthesis processes using the existing scientific literature. Processing and understanding the natural language of scientific literature is the backbone of these systems, which depend heavily on appropriate textual content. Here, appropriate textual content means complete, meaningful sentences recovered from a large chunk of text. The process of detecting the beginning and end of each sentence and extracting it correctly is called sentence boundary extraction. Accurate extraction of sentence boundaries from PDF documents is essential for readability and natural language processing. Therefore, this study provides a comparative analysis of different tools for extracting text from PDF documents, all of which are available as Python libraries or packages and are widely used by the research community. The main objective is to find the most suitable of the available techniques for correctly extracting sentences from PDF files as text. The performance of the techniques used (PyPDF2, pdfminer.six, PyMuPDF, pdftotext, Tika, and GROBID) is presented in terms of precision, recall, F1 score, run time, and memory consumption. The Natural Language Processing (NLP) tools NLTK, spaCy, and Gensim are used to identify sentence boundaries. Of all the techniques studied, the GROBID PDF extraction package combined with the NLP tool spaCy achieved the highest F1 score of 93% and consumed the least memory, at 46.13 MB.
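The precision, recall, and F1 metrics named in this abstract can be illustrated with a minimal sketch. It assumes that an extracted sentence counts as correct only when it exactly matches a reference sentence after whitespace and case normalization; the abstract does not state the study's actual matching criterion, and the function names `normalize` and `sentence_prf` are hypothetical.

```python
from collections import Counter


def normalize(sentence):
    # Assumption: collapse runs of whitespace and ignore case before comparing.
    return " ".join(sentence.split()).lower()


def sentence_prf(extracted, reference):
    """Return (precision, recall, f1) for extracted vs. reference sentences."""
    ext = Counter(normalize(s) for s in extracted)
    ref = Counter(normalize(s) for s in reference)
    # True positives: sentences present in both multisets.
    tp = sum((ext & ref).values())
    precision = tp / sum(ext.values()) if ext else 0.0
    recall = tp / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    reference = ["A b.", "C d.", "E f."]
    # The second extracted sentence wrongly merges two reference sentences.
    extracted = ["A  b.", "C d. E f."]
    print(sentence_prf(extracted, reference))
```

A merged or split sentence counts against both precision and recall under this criterion, which is why boundary errors in the PDF-to-text step show up directly in the F1 score.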
In this article, an ultra-wideband antenna is presented for medical applications; it has a resonant frequency of 10.35 GHz and offers a bandwidth of 1400 MHz with a return loss of -19 dB. The presented antenna is low-cost and lightweight and can easily be integrated inside a circuit. Because the antenna is designed for medical applications, it is compact, and it is fed using a coaxial feeding technique, which is especially suitable for operative radiotherapy applications. The antenna is rectangular, with an overall size of 24 x 12 mm and a thickness of 1 mm. It was designed using CST Studio as the simulation tool, and the results for important parameters such as return loss, surface current, reference impedance, and far-field are illustrated in the article. A GT-008 substrate with a high relative permittivity (εr = 4.8, tan δ = 0.02) is employed as the dielectric material, whereas copper is used for the ground and radiating patches. The performance of the antenna in the X-band is satisfactory, which makes it suitable for radiotherapy.
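The figures quoted in this abstract can be related to each other with standard antenna formulas. The sketch below makes two assumptions not stated in the abstract: the 10.35 GHz resonant frequency is treated as the center of the 1400 MHz band, and the -19 dB figure is treated as the reflection coefficient S11 at resonance. The function names are hypothetical.

```python
def fractional_bandwidth(bandwidth_hz, center_hz):
    # Fractional bandwidth = absolute bandwidth / center frequency.
    return bandwidth_hz / center_hz


def reflection_magnitude(s11_db):
    # |Gamma| = 10^(S11_dB / 20), with S11 given as a negative dB value.
    return 10 ** (s11_db / 20)


def vswr(s11_db):
    # VSWR = (1 + |Gamma|) / (1 - |Gamma|).
    gamma = reflection_magnitude(s11_db)
    return (1 + gamma) / (1 - gamma)


if __name__ == "__main__":
    # Values taken from the abstract; the center-frequency choice is an assumption.
    print(f"fractional bandwidth: {fractional_bandwidth(1400e6, 10.35e9):.1%}")
    print(f"|Gamma|: {reflection_magnitude(-19.0):.3f}")
    print(f"VSWR: {vswr(-19.0):.2f}")
```

Under these assumptions the fractional bandwidth comes to roughly 13.5% and the VSWR to roughly 1.25.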
Social media is a barometer of public sentiment about the state of affairs, and the ongoing pandemic has engaged an additional user base of people who are confined to their places of residence. COVID-19 startled the world, and the crisis is exacerbated by the absence of sufficient data for policy making. Social media data, analyzed in a timely manner, can provide sufficient statistics for decision-making. This study explores Twitter data to discover informative statistics on public sentiment about COVID-19 vaccination in developing countries. The study inspects data collected from two highly populous developing countries: India and Pakistan. A Support Vector Machine (SVM) classifier achieves 74.3% accuracy on the manually labeled dataset. Furthermore, the sentiment analysis is correlated with other indigenous factors such as regional literacy rate and COVID-19 calamities in the time interval. It is observed that the shift from negative to positive sentiment correlates with a lower to higher regional literacy rate, and that a higher COVID-19 intensity causes positive sentiment towards vaccination. The correlation of the results with indigenous factors may help to direct the devised strategies to the right audience, and social media knowledge discovery with machine learning techniques may help to overcome data scarcity challenges in a medical emergency such as COVID-19 in developing countries.