Background: As the COVID-19 pandemic progressed, disinformation, fake news, and conspiracy theories spread through many parts of society. According to the literature, disinformation spreading through social media is one of the causes of increased COVID-19 vaccine hesitancy. In this context, analyzing social media posts is particularly important, but the large volume of data exchanged on social media platforms requires specific methods. This is why machine learning and natural language processing models are increasingly applied to social media data.
Objective: This study examines how faithfully the French-language CamemBERT model can predict the categories developed for this study, given that tweets about vaccination are often ambiguous, sarcastic, or irrelevant to the topic studied.
Methods: A total of 901,908 unique French-language tweets related to vaccination, published between July 12, 2021, and August 11, 2021, were extracted using Twitter’s application programming interface (version 2; Twitter Inc). Approximately 2000 randomly selected tweets were labeled under 2 categorizations: (1) arguments for (pros) or against (cons) vaccination, including health measures, and (2) type of content (scientific, political, social, or vaccination status). The CamemBERT model was fine-tuned and tested for classifying French-language tweets. Model performance was assessed by computing the F1-score, and confusion matrices were obtained.
Results: The model's accuracy reached up to 70.6% for the first classification (pro and con tweets) and up to 90% for the second classification (scientific and political tweets). Furthermore, tweets containing fewer than 170 characters had 1.86 times higher odds of being misclassified by the model (odds ratio 1.86; 95% CI 1.20-2.86).
Conclusions: The model's accuracy depends on the classification chosen and on the topic of the message examined. When the vaccine debate is disrupted by contested political decisions, tweet content becomes so heterogeneous that accuracy drops for less clearly differentiated classes. However, our tests showed that accuracy can be improved by selecting tweets with a new method based on tweet length.
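The evaluation quantities named in this abstract (accuracy, F1-score, confusion matrices, and an odds ratio for tweets under 170 characters) can be illustrated with a minimal Python sketch. It assumes the model's predictions are already available as string labels. The function names, toy labels, and 2x2 counts are hypothetical and are not the study's data. The abstract does not say how the odds ratio and its confidence interval were estimated, so the Woolf (log-scale Wald) interval used here is an assumed, commonly used choice, and macro-averaging the per-class F1 is likewise only one of several possible schemes.

```python
import math

def confusion_matrix(y_true, y_pred, labels):
    """Return counts[true_label][predicted_label] over the given label set."""
    counts = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts

def accuracy(y_true, y_pred):
    """Share of tweets whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_scores(y_true, y_pred, labels):
    """Per-class F1 = 2PR / (P + R), plus its unweighted (macro) mean."""
    cm = confusion_matrix(y_true, y_pred, labels)
    per_class = {}
    for c in labels:
        tp = cm[c][c]
        fp = sum(cm[t][c] for t in labels if t != c)  # predicted c, gold differs
        fn = sum(cm[c][p] for p in labels if p != c)  # gold c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class[c] = 2 * precision * recall / denom if denom else 0.0
    return per_class, sum(per_class.values()) / len(labels)

def length_table(texts, correct, threshold=170):
    """2x2 counts (a, b, c, d) for short (< threshold chars) vs other tweets.

    a = short and misclassified, b = short and correct,
    c = long and misclassified,  d = long and correct.
    """
    a = b = c = d = 0
    for text, ok in zip(texts, correct):
        if len(text) < threshold:
            if ok:
                b += 1
            else:
                a += 1
        elif ok:
            d += 1
        else:
            c += 1
    return a, b, c, d

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) with a Woolf log-scale 95% CI; all cells must be > 0."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical toy example (not the study's data).
y_true = ["pro", "con", "pro", "con", "pro"]
y_pred = ["pro", "pro", "pro", "con", "con"]
per_class, macro_f1 = f1_scores(y_true, y_pred, ["pro", "con"])
print(accuracy(y_true, y_pred), per_class, round(macro_f1, 4))
print(odds_ratio_woolf(10, 20, 5, 20))
```

An odds ratio above 1 whose interval excludes 1, as in the reported 1.20-2.86, indicates that short tweets are associated with higher odds of misclassification; a logistic regression on a "short tweet" indicator would be an alternative way to obtain the same kind of estimate.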
Google Scholar (GS) is a free tool that researchers may use to analyze citations, find relevant literature, or evaluate the quality of an author or a candidate for tenure, promotion, a faculty position, funding, or research grants. GS has become a major bibliographic and citation database. For assessing the literature, however, databases such as PubMed, PsycINFO, Scopus, and Web of Science can be used in place of GS because they are more reliable. The aim of this study was to examine the accuracy of citation data collected from GS and to provide a comprehensive description of the errors and miscounts identified. For this purpose, 281 documents citing 2 specific works were retrieved with the Publish or Perish software (PoP) and examined; the cited works addressed the false-positive issue inherent in the analysis of neuroimaging data. The results revealed an unprecedented error rate, with 279 of 281 (99.3%) examined references containing at least one error. Nonacademic documents tended to contain more errors than academic publications (U=5117.0; P<.001). This viewpoint article, based on a case study of GS data accuracy, shows that GS data are not only inaccurate but also potentially expose researchers who use them without verification to substantial biases in their analyses and results. Further work is needed to assess the consequences of using GS data extracted by PoP.
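The U value reported above comes from a Mann-Whitney (Wilcoxon rank-sum) test comparing error counts between nonacademic and academic documents. A minimal Python sketch of how such a statistic is obtained from pooled ranks, with tied values given their average rank, follows. The error counts are hypothetical toy values, not the study's data, and the abstract does not state which group's U is reported. A P value would additionally require a normal approximation or an exact distribution (for example, scipy.stats.mannwhitneyu), which is not reproduced here.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U_x, U_y) from pooled midranks; U_x + U_y == len(x) * len(y)."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x

# Hypothetical error counts per citing document (not the study's data).
nonacademic_errors = [4, 5, 3, 6]
academic_errors = [1, 2, 3, 2]
print(mann_whitney_u(nonacademic_errors, academic_errors))

# Share of examined references with at least one error, as reported: 279 of 281.
print(round(100 * 279 / 281, 1))
```

Because U_x + U_y equals the product of the two sample sizes, either value determines the other; statistical packages differ in which of the two they report.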
The surprising dialectic between the invocation and the disqualification of scientific discourse
A rat is released into a cage one meter long, at the far end of which a tray ready to receive food is placed. Food is dropped into the tray 10 seconds after the rat is released into the cage. However, if the rat reaches the tray before this latency period (10 seconds) has elapsed, no food is delivered. Yet the rat takes about 2 seconds to reach the tray. The rats used in the experiment therefore spend the 8 seconds of latency producing a distinct but repetitive behavior, which they reproduce every time, so that they come to establish a causal link between their behavior and access to the food. When the rats obtain food, this confirms for them that their behavior is the cause. For Watzlawick¹, these "types of behavior are the obvious equivalent of compulsive human superstitions, often founded on the uncertain belief that they are required by some 'divine experimenter'." This experiment with the superstitious rat does not leave one indifferent, inasmuch as it sets out to explore the formation of beliefs scientifically. This dialogue between science and beliefs is not new and remains fully relevant today. It takes various forms: drawing clear, watertight boundaries; claiming the authority of science to establish new beliefs; or attempting to use science to explore the imaginary world that beliefs offer. Philosophy drew this boundary between science and beliefs long ago. This work has attracted the interest of many contemporary researchers across a range of disciplines. For them, science is said to produce knowledge understood as beliefs that are always true² or "very probably true"³, in that their production rests