Abstract: Twitter is one of the most popular social networks. Previous research found that users employ Twitter to communicate about software applications via short messages, commonly referred to as tweets, and that these tweets can be useful for requirements engineering and software evolution. However, due to their large number (in the range of thousands per day for popular applications), a manual analysis is infeasible. In this work we present ALERTme, an approach to automatically classify, group and rank tweets about software applications. We apply machine learning techniques for automatically classifying tweets requesting improvements, topic modeling for grouping semantically related tweets, and a weighted function for ranking tweets according to specific attributes, such as content category, sentiment and number of retweets. We ran our approach on 68,108 collected tweets from three software applications and compared its results against software practitioners' judgement. Our results show that ALERTme is an effective approach for filtering, summarizing and ranking tweets about software applications. ALERTme enables the exploitation of Twitter as a feedback channel for information relevant to software evolution, including end-user requirements.
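A weighted ranking function of this kind can be sketched as follows; the attribute names, weights, and normalizations below are illustrative assumptions for a tweet-ranking score, not ALERTme's actual formula.

```python
# Hypothetical weighted ranking of tweets by category, sentiment, and
# retweets, in the spirit of the approach above. Weights are invented.

def rank_score(tweet, weights=(0.5, 0.3, 0.2)):
    """Combine category relevance, sentiment, and retweet count into one score in [0, 1]."""
    w_cat, w_sent, w_rt = weights
    # category relevance: 1.0 if the tweet requests an improvement, else 0.0
    cat = 1.0 if tweet["category"] == "improvement_request" else 0.0
    # sentiment in [-1, 1]; negative feedback is often more actionable,
    # so map it so that strongly negative tweets score higher
    sent = (1.0 - tweet["sentiment"]) / 2.0
    # retweets: squash with a simple saturation so one viral tweet
    # does not dominate the ranking
    rt = tweet["retweets"] / (tweet["retweets"] + 10.0)
    return w_cat * cat + w_sent * sent + w_rt * rt

tweets = [
    {"category": "improvement_request", "sentiment": -0.8, "retweets": 5},
    {"category": "other", "sentiment": 0.4, "retweets": 100},
]
# highest-priority tweets first
ranked = sorted(tweets, key=rank_score, reverse=True)
```

With these invented weights, a negative improvement request outranks a popular but irrelevant tweet, which matches the intuition that content category and sentiment should outweigh raw popularity.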
Background: Individuals with major depressive disorder (MDD) vary in their response to antidepressants. However, identifying objective biomarkers, measured before or early in the course of treatment, that can predict antidepressant efficacy remains a challenge. Methods: Individuals with MDD participated in a 12-week antidepressant pharmacotherapy trial. Electroencephalographic (EEG) data were collected before and 1 week after treatment initiation in 51 patients. Response status at week 12 was established with the Montgomery–Åsberg Depression Rating Scale (MADRS), with a ≥50% decrease characterizing responders (N = 27/24 responders/non-responders). We used a machine learning (ML) approach for predicting response status. We focused on Random Forests, though other ML methods were compared. First, we used a tree-based estimator to select a relatively small number of significant features from: (a) demographic/clinical data (age, sex, individual item/total MADRS scores at baseline, week 1, and change scores); (b) scalp-level EEG power; (c) source-localized current density (via exact low-resolution electromagnetic tomography [eLORETA] software). Second, we applied kernel principal component analysis to reduce and map the important features. Third, a set of ML models was constructed to classify response outcome based on the mapped features. For each dataset, predictive features were extracted, followed by a model of all predictive features, and finally by a model of the most predictive features. Results: Fifty eLORETA features were predictive of response (across bands, both time-points); alpha1/theta eLORETA features showed the highest predictive value. Eighty-eight scalp EEG features were predictive of response (across bands, both time-points), with theta/alpha2 being most predictive. Clinical/demographic data consisted of 31 features, the most important being week 1 “concentration difficulty” scores. When all features were included in one model, its predictive utility was high (88% accuracy).
When the most important features were extracted in the final model, 12 predictive features emerged (78% accuracy), including baseline scalp-EEG frontopolar theta, parietal alpha2 and frontopolar alpha1. Conclusions: These findings suggest that ML models of pre- and early treatment-emergent EEG profiles and clinical features can serve as tools for predicting antidepressant response. While this must be replicated using large independent samples, it lays the groundwork for research on personalized, “biomarker”-based treatment approaches.
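The three-stage pipeline described above (tree-based feature selection, kernel PCA mapping, classification with Random Forests) can be sketched with scikit-learn; the synthetic dataset, feature counts, and all estimator parameters below are placeholder assumptions, not the study's actual configuration.

```python
# Illustrative three-stage pipeline on synthetic data:
# (1) tree-based feature selection, (2) kernel PCA mapping,
# (3) Random Forest classification of responder status.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(51, 169))   # 51 patients; columns stand in for clinical + EEG features
y = rng.integers(0, 2, size=51)  # 1 = responder, 0 = non-responder (random labels here)

pipe = Pipeline([
    # keep only features whose tree-based importance exceeds the mean
    ("select", SelectFromModel(ExtraTreesClassifier(n_estimators=200, random_state=0))),
    # map the selected features into a low-dimensional nonlinear space
    ("kpca", KernelPCA(n_components=10, kernel="rbf")),
    # classify response outcome from the mapped features
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=5)  # cross-validated accuracy per fold
```

Because the labels here are random, the cross-validated accuracy hovers around chance; the point is the structure of the pipeline, not the numbers.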
Data-driven models are essential tools for the development of surrogate models that can be used for the design, operation, and optimization of industrial processes. One approach to developing surrogate models is through the use of input–output data obtained from a process simulator. To enhance model robustness, proper sampling techniques are required to cover the entire domain of the process variables uniformly. In the present work, pseudo-random Monte Carlo samples, Latin hypercube samples, and quasi-Monte Carlo samples based on Hammersley Sequence Sampling (HSS) are generated. The sampled data obtained from the process simulator are fitted to neural networks to generate a surrogate model. An illustrative case study is solved to predict the performance of a gas stabilization unit. From the developed surrogate models, it can be concluded that, of the different sampling methods, Latin hypercube sampling and HSS perform better than pseudo-random sampling for designing the surrogate model. This conclusion is based on the maximum absolute value, standard deviation, and confidence interval of the relative average error obtained with the different sampling techniques.
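The difference between pseudo-random and stratified sampling can be illustrated with scipy's quasi-Monte Carlo utilities; note that scipy does not ship a Hammersley sequence generator, so this sketch compares only plain Monte Carlo against Latin hypercube sampling, using discrepancy as a rough measure of how uniformly each design covers the domain.

```python
# Compare pseudo-random Monte Carlo sampling with Latin hypercube
# sampling over a unit hypercube of (normalized) process variables.
import numpy as np
from scipy.stats import qmc

n_samples, n_vars = 256, 3   # e.g. three process variables

# pseudo-random Monte Carlo samples
rng = np.random.default_rng(0)
mc = rng.random((n_samples, n_vars))

# Latin hypercube samples: each variable's range is split into
# n_samples equal strata, and each stratum is sampled exactly once
lhs = qmc.LatinHypercube(d=n_vars, seed=0).random(n_samples)

# lower discrepancy indicates more uniform coverage of the domain
d_mc = qmc.discrepancy(mc)
d_lhs = qmc.discrepancy(lhs)
```

In practice the Latin hypercube design yields the lower discrepancy, which is one way to see why stratified and quasi-Monte Carlo designs tend to produce more robust surrogate models than pseudo-random sampling for the same budget.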
Twitter messages (tweets) contain important information for software and requirements evolution, such as feature requests, bug reports and descriptions of feature shortcomings. For this reason, Twitter is an important source for crowd-based requirements engineering and software evolution. However, a manual analysis of this information is infeasible due to the large number of tweets, their unstructured nature and varying quality. Therefore, automatic analysis techniques are needed for, e.g., summarizing, classifying and prioritizing tweets. In this work we present a survey of 84 software engineering practitioners and researchers that studies which tweet attributes are most indicative of tweet priority when performing software evolution tasks. We believe that our results can be used to implement mechanisms for prioritizing user feedback with social components and can thus help enhance crowd-based requirements engineering and software evolution. Index Terms: user feedback; crowd-based requirements engineering; crowd-based software evolution.
Background: Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than 90% of all vector-borne diseases in North America. Objective: In this paper, self-reported tweets on Twitter were analyzed in order to predict potential Lyme disease cases and accurately assess incidence rates in the US. Methods: The study was done in three stages: (1) Approximately 1.3 million tweets were collected and pre-processed to extract the most relevant Lyme disease tweets with geolocations. The tweets were manually labelled as relevant or irrelevant to Lyme disease using a set of precise keywords, yielding a curated labelled dataset of 77,500 tweets. (2) This labelled dataset was used to train, validate, and test various combinations of NLP word embedding methods and prominent ML classification models, such as TF-IDF with logistic regression, Word2vec with XGBoost, and BERTweet, among others, to identify potential Lyme disease tweets. (3) Lastly, spatio-temporal patterns in the US over a 10-year period were studied. Results: Preliminary results showed that BERTweet outperformed all tested NLP classifiers for identifying Lyme disease tweets, achieving the highest classification accuracy and an F1-score of 90%. There was also a consistent pattern indicating that the West and Northeast regions of the US had a higher tweet rate over time. Conclusions: We focused on the less-studied problem of using Twitter data as a surveillance tool for Lyme disease in the US. Several crucial findings emerged from the study. First, there is a fairly strong correlation between classified tweet counts and Lyme disease counts, with both following similar trends. Second, in 2015 and early 2016, social media networks such as Twitter were essential in raising popular awareness of Lyme disease. Third, counties with a high incidence rate were not necessarily associated with a high tweet rate, and vice versa.
Fourth, BERTweet can be used as a reliable NLP classifier for detecting relevant Lyme disease tweets.
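One of the classifier combinations mentioned in the Methods (TF-IDF features with logistic regression) can be sketched with scikit-learn; the tiny example tweets and labels below are invented for illustration, whereas the study trained on 77,500 labelled tweets.

```python
# Minimal relevant/irrelevant tweet classifier: TF-IDF features fed
# into logistic regression, one of the baselines mentioned above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "just got diagnosed with lyme disease after a tick bite",   # relevant self-report
    "bullseye rash and fever, doctor says it is lyme",          # relevant self-report
    "lyme regis beach was beautiful this weekend",              # irrelevant (place name)
    "watching a documentary about lyme disease tonight",        # irrelevant (no case)
]
labels = [1, 1, 0, 0]  # 1 = relevant to Lyme disease surveillance, 0 = irrelevant

# unigrams + bigrams help disambiguate "lyme disease" from "lyme regis"
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tweets, labels)
pred = clf.predict(["tested positive for lyme after hiking"])
```

A transformer baseline such as BERTweet replaces the TF-IDF step with contextual embeddings, which is what gave the best accuracy in the study; this sketch only shows the simpler end of the spectrum of models compared.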