In this paper, we propose a deep learning model to forecast the range of increase in COVID-19 infected cases in future days and we present a novel method to compute equidimensional representations of multivariate time series and multivariate spatial time series data. Using this novel method, the proposed model can both take in a large number of heterogeneous features, such as census data, intra-county mobility, inter-county mobility, social distancing data, past growth of infection, among others, and learn complex interactions between these features. Using data collected from various sources, we estimate the range of increase in infected cases seven days into the future for all U.S. counties. In addition, we use the model to identify the most influential features for prediction of the growth of infection. We also analyze pairs of features and estimate the amount of observed second-order interaction between them. Experiments show that the proposed model obtains satisfactory predictive performance and fairly interpretable feature analysis results; hence, the proposed model could complement the standard epidemiological models for national-level surveillance of pandemics, such as COVID-19. The results and findings obtained from the deep learning model could potentially inform policymakers and researchers in devising effective mitigation and response strategies. To fast-track further development and experimentation, the code used to implement the proposed model has been made fully open source.
The objective of this study is to propose and test a hybrid machine learning pipeline to uncover the unfolding of disaster events corresponding to different locations from social media posts during disasters. Effective disaster response and recovery require a comprehensive understanding of disaster situations, i.e., unfolding of disaster events and geographic distribution of the disruptions. Existing studies have employed machine learning methods to conduct coarse-grained event detection and analyze the geographical location information from geotagged social media data. However, only a very small fraction of the entire set of social media data includes geotagged information, which may not directly correspond to events described in the content of posts. In addition, the coarse-grained information detected by existing approaches is token-based, which does not provide sufficient information for situation awareness. Hence, the detection of location and finer-grained event information could significantly improve the utility, credibility, and interpretability of social media data for situation awareness. To address these limitations, this study proposes a hybrid machine learning pipeline that makes use of all relevant tweets to uncover the evolution of disaster events across different locations. The pipeline integrates Named Entity Recognition for detecting locations mentioned in the posts, a location fusion approach to extract coordinates of the locations and remove noise information, a fine-tuned BERT model for classifying posts into humanitarian categories, and graph-based clustering to identify credible situational information. The application of the study is demonstrated using the data set collected from Twitter during the 2017 Hurricane Harvey in Houston. The results show the capability of the proposed hybrid pipeline for automated mapping of events across time and space from social media posts with considerable accuracy.
The findings also suggest the potential for forensic analysis of disasters using mapped events and their evolution, based on the variation of social media attention to different locations in disasters. Hence, this method could provide a useful tool to support emergency managers, public officials, residents, first responders, and other stakeholders in rapid situation awareness across time and space. INDEX TERMS Machine learning pipeline, social media, disasters, automated mapping.
Accumulating evidence has shown that long non-coding RNAs (lncRNAs) generally play key roles in cellular biological processes such as epigenetic regulation, gene expression regulation at transcriptional and post-transcriptional levels, cell differentiation and others. However, most lncRNAs have not been functionally characterized. There is an urgent need to develop computational approaches for function annotation of the increasing number of available lncRNAs. In this article, we propose a global network-based method, KATZLGO, to predict the functions of human lncRNAs at large scale. A global network is constructed by integrating three heterogeneous networks: an lncRNA-lncRNA similarity network, an lncRNA-protein association network and a protein-protein interaction network. The KATZ measure is then employed to calculate similarities between lncRNAs and proteins in the global network. We annotate lncRNAs with Gene Ontology (GO) terms of their neighboring protein-coding genes based on the KATZ similarity scores. The performance of KATZLGO is evaluated on a manually annotated lncRNA benchmark and a protein-coding gene benchmark with known function annotations. KATZLGO significantly outperforms the state-of-the-art computational method in both maximum F-measure and coverage. Furthermore, we apply KATZLGO to predict functions of human lncRNAs and successfully map 12,318 human lncRNA genes to GO terms.
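To make the scoring step concrete, the following is a minimal sketch of a truncated Katz measure on a small toy network. It is not the KATZLGO implementation: the function name `katz_similarity`, the damping factor `beta`, the walk-length cutoff `max_length`, and the five-node adjacency matrix are all assumptions made for illustration, since the abstract does not give these details.

```python
import numpy as np


def katz_similarity(adjacency, beta=0.1, max_length=3):
    """Truncated Katz measure: sum over l = 1..max_length of beta**l * A**l.

    beta and max_length are illustrative assumptions, not values from the paper.
    """
    a = np.asarray(adjacency, dtype=float)
    scores = np.zeros_like(a)
    walk_counts = np.eye(a.shape[0])
    for length in range(1, max_length + 1):
        # After this product, walk_counts[i, j] is the number of walks
        # of exactly `length` steps from node i to node j.
        walk_counts = walk_counts @ a
        scores += (beta ** length) * walk_counts
    return scores


# Hypothetical global network. Node order: L0, L1 (lncRNAs), P0, P1, P2 (proteins).
toy_adjacency = np.array([
    [0, 1, 1, 0, 0],  # L0: similar to L1, associated with P0
    [1, 0, 0, 1, 0],  # L1: similar to L0, associated with P1
    [1, 0, 0, 1, 0],  # P0: associated with L0, interacts with P1
    [0, 1, 1, 0, 1],  # P1: associated with L1, interacts with P0 and P2
    [0, 0, 0, 1, 0],  # P2: interacts with P1
])

scores = katz_similarity(toy_adjacency, beta=0.1, max_length=3)

# lncRNA-to-protein block: rows are L0, L1; columns are P0, P1, P2.
lncrna_protein_scores = scores[:2, 2:]

# Rank proteins for L0 from most to least similar (column indices into the block).
ranking_for_l0 = np.argsort(-lncrna_protein_scores[0])
```

In this toy network, L0 scores highest against P0, its direct neighbor, and lowest against P2, which is reachable only through three-step walks. A GO-annotation step could then transfer terms from the top-ranked proteins. When `beta` is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, the untruncated sum converges to (I - beta A)^{-1} - I, which is an alternative to choosing a cutoff.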
We examined the effect of social distancing on changes in visits to urban hotspot points of interest. In a pandemic situation, urban hotspots could be potential superspreader areas as visits to urban hotspots can increase the risk of contact and transmission of a disease among a population. We mapped census-block-group to point-of-interest (POI) movement networks in 16 cities in the United States. We adopted a modified coarse-grain approach to examine patterns of visits to POIs among hotspots and non-hotspots from January to May 2020. Also, we conducted chi-square tests to identify POIs with significant flux-in changes during the analysis period. The results showed disparate patterns across cities in terms of reduction in hotspot POI visitors. Sixteen cities were divided into two categories using a time series clustering method. In one category, which includes the cities of San Francisco, Seattle and Chicago, we observed a considerable decrease in hotspot POI visitors, while in another category, including the cities of Austin, Houston and San Diego, the visitors to hotspots did not greatly decrease. While all the cities exhibited overall decreased visitors to POIs, one category maintained the proportion of visitors to hotspot POIs. The proportion of visitors to some POIs (e.g. restaurants) remained stable during the social distancing period, while some POIs had an increased proportion of visitors (e.g. grocery stores). We also identified POIs with significant flux-in changes, indicating that related businesses were greatly affected by social distancing. The study was limited to 16 metropolitan cities in the United States. The proposed methodology could be applied to digital trace data in other cities and countries to study the patterns of movements to POIs during the COVID-19 pandemic.
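As an illustration of how a chi-square test can flag a POI whose share of visits changed, the sketch below computes a Pearson chi-square statistic on a 2x2 table of visit counts (this POI versus all other POIs, before versus during social distancing). The abstract does not specify how the tests were set up, so the table layout, the function name `chi_square_2x2`, the 0.05 significance level, and the visit counts are all assumptions.

```python
def chi_square_2x2(poi_before, other_before, poi_during, other_during):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    Rows are periods (before, during); columns are (this POI, all other POIs).
    This layout is an assumption for illustration, not the study's design.
    """
    table = [[poi_before, other_before], [poi_during, other_during]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the POI's share of visits did not change.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic


# Critical value of the chi-square distribution with 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

# Hypothetical visit counts: the POI's share drops from 5% to 3% of 10,000 visits.
changed_stat = chi_square_2x2(500, 9500, 300, 9700)
unchanged_stat = chi_square_2x2(500, 9500, 500, 9500)

changed_is_significant = changed_stat > CRITICAL_VALUE_DF1_ALPHA_05
unchanged_is_significant = unchanged_stat > CRITICAL_VALUE_DF1_ALPHA_05
```

With these made-up counts, the first POI exceeds the critical value and the second does not. Running such a test for many POIs would normally call for a multiple-comparison correction, which this sketch omits.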