We investigated whether and how political misinformation is engineered using a dataset of four months worth of tweets related to the 2016 presidential election in the United States. The data contained tweets that achieved a significant level of exposure and was manually labelled into misinformation and regular information. We found that misinformation was produced by accounts that exhibit different characteristics and behaviour from regular accounts. Moreover, the content of misinformation is more novel, polarised and appears to change through coordination. Our findings suggest that engineering of political misinformation seems to exploit human traits such as reciprocity and confirmation bias. We argue that investigating how misinformation is created is essential to understand human biases, diffusion and ultimately better produce public policy. INDEX TERMS Misinformation, data science, US elections, politics. AXEL OEHMICHEN received the M.Eng. degree in computer science and applied mathematics from ENSEEIHT, in 2012, and the Ph.D. degree from the Imperial College London, in 2018. After working for a year in the City of London for Societe Generale CIB, he joined the Data Science Institute, Imperial College London, as a Research Assistant, where he is currently a Postgraduate Research Associate. His current research interests include the creation of flexible and scalable platforms for collecting, storing, and analyzing large scale data in a privacy preserving fashion.
BackgroundHigh-throughput molecular profiling data has been used to improve clinical decision making by stratifying subjects based on their molecular profiles. Unsupervised clustering algorithms can be used for stratification purposes. However, the current speed of the clustering algorithms cannot meet the requirement of large-scale molecular data due to poor performance of the correlation matrix calculation. With high-throughput sequencing technologies promising to produce even larger datasets per subject, we expect the performance of the state-of-the-art statistical algorithms to be further impacted unless efforts towards optimisation are carried out. MapReduce is a widely used high performance parallel framework that can solve the problem.ResultsIn this paper, we evaluate the current parallel modes for correlation calculation methods and introduce an efficient data distribution and parallel calculation algorithm based on MapReduce to optimise the correlation calculation. We studied the performance of our algorithm using two gene expression benchmarks. In the micro-benchmark, our implementation using MapReduce, based on the R package RHIPE, demonstrates a 3.26-5.83 fold increase compared to the default Snowfall and 1.56-1.64 fold increase compared to the basic RHIPE in the Euclidean, Pearson and Spearman correlations. Though vanilla R and the optimised Snowfall outperforms our optimised RHIPE in the micro-benchmark, they do not scale well with the macro-benchmark. In the macro-benchmark the optimised RHIPE performs 2.03-16.56 times faster than vanilla R. Benefiting from the 3.30-5.13 times faster data preparation, the optimised RHIPE performs 1.22-1.71 times faster than the optimised Snowfall. Both the optimised RHIPE and the optimised Snowfall successfully performs the Kendall correlation with TCGA dataset within 7 hours. Both of them conduct more than 30 times faster than the estimated vanilla R.ConclusionsThe performance evaluation found that the new MapReduce algorithm and its implementation in RHIPE outperforms vanilla R and the conventional parallel algorithms implemented in R Snowfall. We propose that MapReduce framework holds great promise for large molecular data analysis, in particular for high-dimensional genomic data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new algorithm as a basis for optimising high-throughput molecular data correlation calculation for Big Data.
Recently we have observed emerging uses of deep learning techniques in multimedia systems. Developing a practical deep learning system is arduous and complex. It involves labor-intensive tasks for constructing sophisticated neural networks, coordinating multiple network models, and managing a large amount of trainingrelated data. To facilitate such a development process, we propose TensorLayer which is a Python-based versatile deep learning library. TensorLayer provides high-level modules that abstract sophisticated operations towards neuron layers, network models, training data and dependent training jobs. In spite of offering simplicity, it has transparent module interfaces that allows developers to flexibly embed low-level controls within a backend engine, with the aim of supporting fine-grain tuning towards training. Real-world cluster experiment results show that TensorLayer is able to achieve competitive performance and scalability in critical deep learning tasks. TensorLayer was released in September 2016 on GitHub. Since after, it soon become one of the most popular open-sourced deep learning library used by researchers and practitioners.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.