The theoretical foundations of Big Data Science are not yet fully developed. This study proposes a new scalable framework for Big Data representation, high-throughput analytics (variable selection and noise reduction), and model-free inference. Specifically, we explore the core principles of distribution-free and model-agnostic methods for scientific inference from Big Data sets. Compressive Big Data analytics (CBDA) iteratively generates random (sub)samples from a large and complex dataset. This subsampling with replacement is conducted at both the feature and case levels and yields samples that are not necessarily consistent or congruent across iterations. The approach relies on an ensemble predictor in which established model-based or model-free inference techniques are iteratively applied to preprocessed and harmonized samples. Repeating the subsampling and prediction steps many times yields derived likelihoods, probabilities, or parameter estimates, which can be used to assess the reliability of the algorithm and the accuracy of its findings via bootstrapping methods, or to extract important features via controlled variable selection. CBDA provides a scalable algorithm for addressing some of the challenges associated with handling complex, incongruent, incomplete, and multi-source data. Although not yet fully developed, a CBDA mathematical framework will enable the study of the ergodic properties and the asymptotics of specific statistical inference approaches applied via CBDA. We implemented the high-throughput CBDA method both in pure R and in a graphical pipeline environment. To validate the technique, we used several simulated datasets as well as a real neuroimaging-genetics case study of Alzheimer's disease. The CBDA approach may be customized to provide a generic representation of complex multimodal datasets and stable scientific inference for large, incomplete, and multi-source datasets.
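The core CBDA loop described above (repeated subsampling of cases and features with replacement, a base predictor applied to each sample, and aggregation of votes and feature credits) can be sketched as follows. This is an illustrative toy, not the authors' R implementation: the synthetic data, the nearest-centroid base learner, and all variable names are assumptions standing in for whatever model-based or model-free predictor is plugged in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 cases, 50 features; only features 0-4 carry signal.
n, p = 200, 50
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) > 0).astype(int)

def centroid_fit_predict(X_tr, y_tr, X_te):
    # Trivial nearest-centroid base learner standing in for any
    # established model-based or model-free inference technique.
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = ((X_te - c0) ** 2).sum(axis=1)
    d1 = ((X_te - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

n_iter = 500
votes = np.zeros(n)        # accumulated class-1 votes per case
feat_score = np.zeros(p)   # accuracy-weighted inclusion score per feature
feat_count = np.zeros(p)

for _ in range(n_iter):
    # Subsample cases and features with replacement; samples need not be
    # consistent or congruent across iterations.
    rows = rng.choice(n, size=n // 2, replace=True)
    cols = np.unique(rng.choice(p, size=p // 3, replace=True))
    oob = np.setdiff1d(np.arange(n), rows)  # out-of-bag cases
    pred = centroid_fit_predict(X[np.ix_(rows, cols)], y[rows], X[:, cols])
    oob_acc = (pred[oob] == y[oob]).mean()
    votes += pred                # ensemble vote on every case
    feat_score[cols] += oob_acc  # credit the features used this round
    feat_count[cols] += 1

# Aggregated ensemble prediction and a crude feature-importance ranking,
# analogous to CBDA's controlled variable selection.
ensemble = (votes > n_iter / 2).astype(int)
importance = feat_score / np.maximum(feat_count, 1)
```

The out-of-bag accuracies collected across iterations play the role of the derived bootstrap estimates: features that repeatedly appear in high-accuracy subsamples accumulate higher importance scores.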
Edge devices in the emerging Internet-of-Things (IoT) environment require comprehensive security measures that fit within the power budget of ubiquitous computing. In this paper, a transmitter identification scheme consisting of a lightweight Bayesian neural network (BNN)-based classifier operating on raw time-domain data is presented. Evaluation is performed with data obtained from schematic-level simulation of high-efficiency CMOS power amplifier designs using a 65 nm process design kit (PDK). The Bayesian neural network achieves 89.5% accuracy on the task of classifying six transmitters. Moreover, the BNN classifier is implemented on a field-programmable gate array (FPGA) with parallel pseudo-Gaussian random number generators, achieving a throughput of more than 340,000 classifications per second with an average energy consumption of 0.548 μJ per classification. This low-power system enables comprehensive security for energy-constrained IoT devices and sensors.

INDEX TERMS: Hardware security, Bayesian neural networks, radio frequency and wireless circuits, power amplifier, Gaussian random number generator, radio frequency fingerprint, Internet of Things.
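The BNN inference step can be sketched as a Monte Carlo forward pass in which weights are freshly sampled from per-weight Gaussians, which is exactly where the paper's parallel pseudo-Gaussian random number generators would sit in hardware. This is a minimal software sketch under assumed layer sizes and input dimensions; it is not the paper's trained model or FPGA architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TinyBNN:
    """One-hidden-layer Bayesian NN with factorized Gaussian weights.
    Layer sizes are illustrative assumptions."""
    def __init__(self, d_in, d_h, d_out, rng):
        self.rng = rng
        # Variational parameters: weight means and log standard deviations.
        self.mu1 = rng.normal(0, 0.3, (d_in, d_h))
        self.ls1 = np.full((d_in, d_h), -2.0)
        self.mu2 = rng.normal(0, 0.3, (d_h, d_out))
        self.ls2 = np.full((d_h, d_out), -2.0)

    def forward(self, x):
        # Draw one weight sample w ~ N(mu, sigma^2); in hardware this draw
        # comes from the pseudo-Gaussian random number generators.
        w1 = self.mu1 + np.exp(self.ls1) * self.rng.standard_normal(self.mu1.shape)
        w2 = self.mu2 + np.exp(self.ls2) * self.rng.standard_normal(self.mu2.shape)
        return softmax(np.maximum(x @ w1, 0) @ w2)

    def predict(self, x, n_samples=32):
        # Average the softmax outputs over Monte Carlo weight samples.
        return np.mean([self.forward(x) for _ in range(n_samples)], axis=0)

# Classify a batch of raw time-domain windows among 6 transmitters
# (window length 128 is an assumption for illustration).
bnn = TinyBNN(d_in=128, d_h=32, d_out=6, rng=rng)
x = rng.normal(size=(4, 128))  # 4 windows of raw samples
probs = bnn.predict(x)         # (4, 6) class-probability matrix
labels = probs.argmax(axis=1)
```

Averaging over many sampled forward passes is what distinguishes the BNN from a point-estimate network: the spread across samples gives a per-window uncertainty estimate alongside the predicted transmitter label.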