“…To ensure the success of the autonomous data preparation method, a new scenario is constructed to teach the algorithm how adversarial systems pollute the training data and how to discard such data from the training scenarios. While the scenario is being constructed, the search for training data expands into new and emerging forms of data (NEFD), e.g., open data (Open Data Institute,³ Elgin,⁴ DataViva⁵); spatiotemporal data (GeoBrick [5], Urban Flow prediction [6], Air quality [7], GIS platform [8]); high-dimensional data (Industrial big data [9], IGA-ELM [10], MDS [11], TMAP [12]); time-stamped data (Qubit,⁶ Edge MWN [13], Mobi-IoST [14], Edge DHT analytics [15]); real-time data (CUSUM [16]); and big data [17]. The NEFD are needed to teach the AI how to use Spark to aggregate, process and analyse the OSINT big data, and how to process data in RAM using a Resilient Distributed Dataset (RDD).…”
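The discard-then-aggregate idea in the excerpt can be illustrated with a minimal sketch in plain Python. It is not the authors' method and does not use Spark itself. The `Record` type, its `trusted` flag, and the sample values are hypothetical, and the check that sets the flag is assumed to exist upstream. In Spark, the two steps would roughly correspond to `RDD.filter` followed by `reduceByKey`, with `cache()` keeping the cleaned data in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    source: str    # hypothetical feed name, e.g. "open-data"
    value: float   # hypothetical numeric observation
    trusted: bool  # hypothetical flag set by an assumed upstream pollution check


def discard_polluted(records: Iterable[Record]) -> List[Record]:
    """Keep only records not flagged as polluted (analogous to RDD.filter)."""
    return [r for r in records if r.trusted]


def mean_by_source(records: Iterable[Record]) -> Dict[str, float]:
    """Average values per source (analogous to reduceByKey on (sum, count) pairs)."""
    totals = defaultdict(lambda: [0.0, 0])  # source -> [running sum, count]
    for r in records:
        totals[r.source][0] += r.value
        totals[r.source][1] += 1
    return {s: total / count for s, (total, count) in totals.items()}


# Illustrative data only; the second record stands in for an adversarial entry.
sample = [
    Record("open-data", 10.0, True),
    Record("open-data", 1000.0, False),
    Record("open-data", 14.0, True),
    Record("real-time", 3.0, True),
]

clean = discard_polluted(sample)
result = mean_by_source(clean)
```

Filtering before aggregating matters here: a single polluted value left in the data would dominate the per-source average.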