This overview centers on research achievements that have recently emerged at the confluence of Big Data technologies and bio-inspired computation. A manifold of reasons can be identified for the profitable synergy between these two paradigms, all rooted in the adaptability, intelligence and robustness that biologically inspired principles can provide to technologies aimed at managing, retrieving, fusing and processing Big Data efficiently. We delve into this research field by first analyzing the existing literature in depth, with a focus on advances reported in the last few years. This literature analysis is complemented by an identification of new trends and open challenges in Big Data that remain unsolved to date and that can be effectively addressed by bio-inspired algorithms. As a second contribution, this work elaborates on how bio-inspired algorithms need to be adapted for use in a Big Data context, in which data fusion becomes crucial as a preliminary step that allows processing and mining several, potentially heterogeneous data sources. This analysis makes it possible to explore and compare the scope and efficiency of existing approaches across different problems and domains, with the purpose of identifying new potential applications and research niches. Finally, this survey highlights open issues that remain unsolved to date in this research avenue, alongside a set of recommendations for future research.
The proliferation of social networks and their usage by a wide spectrum of user profiles has been especially notable in the last decade. A social network is frequently conceived as a strongly interlinked community of users, each featuring a compact neighborhood tightly and actively connected through different communication flows. This realm provides a rich substrate for a myriad of malicious activities aimed at illicitly profiting from the user or from his/her social circle. This manuscript elaborates on a practical approach for the detection of identity theft in social networks, by which the credentials of a certain user are stolen and used without permission by the attacker for his/her own benefit. The proposed scheme detects identity thefts by exclusively analyzing connection time traces of the account being tested in a nonintrusive manner. The manuscript formulates the detection of this attack as a binary classification problem, which is tackled by means of a support vector classifier applied over features inferred from the original connection time traces of the user. Simulation results are discussed in depth to elucidate the potential of the proposed system as the first step of a more involved impersonation detection framework, also relying on connectivity patterns and elements from language processing. The goals pursued by attacks in social networks may lie not only in economic profit for the attacker, but also in other interests achievable by gaining unauthorized access to the victim's information (e.g. bullying or intimidation, particularly frequent within the teenage community). It is often the case that sensitive information items are carelessly posted in social networks, whose revelation may trigger dramatic consequences, security breaches, and eventually fatal circumstances for the victim.
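As an illustrative sketch of the classification approach described above (the feature set and data here are invented for illustration, not the authors' actual features), a support vector classifier can be trained on simple summary statistics of connection time traces:

```python
# Toy sketch: a support vector classifier over statistics of connection
# time traces. Features and synthetic data are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def trace_features(login_hours, session_minutes):
    """Summarize a connection time trace as a fixed-length feature vector."""
    return np.array([
        np.mean(login_hours),      # typical login hour
        np.std(login_hours),       # spread of login times
        np.mean(session_minutes),  # average session duration
        np.std(session_minutes),   # variability of session duration
    ])

rng = np.random.default_rng(0)
# Synthetic legitimate traces: evening logins, short sessions.
legit = [trace_features(rng.normal(21, 1, 30), rng.normal(15, 5, 30))
         for _ in range(40)]
# Synthetic impersonated traces: daytime logins, longer sessions.
stolen = [trace_features(rng.normal(11, 2, 30), rng.normal(45, 10, 30))
          for _ in range(40)]

X = np.vstack(legit + stolen)
y = np.array([0] * len(legit) + [1] * len(stolen))

clf = SVC(kernel="rbf").fit(X, y)
suspect = trace_features(rng.normal(11, 2, 30), rng.normal(45, 10, 30))
print(clf.predict([suspect])[0])
```

The key property exploited here is that connection-time statistics can be computed without inspecting message content, which is what makes the detection nonintrusive.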
Although the need for detection schemes specially tailored to attacks in social networks has been noted by the research community, contributions in this matter are relatively scarce. Furthermore, they hinge mostly on detectors designed ad hoc for a certain attack class, based mainly on analyzing private features of the user account (e.g. content of the messages or the contact list). From a more general point of view, motivations and goals for cybercrimes may vary within a wide spectrum of possibilities that give rise to an equally diverse portfolio of detection methods. In particular, phishing refers to those procedures used for broadcasting messages from apparently reputable sources devoted to capturing sensitive information such as account credentials or credit card details [4-7]. Research on this class of attacks has centered on the application of textual phishing indicators [8,9] and information retrieval algorithms such as hidden Markov models, latent Dirichlet allocation, or naïve bag-of-words procedures [10]. Other features used for the detection of phishing attacks have been found in Internet search engines, which help find inconsistencies between the fake and the...
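As a toy illustration of the bag-of-words procedures cited above (vocabulary, messages, and smoothing choices here are invented for the example), phishing detection can be cast as naive Bayes classification over word counts:

```python
# Minimal bag-of-words sketch in the spirit of the cited phishing detectors:
# a naive Bayes classifier over word counts, with add-one smoothing.
# The training messages below are toy examples, not real data.
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def train(messages, labels):
    """Accumulate per-class word counts and class priors."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for msg, lbl in zip(messages, labels):
        counts[lbl].update(tokenize(msg))
    return counts, priors

def predict(counts, priors, text):
    """Pick the class with the highest smoothed log-probability."""
    vocab = set(counts[0]) | set(counts[1])
    scores = {}
    for lbl in (0, 1):
        total = sum(counts[lbl].values())
        score = math.log(priors[lbl] / sum(priors.values()))
        for word in tokenize(text):
            score += math.log((counts[lbl][word] + 1) / (total + len(vocab)))
        scores[lbl] = score
    return max(scores, key=scores.get)

msgs = ["verify your account password now",
        "urgent update your credit card details",
        "meeting notes attached see you tomorrow",
        "lunch on friday with the project team"]
labels = [1, 1, 0, 0]  # 1 = phishing, 0 = legitimate
counts, priors = train(msgs, labels)
print(predict(counts, priors, "please verify your card details"))  # → 1
```

Real detectors combine such textual scores with the external signals mentioned above (e.g. search-engine lookups) rather than relying on word counts alone.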
The rapid growth of new computing paradigms such as Cloud Computing and Big Data has unleashed great opportunities for companies to shift their business model towards a fully digital strategy. A major obstacle in this matter is the requirement of highly specialized ICT infrastructures that are expensive and difficult to manage. It is at this point that the IaaS (infrastructure as a service) model offers an efficient and cost-affordable solution to supply companies with their required computing resources. In the Big Data context, it is often a hard task to design an optimal IaaS solution that meets user requirements. To address this, we propose a methodology to optimize the definition of IaaS cloud models for hosting Big Data platforms, following a threefold criterion: cost, reliability, and computing capacity. Specifically, the proposed methodology hinges on evolutionary heuristics in order to find IaaS configurations in the cloud that optimally balance these objectives. We also define measures to quantify the aforementioned metrics over a Big Data platform hosted within an IaaS cloud model. The proposed method is validated using real information from three IaaS providers and three Big Data platforms. The obtained results provide insightful input for system managers when initially designing cloud infrastructures for Big Data applications.
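The evolutionary search over IaaS configurations can be sketched as follows (the VM catalogue, cost figures, and scalarized fitness below are invented for illustration and do not correspond to the providers or the objective functions of the study):

```python
# Illustrative sketch of an evolutionary search over IaaS configurations
# balancing cost, reliability and computing capacity. All figures are
# hypothetical; the real study uses data from three actual IaaS providers.
import random

# Hypothetical catalogue: vm type -> (hourly cost/node, failure rate, cores/node)
VM_TYPES = {
    "small":  (0.05, 0.02, 2),
    "medium": (0.12, 0.01, 4),
    "large":  (0.30, 0.005, 8),
}

def fitness(cfg):
    """Scalarized objective: reward capacity and reliability, penalize cost."""
    vm, nodes = cfg
    cost, fail, cores = VM_TYPES[vm]
    capacity = cores * nodes
    reliability = (1 - fail) ** nodes   # all nodes must stay up
    return capacity * reliability - 10.0 * cost * nodes

def mutate(cfg):
    """Randomly perturb either the VM type or the cluster size."""
    vm, nodes = cfg
    if random.random() < 0.5:
        vm = random.choice(list(VM_TYPES))
    else:
        nodes = max(1, min(20, nodes + random.choice([-1, 1])))
    return (vm, nodes)

random.seed(42)
population = [(random.choice(list(VM_TYPES)), random.randint(1, 20))
              for _ in range(30)]
for _ in range(50):                      # elitist evolutionary loop
    population.sort(key=fitness, reverse=True)
    elite = population[:10]
    population = elite + [mutate(random.choice(elite)) for _ in range(20)]

best = max(population, key=fitness)
print(best, fitness(best))
```

A production version would keep the three objectives separate (e.g. via Pareto dominance) instead of collapsing them into one weighted score, but the loop structure is the same.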
In the smart city context, Big Data analytics plays an important role in processing the data collected through IoT devices. The analysis of the information gathered by sensors favors the generation of specific services and systems that not only improve the quality of life of the citizens, but also optimize the city resources. However, the difficulties of implementing this entire process in real scenarios are manifold, including the huge number and heterogeneity of the devices, their geographical distribution, and the complexity of the necessary IT infrastructures. For this reason, the main contribution of this paper is the PADL description language, which has been specifically tailored to assist in the definition and operationalization phases of the machine learning life cycle. It provides annotations that serve as an abstraction layer over the underlying infrastructure and technologies, hence facilitating the work of data scientists and engineers. Because it eases the operationalization of distributed pipelines over edge, fog, and cloud layers, it is particularly useful in the complex and heterogeneous environments of smart cities. For this purpose, PADL contains functionalities for the specification of monitoring, notification, and actuation capabilities. In addition, we provide tools that facilitate its adoption in production environments. Finally, we showcase the usefulness of the language by showing the definition of PADL-compliant analytical pipelines over two use cases in a smart city context (flood control and waste management), demonstrating that its adoption is simple and beneficial for the definition of information and process flows in such environments.
Development and operations (DevOps), artificial intelligence (AI), big data and edge–fog–cloud are disruptive technologies that may produce a radical transformation of the industry. Nevertheless, there are still major challenges to applying them efficiently in order to optimise productivity. Some of them are addressed in this article, concretely, with respect to the adequate management of information technology (IT) infrastructures for automated analysis processes in critical fields such as the mining industry. In this area, this paper presents a tool called Pangea aimed at automatically generating suitable execution environments for deploying analytic pipelines. These pipelines are decomposed into various steps so as to execute each one in the most suitable environment (edge, fog, cloud or on-premise), minimising latency and optimising the use of both hardware and software resources. Pangea is focused on three distinct objectives: (1) generating the required infrastructure if it does not previously exist; (2) provisioning it with the necessary requirements to run the pipelines (i.e., configuring each host's operating system and software, installing dependencies and downloading the code to execute); and (3) deploying the pipelines. In order to facilitate the use of the architecture, a representational state transfer application programming interface (REST API) is defined to interact with it. Additionally, a web client is provided. Finally, it is worth noting that in addition to the production mode, a local development environment can be generated for testing and benchmarking purposes.
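The step-placement idea behind decomposing pipelines across layers can be sketched as a simple greedy assignment (the layer names, latency figures, and resource model below are invented assumptions, not Pangea's actual scheduler):

```python
# Toy sketch of latency-aware step placement across edge/fog/cloud layers.
# Figures are hypothetical; Pangea's real placement logic is more involved.
ENVIRONMENTS = {            # layer: (latency in ms, available CPU cores)
    "edge":  (5, 2),
    "fog":   (20, 8),
    "cloud": (80, 64),
}

def place(steps):
    """Assign each step to the lowest-latency layer with enough cores."""
    plan = {}
    for name, cores_needed in steps:
        candidates = [(lat, env) for env, (lat, cores) in ENVIRONMENTS.items()
                      if cores >= cores_needed]
        plan[name] = min(candidates)[1]   # pick the lowest-latency candidate
    return plan

pipeline = [("ingest", 1), ("feature-extraction", 4), ("model-training", 32)]
print(place(pipeline))
# ingest → edge, feature-extraction → fog, model-training → cloud
```

Light steps land on the edge close to the data source, while resource-hungry ones fall through to the cloud, which is the latency/capacity trade-off the paper describes.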