To address the low efficiency of real-time processing and matching of CNAME records in massive DNS log data, a Spark-based parallel enhancement of the Aho-Corasick (AC) automaton is proposed. The method is built on the Spark distributed cluster computing engine in the Hadoop ecosystem, which provides stable, highly fault-tolerant storage of massive DNS log data and supports round-the-clock real-time processing. The Spark cluster combines multi-threaded parallel computation with an improved AC automaton algorithm, which both reduces the memory occupied by trie construction and improves the efficiency of matching CNAME records in massive DNS logs. Simulation results show that the proposed method can quickly match CNAME records in massive DNS log data. Compared with the original AC algorithm, matching efficiency is significantly improved, and time complexity and storage requirements are reduced.
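As a rough illustration of the matching step described above, the following is a minimal sketch of the classic (unimproved) AC automaton applied to one partition of log records. It is not the improved algorithm proposed here, whose details the abstract does not give. The function names (`build_automaton`, `find_matches`, `match_partition`) and the treatment of each log record as a plain string are assumptions made for illustration. The sketch stays in plain Python so it runs without a cluster; on Spark, the automaton would typically be shipped to executors as a broadcast variable and applied per partition with `mapPartitions`.

```python
from collections import deque


def build_automaton(patterns):
    """Build a classic Aho-Corasick automaton as (goto, fail, out) tables."""
    goto = [{}]   # goto[state][char] -> next state (the trie edges)
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognised on reaching each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(automaton, text):
    """Return (start_index, pattern) for every pattern occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


def match_partition(automaton, records):
    """Yield (record, sorted matched patterns) for records with any hit.

    Shaped like a per-partition function: it consumes an iterable of
    records and yields results lazily.
    """
    for rec in records:
        found = sorted({pat for _, pat in find_matches(automaton, rec)})
        if found:
            yield rec, found
```

Because the automaton is built once and each record is scanned in a single pass regardless of how many patterns there are, the per-partition work is independent of the other partitions, which is what makes the matching step straightforward to parallelise.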