With the rapid development of the Internet, people depend more and more on network communications. Because network applications have grown explosively, each user now holds many virtual network identities. Compared with previous work, our study proposes an algorithm that identifies virtual identities belonging to the same natural person. Moreover, because users' access records amount to massive volumes of data, our study is built on Spark, a fast, general-purpose, easy-to-use engine for large-scale data processing and cluster computing.