E-mail address categorization based on semantics of surnames

Veluru, Suresh; Rahulamathavan, Yogachandran; Viswanath, P.; Longley, Paul; Rajarajan, Muttukrishnan

doi:10.1109/cidm.2013.6597240

Cited by 4 publications

(4 citation statements)

References 12 publications

(18 reference statements)


“…These are applicable to study inbreeding between marital partners or social groups, but do not explicitly address the semantic similarity between surnames. Hence, an advanced statistical analysis method has been developed for email address categorization based on semantics of surnames [26]. E-mail address categorization based on semantics of surnames has two phase [26].…”

Section: Related Work (mentioning)

confidence: 99%

“…Hence, an advanced statistical analysis method has been developed for email address categorization based on semantics of surnames [26]. E-mail address categorization based on semantics of surnames has two phase [26]. In the first phase, the semantics of surnames are identified by representing a set of names at each location using a vector space model followed by latent semantic analysis.…”

Section: Related Work (mentioning)

confidence: 99%
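The vector space step described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the toy data, the names `names_by_location`, `surname_vector`, and `cosine` are all hypothetical, and the latent semantic analysis step (a truncated SVD of the surname-by-location matrix) is left out to keep the example dependency-free.

```python
import math
from collections import Counter

# Hypothetical toy data: surnames observed at each location (not from the paper).
names_by_location = {
    "loc_a": ["patel", "shah", "patel", "mehta"],
    "loc_b": ["patel", "shah", "shah"],
    "loc_c": ["smith", "jones", "smith"],
}

# Fixed location order so every surname vector uses the same axes.
locations = sorted(names_by_location)
counts = {loc: Counter(names) for loc, names in names_by_location.items()}


def surname_vector(surname):
    """Return the surname's occurrence count at each location."""
    return [counts[loc][surname] for loc in locations]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


similar = cosine(surname_vector("patel"), surname_vector("shah"))
unrelated = cosine(surname_vector("patel"), surname_vector("smith"))
```

In this toy setting, surnames that co-occur at the same locations get a high similarity and surnames that never share a location get zero. Applying a truncated SVD to the same matrix would be the assumed next step toward latent semantic analysis.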

“…In order to address this issue of identifying semantic surnames, Veluru et al. [26] [25] recently applied statistical methods such as vector space model and latent semantic indexing (LSI) in names data set. Further, email address categorization has been performed based on semantics of surnames.…”

Section: Introduction (mentioning)

confidence: 99%


Correlated community estimation models over a set of names

Veluru, Rahulamathavan, Manandhar, et al. 2014

2014 Science and Information Conference

Self Cite


Abstract: Surnames (family names) and forenames generally evolve over generations and can be used to understand population origins, migration, identity, social norms, and cultural customs. These forenames or surnames may have hidden structures associated with them, called communities. Each community might have strong correlation among several forenames and surnames. In addition, the correlation might be across communities of forenames or surnames. Popular statistical generative models such as Latent Dirichlet Allocation (LDA) have been developed to find topics in a corpus of documents. However, the LDA model can also be used to identify hidden communities in a names data set. This paper proposes several variants of latent Dirichlet allocation models to capture correlation between surnames and forenames within the communities and across the communities over a set of names collected at different locations. Initially, we propose a surname correlated LDA model and a forename correlated LDA model. These models identify communities in surnames or forenames and extract the corresponding correlated forenames or surnames in each community, respectively. Later, we propose a surname community correlated LDA model and a forename community correlated LDA model. These models estimate the correlation of each surname community with the communities of forenames and vice versa, respectively. We experiment on India and United Kingdom names data sets and draw conclusions.




“…Suffix trees solve a wide range of problems such as exact and inexact matching problems, substring problems, data compression, subsequence problems, longest common substring, string kernels, and circular strings. Recently, suffix trees have been used in surname correction and identification in a corpus of names [13] and e-mail address categorization based on semantics of surnames [14].…”

Section: E-mail Address Profiler (mentioning)

confidence: 99%
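The idea of identifying a surname inside an e-mail address can be sketched as follows. The statement above refers to suffix trees; this sketch deliberately swaps in a plain substring scan instead, which gives the same matches on small inputs but none of the efficiency of a suffix tree. The function name `find_surname`, the sample addresses, and the surname list are hypothetical and not taken from the cited work.

```python
def find_surname(email, surnames):
    """Return the longest known surname found in the local part of `email`.

    Plain substring scan (not a suffix tree); returns None if nothing matches.
    """
    # Keep only the part before the first "@" and compare case-insensitively.
    local_part = email.split("@", 1)[0].lower()
    matches = [s for s in surnames if s and s.lower() in local_part]
    # Prefer the longest match so "rahman" wins over its substring "rah".
    return max(matches, key=len) if matches else None


known_surnames = ["rahman", "rah", "smith"]  # hypothetical surname list
hit = find_surname("j.rahman42@example.com", known_surnames)
miss = find_surname("info@example.com", known_surnames)
```

Preferring the longest match is one simple way to resolve overlaps between candidate surnames; the cited work may resolve them differently.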

The Uncertainty of Identity Toolset

Adnan, Lima, Rossi, et al. 2014

Proceedings of the 7th International Conference on Security of Information and Networks

Self Cite


People manage a spectrum of identities in cyber domains. Profiling individuals and assigning them to distinct groups or classes have potential applications in targeted services, online fraud detection, extensive social sorting, and cyber-security. This paper presents the Uncertainty of Identity Toolset, a framework for the identification and profiling of users from their social media accounts and e-mail addresses. More specifically, in this paper we discuss the design and implementation of two tools of the framework. The Twitter Geographic Profiler tool builds a map of the ethno-cultural communities of a person's friends on Twitter social media service.


Privacy Preserving Text Analytics

Veluru, Rahulamathavan, Gupta, et al. 2015

Handbook of Research on Securing Cloud-Based Databases With Biometric Applications


An e-mail address is a source of communication for major social networking sites. In general, e-mail addresses hold identity in the form of a surname as a substring. Identities such as names are far from random and can exhibit community distributions over populations. However, these identities reflect cultural, ethnic, and genetic structures generated among populations. Hence, identity establishment in e-mail address mining can be seen as a categorization of e-mail addresses based on the community structure in a names data set. It involves community modeling in names, categorization of e-mail addresses, and identity privacy preservation. This chapter presents a survey of text mining and privacy preserving techniques followed by research challenges and strategies in name analysis. The research challenges are: (1) e-mail address categorization based on community structure of identities, (2) correlation of surnames and forenames within and across communities, and (3) privacy preserving of identities in communities.
