With the rapid growth of the Internet in recent years, the ability to analyze and identify its users has become increasingly important. Authorship analysis provides a means to glean information about the author of a document originating from the Internet or elsewhere, including, but not limited to, the author's gender. There are well-known linguistic differences between the writing of men and women, and these differences can be used effectively to predict the gender of a document's author. Capitalizing on these linguistic nuances, this study uses a set of stylometric features and a set of word count features to perform automatic gender discrimination on emails from the popular Enron email dataset. These features are used in conjunction with the Modified Balanced Winnow Neural Network proposed by Carvalho and Cohen, an improvement on the original Balanced Winnow created by Littlestone. Experiments with the Modified Balanced Winnow show that it can effectively discriminate gender using both stylometric and word count features, with the word count features providing superior results.
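The abstract names the learner but not its mechanics. The following is a minimal sketch of a Balanced Winnow-style online classifier with a margin-based, feature-value-dependent multiplicative update in the spirit of Carvalho and Cohen's modification. It keeps separate positive and negative weight vectors and updates them whenever an example falls inside the margin, not only on mistakes. The hyperparameter defaults (`alpha=1.5`, `beta=0.5`, `margin=1.0`, `threshold=1.0`), the initial weights, the `__bias__` feature, and the assumption that features are non-negative (e.g. counts or rates) are all choices made for this illustration, not values taken from the study.

```python
import math
from typing import Dict

Features = Dict[str, float]


class ModifiedBalancedWinnow:
    """Online linear classifier with separate positive and negative weight vectors."""

    def __init__(self, alpha: float = 1.5, beta: float = 0.5, margin: float = 1.0,
                 threshold: float = 1.0, init_pos: float = 2.0, init_neg: float = 1.0):
        # Hyperparameter defaults and initial weights are assumptions for this sketch.
        self.alpha, self.beta = alpha, beta
        self.margin, self.threshold = margin, threshold
        self.init_pos, self.init_neg = init_pos, init_neg
        self.w_pos: Dict[str, float] = {}
        self.w_neg: Dict[str, float] = {}

    @staticmethod
    def _normalize(x: Features) -> Features:
        # Add a constant bias feature, then L2-normalize so that every
        # (assumed non-negative) feature value lies in [0, 1].
        x = dict(x, __bias__=1.0)
        norm = math.sqrt(sum(v * v for v in x.values()))
        return {k: v / norm for k, v in x.items()}

    def _score(self, xn: Features) -> float:
        # Score = <x, w_pos> - <x, w_neg> - threshold; unseen features use initial weights.
        return sum(v * (self.w_pos.get(k, self.init_pos) - self.w_neg.get(k, self.init_neg))
                   for k, v in xn.items()) - self.threshold

    def predict(self, x: Features) -> int:
        return 1 if self._score(self._normalize(x)) > 0 else -1

    def learn(self, x: Features, y: int) -> None:
        """Process one labeled example (y is +1 or -1) from the stream, then discard it."""
        xn = self._normalize(x)
        # Skip the update only when the example is correct with a comfortable margin.
        if y * self._score(xn) > self.margin:
            return
        for k, v in xn.items():
            if v <= 0:
                continue
            wp = self.w_pos.get(k, self.init_pos)
            wn = self.w_neg.get(k, self.init_neg)
            if y > 0:
                # Promote the positive weight and demote the negative weight.
                self.w_pos[k] = wp * self.alpha * (1 + v)
                self.w_neg[k] = wn * self.beta * (1 - v)
            else:
                # Mirror image for a negative example.
                self.w_pos[k] = wp * self.beta * (1 - v)
                self.w_neg[k] = wn * self.alpha * (1 + v)
```

Because each example is seen once and only the two weight dictionaries are retained, this style of learner suits a single pass over a large mailbox; the label encoding (for instance, +1 for one gender and -1 for the other) is left to the caller.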
With the rapid growth of web-based social networking technologies in recent years, author identification and analysis have proven increasingly useful. Authorship analysis provides information about a document's author, often including the author's gender. Men and women are known to write in distinctly different ways, and these differences can be used successfully to predict an author's gender. Making use of these distinctions between male and female authors, this study demonstrates the use of a simple stream-based neural network to automatically discriminate gender on manually labeled tweets from the Twitter social network. This neural network, the Modified Balanced Winnow, was employed in two ways: the effectiveness of data stream mining was first examined with an extensive list of n-gram features, and feature selection techniques were then evaluated by drastically reducing the feature list using WEKA's attribute selection algorithms. The stream mining approach proves effective, achieving an accuracy of 82.48%, 20.81 percentage points above the baseline prediction. Using feature selection methods improved the results by a further 16.03 percentage points, to an accuracy of 98.51%.
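The abstract does not say which n-grams were extracted or which WEKA evaluator was used. A minimal sketch of the general pipeline follows: word unigram and bigram counts per tweet, then ranking features by information gain (one common attribute-selection criterion, comparable to WEKA's `InfoGainAttributeEval`) and keeping the top `k`. The function names, the whitespace tokenizer, `n_max=2`, and the choice of information gain are assumptions for illustration, not the study's configuration.

```python
import math
from collections import Counter
from typing import List


def word_ngrams(text: str, n_max: int = 2) -> Counter:
    """Count word n-grams of length 1..n_max in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    grams: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def entropy(counts) -> float:
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def info_gain(docs: List[Counter], labels: List[str], feature: str) -> float:
    """Information gain of splitting documents on whether `feature` is present."""
    present = [y for d, y in zip(docs, labels) if d[feature] > 0]
    absent = [y for d, y in zip(docs, labels) if d[feature] == 0]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(Counter(part).values())
                      for part in (present, absent))
    return entropy(Counter(labels).values()) - conditional


def top_k_features(docs: List[Counter], labels: List[str], k: int) -> List[str]:
    """Rank every n-gram by information gain (ties broken alphabetically) and keep the top k."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda f: (-info_gain(docs, labels, f), f))[:k]
```

The reduced feature list can then be fed to a stream learner in place of the full n-gram vocabulary; scoring each feature independently is cheap but ignores redundancy between correlated n-grams.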
There are currently many approaches to identifying the community structure of a network, but relatively few are designed specifically to detect overlapping community structures. Likewise, few networks have ground-truth overlapping nodes. For this reason, we introduce a new network, Pilgrim, with known overlapping nodes, and a new genetic algorithm for detecting such nodes. Pilgrim is composed of a variety of structures, including two communities with dense overlap, which is common in real social structures. This study initially explores the potential of the community detection algorithm LabelRank for consistent overlap detection; however, the deterministic nature of this algorithm restricts it to very few candidate solutions. Therefore, we propose a genetic algorithm using a restricted edge-based clustering technique to detect overlapping communities by maximizing an efficient overlapping modularity function. The proposed restriction on the edge-based representation precludes the possibility of disjoint communities, thereby dramatically reducing the search space and decreasing the number of generations required to produce an optimal solution. A tunable parameter r allows the strictness of the definition of overlap to be adjusted, allowing the number of identified overlapping nodes to be refined. Our method, tested on several real social networks, yields results comparable to those of the most effective overlapping community detection algorithms to date.
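The reason an edge-based representation suits overlap detection is that a node touching edges from several communities naturally belongs to all of them. A minimal sketch of decoding such a representation (one community label per edge) into node memberships follows. The restriction the paper places on the representation, its overlapping modularity function, and its genetic operators are not reproduced here. The `min_share` parameter is a hypothetical knob that only loosely mirrors the role of r: a node joins a community only if at least that fraction of its incident edges carry the community's label.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def node_memberships(edges: List[Edge], edge_labels: List[int],
                     min_share: float = 0.0) -> Dict[Hashable, Set[int]]:
    """Decode per-edge community labels into (possibly overlapping) node memberships."""
    incident: Dict[Hashable, Counter] = defaultdict(Counter)
    for (u, v), c in zip(edges, edge_labels):
        incident[u][c] += 1
        incident[v][c] += 1
    memberships: Dict[Hashable, Set[int]] = {}
    for node, counts in incident.items():
        degree = sum(counts.values())
        # Hypothetical strictness rule: keep communities covering >= min_share of the node's edges.
        kept = {c for c, k in counts.items() if k / degree >= min_share}
        # Always keep the node's dominant community so no node is left unassigned.
        memberships[node] = kept or {counts.most_common(1)[0][0]}
    return memberships


def overlapping_nodes(memberships: Dict[Hashable, Set[int]]) -> Set[Hashable]:
    """Nodes assigned to more than one community."""
    return {n for n, cs in memberships.items() if len(cs) > 1}
```

For example, two triangles sharing node 3, with each triangle's edges labeled as its own community, make node 3 the single overlapping node; raising `min_share` above 0.5 removes that overlap. In a genetic algorithm, a chromosome would be the `edge_labels` list and the fitness would be an overlapping modularity computed from the decoded memberships.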
The increasing popularity of social media in recent years has created new opportunities to study the interactions of different groups of people. Never before has so much data about such a large number of individuals been readily available for analysis. Two popular topics in the study of social networks are community detection and sentiment analysis. Community detection seeks to find groups of associated individuals within networks, and sentiment analysis attempts to determine how individuals are feeling. While these are generally treated as separate issues, this study takes an integrative approach and uses community detection output to enable community-level sentiment analysis. Community detection is performed using the Walktrap algorithm on a network of Twitter users associated with Microsoft Corporation's @technet account. This Twitter account is one of several used by Microsoft Corporation, primarily for communicating with information technology professionals. Once community detection is complete, the sentiment of the tweets produced by each detected community is analyzed using word sentiment scores from the well-known SentiWordNet lexicon. Combining sentiment analysis with community detection permits multilevel exploration of sentiment information within the @technet network and demonstrates the power of combining these two techniques.
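The integration step, turning a community partition plus word-level scores into per-community sentiment, can be sketched as follows. The sketch assumes the Walktrap partition has already been computed (for example with python-igraph's `Graph.community_walktrap()`) and is supplied as a user-to-community mapping. It also simplifies SentiWordNet, which scores word senses, to one hypothetical (positive, negative) pair per word. The lexicon values in any example are made up, and averaging "positive minus negative" per tweet and then per community is one plausible aggregation, not necessarily the study's.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Lexicon = Dict[str, Tuple[float, float]]  # word -> (positive score, negative score)


def tweet_sentiment(text: str, lexicon: Lexicon) -> float:
    """Mean (positive - negative) score over the tweet's words found in the lexicon; 0.0 if none."""
    scores = [lexicon[w][0] - lexicon[w][1] for w in text.lower().split() if w in lexicon]
    return mean(scores) if scores else 0.0


def community_sentiment(membership: Dict[str, int],
                        tweets: List[Tuple[str, str]],
                        lexicon: Lexicon) -> Dict[int, float]:
    """Average tweet sentiment per community; tweets from users outside the network are skipped."""
    per_community: Dict[int, List[float]] = defaultdict(list)
    for user, text in tweets:
        if user in membership:
            per_community[membership[user]].append(tweet_sentiment(text, lexicon))
    return {c: mean(s) for c, s in per_community.items()}
```

Keeping the per-tweet scores in `per_community` before averaging also leaves room for the multilevel view the abstract describes, such as drilling down from a community's mean to individual users or tweets.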
Networks are used to represent interactions in a wide variety of fields, such as biology, sociology, and chemistry. Their structures contain a great deal of salient information with a variety of applications. One important topic in network analysis is finding influential nodes. These nodes are of two kinds: leader nodes and bridge nodes. In this study, we propose an algorithm to find strong leaders in a network based on a revision of neighborhood similarity. This leadership detection is combined with a neighborhood intersection clustering algorithm to produce high-quality communities for various networks. We also examine the structure of a new network, the Houghton College Twitter network, and study the discovered leaders and their respective followers in more depth than is usually attempted for a network of its size. Our observations on this and other networks show that the community partitions found by this algorithm are very similar to the ground-truth communities.
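The abstract refers to a revised neighborhood similarity without defining it. As a point of reference, here is a minimal sketch using the standard Jaccard similarity of closed neighborhoods, N[u] = neighbors of u plus u itself. It pairs that similarity with a hypothetical leader rule (a node whose degree is at least that of every neighbor) and a hypothetical follower threshold `tau`. None of these three pieces should be read as the paper's actual definitions.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets of an undirected graph


def jaccard(g: Graph, u: Hashable, v: Hashable) -> float:
    """Jaccard similarity of the closed neighborhoods N[u] and N[v]."""
    a, b = g[u] | {u}, g[v] | {v}
    return len(a & b) / len(a | b)


def local_leaders(g: Graph) -> Set[Hashable]:
    """Hypothetical leader rule: a non-isolated node whose degree is >= every neighbor's degree."""
    return {v for v in g if g[v] and all(len(g[v]) >= len(g[u]) for u in g[v])}


def followers(g: Graph, leader: Hashable, tau: float) -> Set[Hashable]:
    """Neighbors of a leader whose neighborhood similarity reaches the hypothetical threshold tau."""
    return {u for u in g[leader] if jaccard(g, leader, u) >= tau}
```

In a small star-like graph where node 1 links to 2, 3, and 4, and node 4 also links to 5, node 1 is the only local leader. Nodes 2 and 3 share half of node 1's closed neighborhood and would follow it at `tau=0.5`, while node 4, which also connects outward to 5, falls below that threshold.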