“…As Blei (2012) notes, topics and topical decompositions are not in a sense 'definitive.' Fitting a model to any collection will yield patterns regardless of whether they exist in a true sense in the corpus.…”
This paper explores a variety of methods for applying the Latent Dirichlet Allocation (LDA) automated topic modeling algorithm to the modeling of the structure and behavior of virtual organizations found within modern social media and social networking environments. As the field of Big Data reveals, an increase in the scale of available social data presents new challenges that cannot be tackled by merely scaling up hardware and software. Rather, it necessitates new methods and, indeed, new areas of expertise. Natural language processing provides one such method. This paper applies LDA to the study of scientific virtual organizations whose members employ social technologies. Because of the vast data footprint in these virtual platforms, we found that natural language processing was needed to 'unlock' and render visible latent, previously unseen conversational connections across large textual corpora (spanning profiles, discussion threads, forums, and other social media incarnations). We introduce variants of LDA and ultimately argue that natural language processing is a critical interdisciplinary methodology for making better sense of social 'Big Data.' Using LDA, we were able to successfully model nested discussion topics from forums and blog posts. Importantly, we found that LDA can move us beyond the state of the art in conventional Social Network Analysis techniques.
“…If we can combine the results of this paper with expert opinions, we can expect a more accurate and valid result for sustainable technology analysis between competitors. Thus, in our future work, we will apply opinion mining [56], sentiment analysis [57], and topic models [58,59] to our methodology for sustainable technology analysis. This paper dealt with a more efficient way of finding sustainability in a specific technology field by introducing a new time concept that was not covered in the existing quantitative analysis methods for selecting sustainable technologies.…”
Abstract: The technology of three-dimensional (3D) printing was commercialized in the late 1980s. Since then, the development of this technology has been increasing dramatically. Moreover, 3D printing technology has been used in many different fields, such as electronics and medical appliances, because 3D printing is a technological convergence based on precision instruments, chemical materials, and electrical equipment. The technological impact of 3D printing is so powerful that we need to analyze 3D printing technology to understand the 3D printing industry. In addition, we want more analytical results for understanding the sustainability of 3D printing technology. Thus, we compare the technologies of 3D printing competitors to find their technological innovations and evolution from a technological sustainability perspective. To analyze 3D printing technology, we propose a new methodology of statistical technology analysis combining social network analysis with time series clustering. In our case study, we make a comparison between "3D Systems" and "Stratasys", two major 3D printing companies, because they have been leading the sustainable technologies of 3D printing in the market. We illustrate how the proposed methodology can be applied to practical problems from the case study. This paper contributes to sustainable technology management, and our research can be extended to other competitors in diverse technological fields beyond 3D printing.
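The abstract's core idea of combining social network analysis with time series clustering can be sketched in miniature: compute a centrality measure on a keyword co-occurrence network for each time period, then cluster the resulting per-keyword centrality trajectories. Everything below — the toy data, the keyword names, and the nearest-centroid clustering step — is an illustrative assumption, not the paper's actual method or data.

```python
# Sketch: SNA (degree centrality on keyword co-occurrence networks per year)
# combined with time series clustering of centrality trajectories.
from collections import defaultdict
import math

def degree_centrality(edges):
    """Normalized degree centrality for an undirected co-occurrence network."""
    deg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        deg[u] += 1
        deg[v] += 1
    n = len(nodes)
    return {v: (deg[v] / (n - 1) if n > 1 else 0.0) for v in nodes}

def centrality_series(snapshots, keyword):
    """Track one keyword's centrality across yearly network snapshots."""
    return [degree_centrality(edges).get(keyword, 0.0) for edges in snapshots]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_by_nearest(series_by_kw, centroids):
    """Assign each keyword's trajectory to its nearest reference centroid."""
    return {kw: min(centroids, key=lambda c: euclidean(s, centroids[c]))
            for kw, s in series_by_kw.items()}

# Toy yearly co-occurrence snapshots (hypothetical patent keywords).
snapshots = [
    [("extrusion", "nozzle"), ("nozzle", "resin")],
    [("extrusion", "nozzle"), ("extrusion", "resin"), ("nozzle", "resin")],
    [("extrusion", "resin")],
]
series = {kw: centrality_series(snapshots, kw)
          for kw in ("extrusion", "nozzle", "resin")}
labels = cluster_by_nearest(series, {"rising": [0.0, 0.5, 1.0],
                                     "fading": [1.0, 0.5, 0.0]})
```

In a real study, the snapshots would come from patent keyword co-occurrences per period, and the clustering would use a proper time series method rather than two fixed reference centroids.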
“…The LSA space was built using the stochastic SVD decomposition from Apache Mahout [26], applied to the term-document matrix weighted with log-entropy, across 300 dimensions. LDA made use of parallel Gibbs sampling implemented in Mallet [27], and the model was created with 100 topics, as suggested by Blei [28]. A manual inspection of the top 100 words from each LDA topic suggested that the space was adequately constructed, because the most representative words from each topic were semantically related to one another.…”
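The collapsed Gibbs sampling scheme the snippet attributes to Mallet can be sketched in pure Python. The corpus, topic count, and hyperparameters below are toy assumptions; Mallet's actual implementation is parallelized and heavily optimized, and this sketch only illustrates the sampling step itself.

```python
# Minimal collapsed Gibbs sampler for LDA (illustrative, not Mallet's code).
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    # Count tables: doc-topic, topic-word, topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [defaultdict(int) for _ in range(n_topics)]
    nk = [0] * n_topics
    # Random initial topic assignment for every token.
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            k = z[di][wi]
            ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                # Remove this token's current assignment from the counts...
                k = z[di][wi]
                ndk[di][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # ...then resample: p(k) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ).
                weights = [(ndk[di][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = k
                ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # Return top words per topic, mirroring the manual inspection described.
    return [sorted(nkw[t], key=nkw[t].get, reverse=True)[:3]
            for t in range(n_topics)]

docs = [["network", "graph", "edge"], ["topic", "word", "corpus"],
        ["graph", "edge", "network"], ["corpus", "topic", "word"]]
top_words = lda_gibbs(docs, n_topics=2)
```

The log-entropy weighting mentioned for the LSA space is a separate preprocessing step applied before SVD and is not shown here.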
Section: The NLP Processing Pipeline for Dutch Language
Abstract. Automated Essay Scoring has gained wider applicability and usage with the integration of advanced Natural Language Processing techniques, which enable in-depth analyses of discourse in order to capture the specificities of written texts. In this paper, we introduce a novel Automated Essay Scoring method for the Dutch language, built within the ReaderBench framework, which encompasses a wide range of textual complexity indices as well as an automated segmentation approach. Our method was evaluated on a corpus of 173 technical reports automatically split into sections and subsections, thus forming a hierarchical structure on which textual complexity indices were subsequently applied. The stepwise regression model explained 30.5% of the variance in students' scores, while a Discriminant Function Analysis predicted with substantial accuracy (75.1%) whether students were high- or low-performing.
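To make the notion of "textual complexity indices" concrete, here is a tiny sketch computing three classic surface-level indices. The function name and the specific formulas are assumptions for illustration; ReaderBench's actual index set is far richer (lexical, syntactic, semantic, and discourse-level measures).

```python
# Illustrative surface-level textual complexity indices (not ReaderBench's).
import re

def complexity_indices(text):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # Mean word length in characters.
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        # Lexical diversity: unique words over total words.
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }

report = "The model was trained. It scored the essays automatically."
idx = complexity_indices(report)
```

In the pipeline described above, indices like these would be computed per section and subsection of each report and then fed into the regression and discriminant analyses.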