Hierarchical Density-Based Clustering Based on GPU Accelerated Data Indexing Strategy

Melo, Danilo; Toledo, Svyo; Mouro, Fernando; Sachetto, Rafael; Andrade, Guilherme; Ferreira, Renato; Parthasarathy, S.; Rocha, Leonardo

doi:10.1016/j.procs.2016.05.389

Cited by 10 publications

(9 citation statements)

References 12 publications


“…A parallel computing environment has become the first choice for solving big data processing problems. Some researchers have proposed parallel algorithms based on multithreading [18]. Although the pressure of storage and calculation has been relieved to a great extent, the limitation of memory resources has become the algorithm of the bottleneck of expansion; the FP-Growth algorithm is based on the Apriori principle.…”

Section: Related Theories (mentioning)

confidence: 99%

Research on English Achievement Analysis Based on Improved CARMA Algorithm

2022

Computational Intelligence and Neuroscience


This paper uses data mining technology to analyze students’ English scores. In view of the influence of many factors on students’ English performance, the analysis is realized by using the association rule algorithm. The thesis analyzes and applies students’ English scores based on association rules and mainly does the following work: (1) at present, the problem of the CARMA algorithm is low operating efficiency. The combination of the genetic algorithm’s crossover, mutation, and the CARMA algorithm realizes the fast search of the algorithm. The simulation results show that the operation performance of the algorithm is greatly improved after the crossover and mutation operations in the genetic algorithm are applied to the CARMA algorithm. The simulation results show that the mining accuracy of the improved algorithm is 97.985%, and the mining accuracy before the improvement is 92.221%, indicating that the improved algorithm can improve the accuracy of mining. (2) By comparing the mining time of the improved CARMA algorithm, the traditional CARMA algorithm, the FP-Growth algorithm, and the Apriori algorithm, the results show that when the number is 6,500, the mining efficiency of the improved CARMA algorithm is twice that of the other three algorithms. As the amount of data increases, the effect of improving mining efficiency gradually increases. (3) By using the improved CARMA algorithm to analyze students’ English performance, it is found that the quality of student performance is strongly related to the quality of daily homework, and if it is related to the teacher’s gender, professional title, etc., it is recommended that schools should pay more attention to homework during the teaching process.



“…These low-volume clusters often contain valuable information, which might not even be known to medical experts: their low volume makes it difficult to detect them via manual inspection. To perform clustering in the embedding space, we use the hierarchical, density-based clustering algorithm HDBSCAN (Campello et al, 2013; Melo et al, 2016; McInnes et al, 2017). As customary in unsupervised learning tasks, one needs to provide some information on the desired granularity, i.e.…”

Section: Clustering Similar Medical Inquiries Via Hierarchical Clustering (mentioning)

confidence: 99%

“…Here, we propose an approach - schematically depicted in Figure 1 - to discover topics from short, unstructured, real-world medical inquiries. Our methodology consists of the following steps: medical inquiries are preprocessed (via lemmatization, stopword removal) and converted to vectors via a biomedical word embedding (scispacy (Neumann et al, 2019)), a dimensionality reduction is then applied to lower the dimensionality of the embedded vectors (via UMAP (McInnes et al, 2018a; McInnes et al, 2018b)), clustering is performed in this lower dimensional space to group together similar inquiries (via HDBSCAN (Campello et al, 2013; Melo et al, 2016; McInnes et al, 2017)). These clusters of similar inquiries are then merged based on semantic similarity: we define these (merged) clusters as topics.…”

Section: Introduction (mentioning)

confidence: 99%

Discovering Key Topics From Short, Real-World Medical Inquiries via Natural Language Processing

Ziletti, Berns, Treichel et al. 2021

Front. Comput. Sci.


Millions of unsolicited medical inquiries are received by pharmaceutical companies every year. It has been hypothesized that these inquiries represent a treasure trove of information, potentially giving insight into matters regarding medicinal products and the associated medical treatments. However, due to the large volume and specialized nature of the inquiries, it is difficult to perform timely, recurrent, and comprehensive analyses. Here, we combine biomedical word embeddings, non-linear dimensionality reduction, and hierarchical clustering to automatically discover key topics in real-world medical inquiries from customers. This approach does not require ontologies nor annotations. The discovered topics are meaningful and medically relevant, as judged by medical information specialists, thus demonstrating that unsolicited medical inquiries are a source of valuable customer insights. Our work paves the way for the machine-learning-driven analysis of medical inquiries in the pharmaceutical industry, which ultimately aims at improving patient care.


“…One of the outputs that this text mining process produces is a semantic tree which can be explored interactively on the PoliRural platform (see PoliRural Innovation Hub section). We will use ANNOY and HDBSCAN for clustering (Melo et al, 2016) and novel Word Mover's Distance for sentence and paragraph similarity analysis (Ye et al, 2016).…”

Section: Text Mining Enabled Policy Evaluation (mentioning)

confidence: 99%

Towards Future Oriented Collaborative Policy Development for Rural Areas and People

Ulman, Šimek, Masner et al. 2020

AOL


Rural areas in Europe are at risk due to depopulation, failing generation renewal, and a multitude of influences ranging from market-based, regulatory, to societal and climate changes. As a result, current rural policy is no longer keeping pace with these changes. We propose an advanced rural policy development framework in order to deliver more accurate foresight for rural regions, contributing to new and enhanced policy interventions. The proposed framework combines new quantitative and qualitative epistemological approaches, previously unused unstructured data with traditional research information, grassroot perspective with expert knowledge, current situation analysis with forward looking activities. We argue that by using the proposed methods, policy teams will be able to enhance the effectiveness of their policy making processes, while rural stakeholders will be given the opportunity to become valuable policy influencers and solution co-creators. The ability to quickly experiment and understand the impact of a variety of policy solutions will result in saved time and costs. The framework is part of an ongoing experimental verification and testing in twelve pilot regions across Europe and Israel.

