Reuse repositories manager manages the reusable software components in different categories and needs to find the category of reusable software components. In this paper, we have used different pure and hybrid approaches to find the domain relevancy of the component to a particular domain. Probabilistic Latent Semantic Analysis (PLSA) approach, LSA, Singular Value Decomposition (SVD) technique, LSA Semi-Discrete Matrix Decomposition (SDD) technique and Naive Bayes Approach purely as well as hybrid, are evaluated to determine the Domain Relevancy of software components. It exploits the fact that Feature Vector codes can be seen as documents containing terms -the identifiers present in the components- and so text modeling methods that capture co-occurrence information in low-dimensional spaces can be used. The FV code representation of clusters or domains is used to find the domain-relevancy of the software components. PLSA has provided better results than LSA retrieval techniques in terms of Precision and Recall but its time complexity is too high. SVD Transformation with Naïve Bayes scheme has outperformed all other approaches and shows better results than the existing approach (LSA) being used by some open source code repositories e.g. Sourceforge. The DR-value determined is close to the manual analysis, used to be performed by the programmers/repository managers. Hence, the tool can also be utilized for the automatic categorization of software components and this kind of automation may improve the productivity and quality of software development
Data Mining is used to extract useful information from a collection of databases or data warehouses. In recent years, Data Mining has become an important field. This paper has surveyed upon data mining and its various techniques that are used to extract useful information such as clustering, and has also surveyed the techniques that are used to detect the outliers. This paper also presents various techniques used by different researchers to detect outliers and present the efficient result to the user.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.