Abstract. In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as dimensionality reduction methods and investigate their effectiveness in document clustering on real-world document sets. For clustering of documents, we use a method based on a multinomial mixture, which is known as an efficient framework for text mining. Clustering results are evaluated by F-measure, i.e., the harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth for the evaluation of clustering results. Our experiment shows that dimensionality reduction via LDA and pLSI results in document clusters of almost the same quality as those obtained by using the original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment reveals no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI, at least for dimensionality reduction in document clustering.
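The F-measure mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which variant of clustering precision and recall the authors used, so the pairwise definition below (counting pairs of documents placed in the same cluster and the same category) is an assumption, and the function name `pairwise_f_measure` is hypothetical.

```python
from itertools import combinations


def pairwise_f_measure(clusters, categories):
    """Return (precision, recall, F) over all pairs of documents.

    Assumption: pairwise counting. The abstract only states that F is the
    harmonic mean of precision and recall, not how the two are computed.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_category = categories[i] == categories[j]
        if same_cluster and same_category:
            tp += 1  # pair grouped together, correctly
        elif same_cluster:
            fp += 1  # pair grouped together, but categories differ
        elif same_category:
            fn += 1  # pair shares a category, but was split
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    # Toy data: four documents, cluster labels vs. ground-truth categories.
    print(pairwise_f_measure([0, 0, 0, 1], ["a", "a", "b", "b"]))
```

Because the harmonic mean is dominated by the smaller of its two arguments, a clustering scores well only when both precision and recall are high.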
Abstract. In this paper, we propose a method for composing templates for lung sound classification. First, we obtain a sequence of power spectra by FFT for each given lung sound and compute a small number of component spectra by ICA for each of the overlapping sets of tens of consecutive power spectra. Second, we put the component spectra obtained from various lung sounds into a single set and run clustering a large number of times. When component spectra belong to the same cluster in all clustering results, these spectra show robust similarity. Therefore, we can use such spectra to compose a template for lung sound classification.
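The robustness criterion in the abstract above (spectra that fall in the same cluster in every clustering run) can be sketched as follows. This covers only that final grouping step, not the FFT, ICA, or clustering stages; the function name `stable_groups` and the representation of each run as a list of cluster labels are assumptions made for illustration.

```python
from collections import defaultdict


def stable_groups(runs):
    """Group item indices that share a cluster in every run.

    `runs` is a list of label lists, one per clustering run, all of the
    same length. Two items are together in all runs exactly when their
    label tuples across runs are identical, so we group by that tuple.
    """
    n_items = len(runs[0])
    groups = defaultdict(list)
    for i in range(n_items):
        signature = tuple(run[i] for run in runs)
        groups[signature].append(i)
    # Keep only groups with at least two members; singletons show no
    # similarity to anything else.
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    # Three hypothetical clustering runs over five component spectra.
    runs = [
        [0, 0, 1, 1, 2],
        [1, 1, 0, 2, 2],
        [0, 0, 0, 1, 1],
    ]
    print(stable_groups(runs))
```

Cluster labels need not match between runs, since only agreement within each run matters.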
Spectral unmixing is a method for estimating the proportion of each component in a pixel using multispectral data. In conventional analysis of remotely sensed images, each pixel is classified into a single object category. However, the actual land surface corresponding to a pixel does not necessarily consist of only one category of objects. Therefore, estimating the proportion of components that exist in a pixel is often useful. The most commonly used method of spectral unmixing assumes that the component spectra are determined from training data. However, available training data do not always correctly represent the spectral characteristics of the categories within the objective area. In such cases, large errors may appear in the results of unmixing. We propose herein the adaptive spectral unmixing method, which estimates suitable component spectra from the actual observed data and thus requires no training data. By adaptively estimating the component spectra from the set of observed data in the objective area, we can correctly estimate the proportion of components even if the spectral characteristics change with the location of the objective area. In the proposed method, the spectral reflectance of pixels is expressed by vectors in multi-dimensional space, which can be written as linear combinations of component spectra weighted according to component proportion. We determine the component spectra by finding the minimum-volume simplex containing all of the reflectance vectors, where the vertices of the simplex correspond to the component spectra. We estimated the degree of error by numerical simulation and compared the performance of the proposed adaptive method with that of the conventional method. We confirmed that the proposed method of adaptive unmixing provides better results than the conventional method when the spectral characteristics change with the location of the objective area.
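The linear mixing model described in the abstract above, in which a pixel's reflectance vector is a proportion-weighted combination of component spectra, can be sketched for the simplest case of two components. This shows only the mixing model and unmixing with known component spectra (the conventional setting); it does not implement the minimum-volume simplex estimation that the authors propose. The function names and the example spectra are hypothetical.

```python
def mix(component_spectra, proportions):
    """Linear mixing: reflectance per band = sum of proportion * spectrum."""
    n_bands = len(component_spectra[0])
    return [
        sum(p * s[b] for p, s in zip(proportions, component_spectra))
        for b in range(n_bands)
    ]


def unmix_two(pixel, s1, s2):
    """Estimate the proportions of two known component spectra in a pixel.

    Assumes the proportions sum to one, so pixel = a*s1 + (1 - a)*s2.
    The least-squares `a` is the projection of (pixel - s2) onto
    (s1 - s2), clipped to [0, 1] so both proportions stay non-negative.
    """
    diff = [x - y for x, y in zip(s1, s2)]
    numerator = sum((x - y) * d for x, y, d in zip(pixel, s2, diff))
    denominator = sum(d * d for d in diff)
    a = min(1.0, max(0.0, numerator / denominator))
    return a, 1.0 - a


if __name__ == "__main__":
    # Made-up reflectances in three bands for two land-cover components.
    vegetation = [0.1, 0.5, 0.3]
    soil = [0.4, 0.2, 0.6]
    pixel = mix([vegetation, soil], [0.3, 0.7])
    print(unmix_two(pixel, vegetation, soil))
```

With two components the simplex is a line segment between the two spectra, and the proportion is the pixel's relative position along that segment.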