Latent semantic analysis (LSA) is a mathematical and statistical technique for discovering hidden concepts relating terms and documents within a document collection (i.e., a large corpus of text). Each term and each document of the corpus is expressed as a vector whose elements correspond to these concepts, forming a term-document matrix. LSA then applies a low-rank approximation to the term-document matrix in order to remove irrelevant information, extract the more important relations, and reduce computational time. The irrelevant information is called “noise” and has no noteworthy effect on the meaning of the document collection; removing it is an essential step in LSA. The singular value decomposition (SVD) has been the main tool for obtaining the low-rank approximation in LSA. Since the document collection is dynamic (i.e., the term-document matrix is subject to repeated updates), the approximation must be renewed, either by recomputing the SVD or by updating it. However, the computational cost of recomputing or updating the SVD of the term-document matrix is very high when new terms and/or documents are added to a preexisting collection. This issue has therefore opened the door to using other matrix decompositions for LSA, such as ULV- and URV-based decompositions. This study shows that the truncated ULV decomposition (TULVD) is a good alternative to the SVD in LSA modeling.
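As an illustration of the low-rank approximation step described above, here is a minimal NumPy sketch; the matrix entries and the rank `k` are toy values for demonstration, not data from the study:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# Entry (i, j) counts how often term i occurs in document j.
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 1, 1],
    [0, 0, 1, 2],
], dtype=float)

# Full SVD: A = U @ diag(s) @ Vt, with singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation keeps only the k largest singular values,
# discarding the small ones that LSA treats as "noise".
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# By the Eckart-Young theorem, A_k is the best rank-k approximation
# of A in the Frobenius norm; the error is the norm of the dropped tail.
err = np.linalg.norm(A - A_k)
print(np.linalg.matrix_rank(A_k))  # -> 2
```

Queries and documents are then compared in the reduced k-dimensional concept space rather than against the full matrix, which is what makes the truncation both a denoising and a compression step.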
Data mining is the process of extracting information that is used to identify and define relationships between data of different kinds. One of the important problems encountered in this process is classification in large data sets. Extensive research has been done to solve this classification problem, and different solution methods have been introduced. Decision tree algorithms are among the structures that can be used effectively in this field. In this article, various decision tree structures and algorithms used for classification in large data sets are discussed. Along with definitions of the algorithms, the similarities and differences between them are identified, and their advantages and disadvantages are examined.
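As a minimal illustration of the splitting rule shared by CART-style decision tree algorithms, the following pure-Python sketch (toy data, not from the article) finds the best single-feature split under the Gini impurity criterion:

```python
# A minimal decision "stump" (a one-level decision tree) trained by
# exhaustive search over split thresholds with the Gini impurity
# criterion -- the split rule used by CART-style algorithms.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Find the threshold on a single feature that minimises the
    weighted Gini impurity of the two resulting partitions."""
    best = (float("inf"), None)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[0]:
            best = (score, t)
    return best  # (weighted impurity, threshold)

# Toy 1-D data set: feature value -> class label.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]

impurity, threshold = best_split(xs, ys)
print(threshold, impurity)  # -> 3.0 0.0
```

A full decision tree applies this search recursively to each partition; the algorithms surveyed in the article differ mainly in the impurity measure, the stopping rule, and how they scale this search to large data sets.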
Watermarking is one of the most common techniques used to protect the authenticity, integrity, and security of data. Embedding in the frequency domain makes a watermark more robust than embedding in the spatial domain, and plays an important role in watermarking work with respect to imperceptibility, capacity, and robustness. Finding the optimal location to hide the watermark is one of the most challenging tasks in these methods and strongly affects their performance. In this article, sample identification information is embedded by watermarking into a hiding environment created with a chaos-based random number generator on biomedical data, providing solutions to problems such as visual attacks, identity theft, and information confusion. To obtain the biomedical data, a lensless digital in-line holographic microscopy (DIHM) setup was designed, and holographic data of human blood and cancer cell lines, which are widely used in the laboratory environment, were acquired. The standard USAF 1951 target was used to evaluate the resolution of the imaging setup. Various QR codes were generated for medical sample identification, and the captured medical data were watermarked with them using chaos-based random number generators. A new method combining chaos-based discrete wavelet transform (DWT) and singular value decomposition (SVD) was developed and applied to high-resolution data to keep the encrypted data from being directly targeted by third-party attacks. The performance of the proposed watermarking method was demonstrated by various robustness and invisibility tests. Experimental results showed that the proposed scheme reached an average PSNR value of 56.4588 dB and an SSIM value of 0.9972 against several geometric and destructive attacks, which means that the proposed method does not degrade image quality and also ensures the security of the watermark information.
The results show that the proposed method can be used efficiently in various fields.
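The SVD half of a DWT-SVD embedding scheme can be sketched as follows. This is a generic NumPy illustration, not the authors' exact method: the wavelet step is omitted, `np.random.default_rng` stands in for the chaos-based generator, and the embedding strength `alpha` and block sizes are assumed values:

```python
import numpy as np

# Sketch of SVD-based watermark embedding (the SVD half of a DWT-SVD
# scheme; the wavelet transform of the host image is omitted here).
rng = np.random.default_rng(seed=42)  # stand-in for a chaos-based generator

host = rng.random((8, 8))   # host image block (e.g. a DWT sub-band)
mark = rng.random((8, 8))   # watermark block (e.g. part of a QR code)
alpha = 0.05                # embedding strength (assumed value)

# Embed: perturb the host's singular values with the watermark's.
U, s, Vt = np.linalg.svd(host)
_, sw, _ = np.linalg.svd(mark)
watermarked = U @ np.diag(s + alpha * sw) @ Vt

# Extract: recover the watermark's singular values from the
# watermarked block, given the original host's singular values.
_, s_rx, _ = np.linalg.svd(watermarked)
sw_est = (s_rx - s) / alpha

# Imperceptibility check: PSNR of the watermarked block vs. the host
# (pixel range assumed to be [0, 1]).
mse = np.mean((host - watermarked) ** 2)
psnr = 10 * np.log10(1.0 / mse)
```

Because the perturbed singular values remain sorted and nonnegative, the extraction recovers `sw` almost exactly in this noise-free sketch; robustness tests like those in the article would re-run the extraction after attacks (cropping, compression, noise) on `watermarked`.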
Highlights
❖ SEO processes were performed automatically.
❖ The study was tested with a well-known data set.