In this study, we propose several semantic kernels for word sense disambiguation (WSD). Our approaches adopt the intuition that class-based term values help resolve the ambiguity of polysemous words in WSD. We evaluate the proposed approaches experimentally, using training sets of various sizes drawn from a sense-disambiguated corpus (SensEval-1). With these experiments we try to answer the following questions: (1) Do our semantic kernel formulations yield higher classification performance than the traditional linear kernel? (2) Under which conditions does one kernel design perform better than the others? (3) Does adding class labels to the standard term-document matrix improve classification accuracy? (4) Is their combination superior to either type alone? (5) Does an ensemble of these kernels perform better than the baseline? (6) What is the effect of training set size? Our experiments demonstrate that our kernel-based WSD algorithms can outperform the baseline in terms of F-score.

Under the bag-of-words (BOW) representation, a well-known feature representation technique that regards only the frequency of words, a basic similarity calculation such as cosine or Jaccard among sentences id_1, id_3, and id_4 will be zero, since they have no words in common. The same holds for the similarity between sentences id_2 and id_5. On the other hand, the similarity between sentences id_1 and id_2 will probably be greater than zero, since they share the word "mouse". Moreover, although they convey different messages, the similarity between sentences id_2 and id_4 will probably be greater than zero, since they share the word "cell".
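The BOW behavior described above can be sketched with a small cosine-similarity computation over term-frequency vectors. The sentences below are hypothetical stand-ins for id_1, id_2, and id_4 (the original example table is not shown here); they are crafted so that one pair shares "mouse", another shares "cell", and a third pair shares nothing:

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words (term-frequency) vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)          # only shared words contribute
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical sentences illustrating the ambiguity of "mouse" and "cell":
s1 = "mouse ran across kitchen floor"            # "mouse" as animal
s2 = "click mouse to select spreadsheet cell"    # "mouse" as device, "cell" as spreadsheet cell
s4 = "blood cell samples were examined"          # "cell" as biological cell

print(cosine_sim(s1, s4))  # no shared words -> 0.0
print(cosine_sim(s1, s2))  # shares "mouse"   -> > 0, despite different senses
print(cosine_sim(s2, s4))  # shares "cell"    -> > 0, despite different senses
```

This illustrates the limitation motivating semantic kernels: plain BOW similarity is zero for related sentences with disjoint vocabularies, yet positive for unrelated sentences that happen to share a surface form.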