Harish Karnick scite author profile

We present a feature vector formation technique for documents, Sparse Composite Document Vector (SCDV), which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation. In SCDV, word embeddings are clustered to capture the multiple semantic contexts in which words occur. They are then chained together to form document topic-vectors that can express complex, multi-topic documents. Through extensive experiments on multi-class and multi-label classification tasks, we outperform the previous state-of-the-art method, NTSG (Liu et al., 2015a). We also show that SCDV embeddings perform well on heterogeneous tasks like Topic Coherence, context-sensitive Learning and Information Retrieval. Moreover, we achieve a significant reduction in training and prediction times compared to other representation methods. SCDV achieves the best of both worlds: better performance with lower time and space complexity.
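The clustering-and-chaining idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the soft cluster probabilities are hand-set here rather than fitted by a clustering model, the idf weights and the fixed sparsity `threshold` are assumptions, and the names `word_topic_vector` and `document_vector` are hypothetical.

```python
import math

def word_topic_vector(vec, cluster_probs, idf):
    """Concatenate one copy of `vec` per cluster, scaled by P(cluster | word) and idf."""
    out = []
    for p in cluster_probs:
        out.extend(idf * p * x for x in vec)
    return out

def document_vector(tokens, embeddings, probs, idfs, threshold=0.05):
    """Sum word-topic vectors, L2-normalize, then zero out small entries."""
    dim = len(next(iter(embeddings.values())))
    n_clusters = len(next(iter(probs.values())))
    total = [0.0] * (dim * n_clusters)
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        wtv = word_topic_vector(embeddings[tok], probs[tok], idfs[tok])
        total = [a + b for a, b in zip(total, wtv)]
    norm = math.sqrt(sum(x * x for x in total)) or 1.0
    unit = [x / norm for x in total]
    # Assumed sparsification rule: drop entries below a fixed threshold.
    return [x if abs(x) >= threshold else 0.0 for x in unit]

# Toy data (all values hypothetical): 2-d embeddings, 2 clusters.
embeddings = {"bank": [1.0, 0.0], "river": [0.0, 1.0], "loan": [1.0, 1.0]}
probs = {"bank": [0.5, 0.5], "river": [0.02, 0.98], "loan": [0.99, 0.01]}
idfs = {"bank": 1.0, "river": 1.0, "loan": 1.0}

doc = document_vector(["bank", "loan", "unknown"], embeddings, probs, idfs)
print(len(doc), doc)
```

The resulting vector has one block of embedding dimensions per cluster, so a document that mixes topics spreads its mass over several blocks, and thresholding keeps the representation sparse.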


Deep Attentive Ranking Networks for Learning to Order Sentences

Kumar

Brahma

Karnick

et al. 2020

AAAI


We present an attention-based ranking framework for learning to order sentences given a paragraph. Our framework is built on a bidirectional sentence encoder and a self-attention based transformer network to obtain an input order invariant representation of paragraphs. Moreover, it allows seamless training using a variety of ranking based loss functions, such as pointwise, pairwise, and listwise ranking. We apply our framework on two tasks: Sentence Ordering and Order Discrimination. Our framework outperforms various state-of-the-art methods on these tasks on a variety of evaluation metrics. We also show that it achieves better results when using pairwise and listwise ranking losses, rather than the pointwise ranking loss, which suggests that incorporating relative positions of two or more sentences in the loss function contributes to better learning.
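To make the distinction between the three loss families concrete, here is a minimal sketch using generic textbook forms: squared error for pointwise, a margin hinge for pairwise, and a Plackett-Luce (ListMLE-style) likelihood for listwise. These are assumptions chosen for illustration and may differ from the losses used in the paper. `scores` holds one model score per sentence (higher means earlier), and `order` lists sentence indices in the gold order.

```python
import math

def pointwise_loss(scores, targets):
    """Mean squared error between each score and its own target value."""
    return sum((s - t) ** 2 for s, t in zip(scores, targets)) / len(scores)

def pairwise_hinge_loss(scores, order, margin=1.0):
    """Hinge penalty for each pair where the earlier sentence fails to
    outscore the later one by at least `margin`."""
    loss, pairs = 0.0, 0
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            loss += max(0.0, margin - (scores[order[i]] - scores[order[j]]))
            pairs += 1
    return loss / pairs if pairs else 0.0

def listmle_loss(scores, order):
    """Negative log-likelihood of the whole gold order under a Plackett-Luce model."""
    loss = 0.0
    for k in range(len(order)):
        remaining = [scores[idx] for idx in order[k:]]
        log_z = math.log(sum(math.exp(s) for s in remaining))
        loss += log_z - scores[order[k]]
    return loss

gold = [0, 1, 2]          # sentence 0 comes first, then 1, then 2
good = [3.0, 2.0, 1.0]    # scores that agree with the gold order
bad = [1.0, 2.0, 3.0]     # scores that reverse it

print(pairwise_hinge_loss(good, gold), pairwise_hinge_loss(bad, gold))
print(listmle_loss(good, gold), listmle_loss(bad, gold))
```

The pointwise loss looks at each sentence in isolation, while the pairwise and listwise losses depend on how sentences score relative to one another, which is the property the abstract credits for better learning.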


Genre and Style Based Painting Classification

Agarwal

Karnick

Pant

et al. 2015


Kernel-based online machine learning and support vector reduction

Agarwal

Saradhi

Karnick

2008

Neurocomputing


Scale independent raga identification using chromagram patterns and swara based features

Dighe

Agrawal

Karnick

et al. 2013


Cosine Distance Metric Learning for Speaker Verification Using Large Margin Nearest Neighbor Method

Ahmad

Karnick

Hegde

2014


SumPubMed: Summarization Dataset of PubMed Scientific Articles

Gupta

Bharti

Nokhiz

et al. 2021


Most earlier work on text summarization is carried out on news article datasets. The summary in these datasets is naturally located at the beginning of the text. Hence, a model can spuriously utilize this correlation for summary generation instead of truly learning to summarize. To address this issue, we constructed a new dataset, SUMPUBMED, using scientific articles from the PubMed archive. We conducted a human analysis of summary coverage, redundancy, readability, coherence, and informativeness on SUMPUBMED. SUMPUBMED is challenging because (a) the summary is distributed throughout the text (not localized at the top), and (b) it contains rare domain-specific scientific terms. We observe that seq2seq models that adequately summarize news articles struggle to summarize SUMPUBMED. Thus, SUMPUBMED opens new avenues for the future improvement of models as well as the development of new evaluation metrics.


Extracting semantic structure of web documents using content and visual information

Mehta

Mitra

Karnick

2005


This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is utilized using the VIPS algorithm and the content information using a pre-trained Naive Bayes classifier. The output of the algorithm is a semantic structure tree whose leaves represent segments that each have a unique topic. However, the contents of a leaf segment may be physically distributed across the web page. This structure can be useful in many web applications such as information retrieval, information extraction and automatic web page adaptation. This algorithm is expected to outperform other existing page segmentation algorithms since it utilizes both content and visual information.


Harish Karnick

SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations
