With the ever-increasing number of patent applications filed every year, the need for effective and efficient systems to manage such tremendous amounts of data becomes increasingly important. Patent Retrieval (PR) is considered the pillar of almost all patent analysis tasks. PR is a subfield of Information Retrieval (IR) concerned with developing techniques and methods that effectively and efficiently retrieve relevant patent documents in response to a given search request. In this paper we present a comprehensive review of PR methods and approaches. It is clear that recent successes and maturity in IR applications such as Web search cannot be transferred directly to PR without deliberate domain adaptation and customization. Furthermore, state-of-the-art automatic PR systems still achieve only moderate recall. These observations motivate the need for interactive search tools that provide cognitive assistance to patent professionals with minimal effort. These tools must also be developed hand in hand with patent professionals, taking their practices and expectations into account. We additionally touch on tasks related to PR, such as patent valuation, litigation, and licensing, and highlight potential opportunities and open directions for computational scientists in these domains.
Online job boards are one of the central components of the modern recruitment industry. With millions of candidates browsing job postings every day, the need for accurate, effective, meaningful, and transparent job recommendations is more apparent than ever. While recommendation systems are advancing successfully in a variety of online domains by creating social and commercial value, the job recommendation domain is less explored. Existing systems mostly focus on content analysis of resumes and job descriptions and therefore rely heavily on the accuracy and coverage of semantic analysis and content modeling. As a result, they usually suffer from rigidity and miss the implicit semantic relations that emerge from users' behavior and that Collaborative Filtering (CF) methods could capture. The few works that utilize CF do not address the scalability challenges of real-world systems or the cold-start problem. In this paper, we propose a scalable item-based recommendation system for online job recommendations. Our approach overcomes the major challenges of sparsity and scalability by leveraging a directed graph of jobs connected by multi-edges representing various behavioral and contextual similarity signals. The short-lived nature of the items (jobs) in the system and the rapid rate at which new users and jobs enter the system make cold-start a serious problem that hinders CF methods. We address this problem by harnessing the power of deep learning in addition to user behavior to serve hybrid recommendations. Our technique has been deployed at CareerBuilder.com, one of the largest job boards in the world, to generate high-quality recommendations for millions of users.
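To make the idea of a directed multi-edge job graph more concrete, the following is a minimal sketch of item-based scoring over such a graph. It is not the production CareerBuilder system: the signal names (`co_apply`, `co_view`, `content`) and the weights in the hypothetical `SIGNAL_WEIGHTS` table are assumptions, since the abstract does not specify how the signals are combined. Parallel edges between two jobs are collapsed into one weighted edge, and candidates are ranked by summing edge weights from the jobs a user has already interacted with.

```python
from collections import defaultdict

# Hypothetical per-signal weights; the abstract does not say how signals are combined.
SIGNAL_WEIGHTS = {"co_apply": 1.0, "co_view": 0.5, "content": 0.3}


def build_graph(edges):
    """edges: iterable of (src_job, dst_job, signal, strength) tuples.

    Returns a directed graph src -> {dst: combined weight}, where parallel
    edges (multi-edges) for different signals are summed after weighting.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for src, dst, signal, strength in edges:
        graph[src][dst] += SIGNAL_WEIGHTS.get(signal, 0.0) * strength
    return graph


def recommend(graph, user_jobs, k=3):
    """Item-based scoring: sum outgoing edge weights from the user's jobs."""
    scores = defaultdict(float)
    seen = set(user_jobs)
    for job in user_jobs:
        # .get() avoids inserting empty entries into the defaultdict.
        for neighbor, weight in graph.get(job, {}).items():
            if neighbor not in seen:
                scores[neighbor] += weight
    # Highest score first; break ties by job id for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [job for job, _ in ranked[:k]]


if __name__ == "__main__":
    g = build_graph([
        ("j1", "j2", "co_apply", 2.0),
        ("j1", "j2", "content", 1.0),
        ("j1", "j3", "co_view", 1.0),
        ("j2", "j3", "co_apply", 1.0),
    ])
    print(recommend(g, ["j1"]))
```

Keeping the graph directed lets asymmetric behavior (users who apply to A often go on to apply to B, but not the reverse) survive in the scores; a real system would also prune each node to its top neighbors to keep lookups scalable.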
Collaborative Filtering (CF) is widely used in large-scale recommendation engines because of its efficiency, accuracy, and scalability. In practice, however, CF-based recommendation engines require interactions between users and items before they can make recommendations, which makes them ill-suited to new items that have not yet been exposed to end users. This is known as the cold-start problem. In this paper we introduce a novel approach that employs deep learning to tackle this problem in any CF-based recommendation engine. One of the most important features of the proposed technique is that it can be applied on top of any existing CF-based recommendation engine without changing the CF core. We successfully applied this technique to overcome the item cold-start problem in CareerBuilder's CF-based recommendation engine. Our experiments show that the proposed technique resolves the cold-start problem efficiently while maintaining the high accuracy of the CF recommendations.
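One way to read "applied on top of any existing CF engine without changing the CF core" is as a post-processing wrapper. The sketch below assumes that each item already has a content embedding (the abstract says a deep model is used but does not describe it, so fixed vectors stand in here) and that a cold item is attached to its most similar warm items by cosine similarity. The helper names `nearest_warm_items` and `augment_cf_recs` are hypothetical; this illustrates the general pattern, not the authors' method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_warm_items(cold_vec, warm_vecs, k=2):
    """Rank warm items (those with CF data) by embedding similarity to a cold item."""
    ranked = sorted(warm_vecs, key=lambda item: (-cosine(cold_vec, warm_vecs[item]), item))
    return ranked[:k]


def augment_cf_recs(cf_recs, cold_proxies):
    """Post-process unchanged CF output.

    cold_proxies maps each cold item to its warm proxy items. Whenever the
    CF engine recommends a proxy, the cold item is inserted right after it.
    """
    out = []
    for item in cf_recs:
        out.append(item)
        for cold_item, proxies in cold_proxies.items():
            if item in proxies and cold_item not in out:
                out.append(cold_item)
    return out


if __name__ == "__main__":
    warm = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    proxies = {"new": nearest_warm_items([1.0, 0.1], warm, k=1)}
    print(augment_cf_recs(["b", "a", "c"], proxies))
```

Because the wrapper only reads the CF engine's ranked output, the CF core stays untouched; once the cold item accumulates real interactions, it can simply be dropped from `cold_proxies`.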
Text representations using neural word embeddings have proven effective in many NLP applications. Recent work adapts traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating knowledge about concepts from two large-scale knowledge bases of different structure (Wikipedia and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to learn seamlessly from the knowledge in Wikipedia text and the Probase concept graph. We evaluate our concept embedding models on two tasks: 1) analogical reasoning, where we achieve state-of-the-art performance of 91% on semantic analogies, and 2) concept categorization, where we achieve state-of-the-art performance on two benchmark datasets, with categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study evaluating our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to generalize better to out-of-vocabulary entity mentions compared to tedious and error-prone methods that depend on gazetteers and regular expressions. In this paper, we use the terms "concept" and "entity" interchangeably.
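Skip-gram is trained on (target, context) pairs, so one plausible way to let a single model learn from both a text corpus and a concept graph is to emit pairs from both sources into one training stream. The sketch below shows only that pair-generation step. It assumes multiword concepts are already joined into single tokens (e.g. `steve_jobs`), and the rule of treating each graph edge as a bidirectional context pair is an assumption; the abstract does not state how Probase edges are fed to the model.

```python
def text_pairs(tokens, window=2):
    """Standard skip-gram (target, context) pairs from one token sequence.

    Multiword concepts are assumed to be pre-joined into single tokens.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def graph_pairs(edges):
    """Hypothetical: treat each concept-graph edge (e.g. an isA link between an
    instance and a concept) as a context pair in both directions."""
    pairs = []
    for u, v in edges:
        pairs.append((u, v))
        pairs.append((v, u))
    return pairs


def training_pairs(sentences, edges, window=2):
    """Merge text-derived and graph-derived pairs into one skip-gram training set."""
    pairs = []
    for tokens in sentences:
        pairs.extend(text_pairs(tokens, window))
    pairs.extend(graph_pairs(edges))
    return pairs


if __name__ == "__main__":
    sentences = [["steve_jobs", "founded", "apple_inc"]]
    edges = [("apple_inc", "technology_company")]
    for pair in training_pairs(sentences, edges, window=1):
        print(pair)
```

The pairs would then be passed to an ordinary skip-gram trainer with negative sampling; because graph neighbors appear as contexts, concepts that rarely co-occur in text can still end up close in the embedding space.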
Mined Semantic Analysis (MSA) is a novel concept space model that employs unsupervised learning to generate semantic representations of text. MSA represents textual structures (terms, phrases, documents) as a Bag of Concepts (BoC), where concepts are derived from concept-rich encyclopedic corpora. Traditional concept space models exploit only the content of the target corpus to construct the concept space. MSA, in contrast, uncovers implicit relations between concepts by mining their associations (e.g., mining Wikipedia's "See also" link graph). We evaluate MSA's performance on benchmark datasets for measuring the semantic relatedness of words and sentences. Empirical results show that MSA performs competitively with prior state-of-the-art methods. Additionally, we introduce the first analytical study to examine the statistical significance of results reported by different semantic relatedness methods. Our study shows that the small differences in results among top-performing methods may not be statistically significant. The study positions MSA as one of the state-of-the-art methods for measuring semantic relatedness, while also offering an inherently interpretable and simple semantic representation.
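The following toy sketch shows how association mining can enrich a Bag of Concepts. It makes several simplifying assumptions. The "explicit" concepts for a term come from naive keyword counting over a tiny hypothetical `concept_texts` dictionary rather than a real search index over Wikipedia. The `see_also` links are supplied directly instead of being mined. The `discount` factor applied to associated concepts is an assumption. Relatedness is the cosine similarity of the two expanded concept vectors.

```python
import math
from collections import Counter


def explicit_concepts(term, concept_texts):
    """Toy retrieval: weight each concept by how often the term occurs in its text."""
    term = term.lower()
    return Counter({c: text.lower().split().count(term)
                    for c, text in concept_texts.items()
                    if term in text.lower().split()})


def expand(boc, see_also, discount=0.5):
    """Add concepts linked by associations (e.g. "See also" links) at a discounted weight."""
    expanded = Counter(boc)
    for concept, weight in boc.items():
        for assoc in see_also.get(concept, []):
            expanded[assoc] += discount * weight
    return expanded


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def relatedness(t1, t2, concept_texts, see_also):
    return cosine(expand(explicit_concepts(t1, concept_texts), see_also),
                  expand(explicit_concepts(t2, concept_texts), see_also))


if __name__ == "__main__":
    texts = {"Car": "car engine wheels", "Truck": "truck engine cargo",
             "Banana": "banana fruit yellow"}
    links = {"Car": ["Truck"], "Truck": ["Car"]}
    print(relatedness("car", "truck", texts, links))
```

Without the expansion step, "car" and "truck" share no explicit concept here and would score 0; the mined links are what make them related. The resulting vectors remain interpretable, because every dimension is a named concept.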