For sales and marketing organizations within large enterprises, identifying and understanding new markets, customers, and partners is a key challenge. Intel's Sales and Marketing Group (SMG) faces similar challenges as it grows into new markets and domains and evolves its existing business. In today's complex technological and commercial landscape, there is a need for intelligent automation that supports a fine-grained understanding of businesses, helping SMG sift through millions of companies across many geographies and languages and identify relevant directions. We present a system developed in our company that mines millions of public business web pages and extracts a faceted customer representation. We focus on two key customer aspects that are essential for finding relevant opportunities: industry segments (ranging from broad verticals such as healthcare to more specific fields such as "video analytics") and functional roles (e.g., "manufacturer" or "retail"). To address the challenge of labeled data collection, we enrich our data with external information gleaned from Wikipedia and develop a semi-supervised, multi-label, multi-lingual deep learning model that parses customer website texts and classifies them into their respective facets. Our system scans and indexes companies as part of a large-scale knowledge graph that currently holds tens of millions of connected entities, with thousands more fetched, enriched, and connected to the graph every hour in real time; the graph also supports knowledge and insight discovery. In experiments conducted in our company, we were able to significantly boost the performance of sales personnel in discovering new customers and commercial partnership opportunities.

Index Terms: AI for Enterprise, NLP, Web Mining

I. SYSTEM OVERVIEW

Our customer segmentation system comprises two major building blocks.
The first component is responsible for large-scale data acquisition from the Web and other public sources, and for consolidating this data with internal corporate data in a Knowledge Graph (KG) we constructed. The second component is a suite of machine learning and natural language processing (NLP) models for segmenting potential customers.

Large-scale knowledge graph and web crawling

Our solution requires constantly streaming textual data from millions of sites and updating a multi-million-node KG with gigabytes of data every hour (see Figure 1). We therefore designed a parallelized, asynchronous streaming architecture based on microservices to ensure robustness. We rely on Kafka as the message bus and use Kafka Streams, TensorFlow Serving, and Neo4j. As existing customers often evolve and expand into new markets and nontraditional domains, our distributed and dynamic crawling process keeps web page information constantly refreshed, using a cloud serverless architecture to scale as the population grows. The data stream is transformed into graph formations using dedicated microservices and m...

Fig. 1. Illustration of our Knowledge Graph. Company information from public and internal sources.
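The streaming design described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: asyncio queues stand in for Kafka topics, a keyword lookup stands in for the deep learning segment classifier, and an in-memory dictionary stands in for Neo4j. The names `CompanyPage`, `InMemoryGraph`, `enrich`, `write_graph`, and `run_pipeline`, the `IN_SEGMENT` relationship type, and the record fields are all hypothetical. Each stage runs as an independent asynchronous consumer, mirroring one microservice, and graph writes are idempotent upserts in the spirit of Cypher `MERGE`, so refreshed pages update existing nodes rather than duplicating them.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CompanyPage:
    """A crawled web page for one company (hypothetical record schema)."""
    company: str
    url: str
    text: str
    segments: list[str] = field(default_factory=list)


class InMemoryGraph:
    """Toy stand-in for the Neo4j knowledge graph (assumption, not the real store).

    Upserts are idempotent, loosely mimicking Cypher MERGE, so re-crawled
    pages refresh existing nodes and edges instead of duplicating them.
    """

    def __init__(self) -> None:
        self.nodes: dict[str, dict] = {}
        self.edges: set[tuple[str, str, str]] = set()

    def merge_node(self, key: str, **props) -> None:
        # Create the node if missing, otherwise update its properties.
        self.nodes.setdefault(key, {}).update(props)

    def merge_edge(self, src: str, rel: str, dst: str) -> None:
        # A set makes repeated inserts of the same edge a no-op.
        self.edges.add((src, rel, dst))


async def enrich(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Hypothetical enrichment microservice: tag pages with industry segments.

    A keyword lookup replaces the paper's deep learning classifier here.
    """
    keywords = {"hospital": "healthcare", "camera": "video analytics"}
    while (page := await inbox.get()) is not None:
        lowered = page.text.lower()
        page.segments = sorted({seg for kw, seg in keywords.items() if kw in lowered})
        await outbox.put(page)
    await outbox.put(None)  # propagate the end-of-stream sentinel downstream


async def write_graph(inbox: asyncio.Queue, graph: InMemoryGraph) -> None:
    """Hypothetical graph-writer microservice: turn pages into nodes and edges."""
    while (page := await inbox.get()) is not None:
        company_key = f"company:{page.company}"
        graph.merge_node(company_key, name=page.company, url=page.url)
        for seg in page.segments:
            graph.merge_node(f"segment:{seg}", name=seg)
            graph.merge_edge(company_key, "IN_SEGMENT", f"segment:{seg}")


async def run_pipeline(pages: list[CompanyPage]) -> InMemoryGraph:
    """Wire crawler output -> enrichment -> graph writer through queues."""
    crawled: asyncio.Queue = asyncio.Queue()   # stand-in for a "crawled" Kafka topic
    enriched: asyncio.Queue = asyncio.Queue()  # stand-in for an "enriched" Kafka topic
    graph = InMemoryGraph()
    stages = [
        asyncio.create_task(enrich(crawled, enriched)),
        asyncio.create_task(write_graph(enriched, graph)),
    ]
    for page in pages:  # plays the role of the crawler producing records
        await crawled.put(page)
    await crawled.put(None)  # signal that this finite batch is complete
    await asyncio.gather(*stages)
    return graph


if __name__ == "__main__":
    demo_pages = [
        CompanyPage("Acme Imaging", "https://acme.example", "Smart camera systems for retail"),
        CompanyPage("CareCo", "https://careco.example", "Software for hospital networks"),
    ]
    result = asyncio.run(run_pipeline(demo_pages))
    for edge in sorted(result.edges):
        print(edge)
```

Decoupling the stages through queues is what lets each one be scaled or restarted independently. The `None` sentinel exists only because this sketch processes a finite batch; long-running stream consumers would typically keep polling instead.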