Click-through rate (CTR) prediction is critical for industrial applications such as recommender system and online advertising. Practically, it plays an important role for CTR modeling in these applications by mining user interest from rich historical behavior data. Driven by the development of deep learning, deep CTR models with ingeniously designed architecture for user interest modeling have been proposed, bringing remarkable improvement of model performance over offline metric. However, great efforts are needed to deploy these complex models to online serving system for realtime inference, facing massive traffic request. Things turn to be more difficult when it comes to long sequential user behavior data, as the system latency and storage cost increase approximately linearly with the length of user behavior sequence.In this paper, we face directly the challenge of long sequential user behavior modeling and introduce our hands-on practice with the co-design of machine learning algorithm and online serving system for CTR prediction task. (i) From serving system view, we decouple the most resource-consuming part of user interest modeling from the entire model by designing a separate module named UIC (User Interest Center). UIC maintains the latest interest state for each user, whose update depends on realtime user behavior trigger event, rather than on traffic request. Hence UIC is latency free for realtime CTR prediction. (ii) From machine learning algorithm view, we propose a novel memory-based architecture named MIMN (Multi-channel user Interest Memory Network) to capture user interests from long sequential behavior data, achieving superior performance over state-of-the-art models. MIMN is implemented in an incremental manner with UIC module.Theoretically, the co-design solution of UIC and MIMN enables us to handle the user interest modeling with unlimited length of sequential behavior data. Comparison between model performance and system efficiency proves the effectiveness of proposed solution. To our knowledge, this is one of the first industrial solutions that are capable of handling long sequential user behavior data with length scaling up to thousands. It now has been deployed in the display advertising system in Alibaba.
Rich user behavior data has been proven to be of great value for click-through rate prediction tasks, especially in industrial applications such as recommender systems and online advertising. Both industry and academy have paid much attention to this topic and propose different approaches to modeling with long sequential user behavior data. Among them, memory network based model MIMN[8] proposed by Alibaba, achieves SOTA with the co-design of both learning algorithm and serving system. MIMN is the first industrial solution that can model sequential user behavior data with length scaling up to 1000. However, MIMN fails to precisely capture user interests given a specific candidate item when the length of user behavior sequence increases further, say, by 10 times or more. This challenge exists widely in previously proposed approaches. In this paper, we tackle this problem by designing a new modeling paradigm, which we name as Search-based Interest Model (SIM). SIM extracts user interests with two cascaded search units: (i) General Search Unit (GSU) acts as a general search from the raw and arbitrary long sequential behavior data, with query information from candidate item, and gets a Sub user Behavior Sequence (SBS) which is relevant to candidate item; (ii) Exact Search Unit (ESU) models the precise relationship between candidate item and SBS. This cascaded search paradigm enables SIM with a better ability to model lifelong sequential behavior data in both scalability and accuracy. Apart from the learning algorithm, we also introduce our hands-on experience on how to implement SIM in large scale industrial systems. Since 2019, SIM has been deployed in the display advertising system in Alibaba, bringing 7.1% CTR and 4.4% RPM lift, which is significant to the business. Serving the main traffic in our real system now, SIM models sequential user behavior data with maximum length reaching up to 54000, pushing SOTA to 54x.
Rich user behavior data has been proven to be of great value for click-through rate prediction tasks, especially in industrial applications such as recommender systems and online advertising. Both industry and academy have paid much attention to this topic and propose different approaches to modeling with long sequential user behavior data. Among them, memory network based model MIMN[8] proposed by Alibaba, achieves SOTA with the co-design of both learning algorithm and serving system. MIMN is the first industrial solution that can model sequential user behavior data with length scaling up to 1000. However, MIMN fails to precisely capture user interests given a specific candidate item when the length of user behavior sequence increases further, say, by 10 times or more. This challenge exists widely in previously proposed approaches.In this paper, we tackle this problem by designing a new modeling paradigm, which we name as Search-based Interest Model (SIM). SIM extracts user interests with two cascaded search units: (i) General Search Unit (GSU) acts as a general search from the raw and arbitrary long sequential behavior data, with query information from candidate item, and gets a Sub user Behavior Sequence (SBS) which is relevant to candidate item; (ii) Exact Search Unit (ESU) models the precise relationship between candidate item and SBS. This cascaded search paradigm enables SIM with a better ability to model lifelong sequential behavior data in both scalability and accuracy. Apart from the learning algorithm, we also introduce our hands-on experience on how to implement SIM in large scale industrial systems. Since 2019, SIM has been deployed in the display advertising system in Alibaba, bringing 7.1% CTR and 4.4% RPM lift, which is significant to the business. Serving the main traffic in our real system now, SIM models sequential user behavior data with maximum length reaching up to 54000, pushing SOTA to 54x.
Background Pulmonary tuberculosis (PTB) is an infectious disease of major public health problem, China is one of the PTB high burden counties in the word. Hubei is one of the provinces having the highest notification rate of tuberculosis in China. This study analyzed the temporal and spatial distribution characteristics of PTB in Hubei province for targeted intervention on TB epidemics. Methods The data on PTB cases were extracted from the National Tuberculosis Information Management System correspond to population in 103 counties of Hubei Province from 2011 to 2021. The effect of PTB control was measured by variation trend of bacteriologically confirmed PTB notification rate and total PTB notification rate. Time series, spatial autonomic correlation and spatial-temporal scanning methods were used to identify the temporal trends and spatial patterns at county level of Hubei. Results A total of 436,955 cases were included in this study. The total PTB notification rate decreased significantly from 81.66 per 100,000 population in 2011 to 52.25 per 100,000 population in 2021. The peak of PTB notification occurred in late spring and early summer annually. This disease was spatially clustering with Global Moran’s I values ranged from 0.34 to 0.63 (P< 0.01). Local spatial autocorrelation analysis indicated that the hot spots are mainly distributed in the southwest and southeast of Hubei Province. Using the SaTScan 10.0.2 software, results from the staged spatial-temporal analysis identified sixteen clusters. Conclusions This study identified seasonal patterns and spatial-temporal clusters of PTB cases in Hubei province. High-risk areas in southwestern Hubei still exist, and need to focus on and take targeted control and prevention measures.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.