Extracting events accurately from vast news corpora and organizing them logically is critical for news apps and search engines, which aim to organize news information collected from the Internet and present it to users in the most sensible forms. Intuitively speaking, an event is a group of news documents that report the same news incident, possibly in different ways. In this article, we describe our experience of implementing a news content organization system at Tencent to discover events from vast streams of breaking news and to evolve news story structures in an online fashion. Our real-world system faces unique challenges in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we (1) need to accurately and quickly extract distinguishable events from massive streams of long text documents, and (2) must develop the structures of event stories in an online manner, in order to guarantee a consistent user viewing experience. To address these challenges, we propose Story Forest, a set of online schemes that automatically clusters streaming documents into events, while connecting related events in growing trees to tell evolving stories. A core novelty of our Story Forest system is EventX, a semi-supervised scheme to extract events from massive Internet news corpora. EventX relies on a two-layered, graph-based clustering procedure to group documents into fine-grained events. We conducted extensive evaluations based on (1) 60 GB of real-world Chinese news data, (2) a large Chinese Internet news dataset that contains 11,748 news articles with ground-truth event labels, and (3) the 20 News Groups English dataset, as well as through detailed pilot user experience studies. The results demonstrate the superior capabilities of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers.
Semantic matching of natural language sentences, or identifying the relationship between two sentences, is a core research problem underlying many natural language tasks. Depending on whether training data is available, prior research has proposed both unsupervised distance-based schemes and supervised deep learning schemes for sentence matching. However, previous approaches either omit or fail to fully utilize the ordered, hierarchical, and flexible structures of language objects, as well as the interactions between them. In this paper, we propose Hierarchical Sentence Factorization, a technique to factorize a sentence into a hierarchical representation, with the components at each different scale reordered into a "predicate-argument" form. The proposed sentence factorization technique leads to the invention of: 1) a new unsupervised distance metric which calculates the semantic distance between a pair of text snippets by solving a penalized optimal transport problem while preserving the logical relationship of words in the reordered sentences, and 2) new multi-scale deep learning models for supervised semantic training, based on factorized sentence hierarchies. We apply our techniques to text-pair similarity estimation and text-pair relationship classification tasks, based on multiple datasets such as STSbenchmark, the Microsoft Research paraphrase identification (MSRP) dataset, and the SICK dataset. Extensive experiments show that the proposed hierarchical sentence factorization can be used to significantly improve the performance of existing unsupervised distance-based metrics as well as multiple supervised deep learning models based on the convolutional neural network (CNN) and long short-term memory (LSTM).
This paper introduces a project involving a thermoregulation performance experiment design to evaluate the different responses of research subjects to a range of mastectomy bras and external breast prostheses. A set of newly designed heat-reduction mastectomy bras and prostheses were mix-matched with a set of conventional mastectomy bras and prostheses for the experiment. Four combinations of mastectomy bras and external breast prostheses were used: (a) Com A: conventional mastectomy bra and conventional prosthesis; (b) Com B: conventional mastectomy bra and heat-reduction prosthesis; (c) Com C: heat-reduction mastectomy bra and conventional prosthesis; and (d) Com D: heat-reduction mastectomy bra and heat-reduction prosthesis. Nine healthy male subjects (mean age: 31.9 ± 5.9 y and mean under-bust circumference: 35.3 ± 2.8 in) participated in this study in lieu of women who had undergone surgery for double mastectomy and were too self-conscious to expose their scars for sensor attachment. Eight sets of temperature and humidity sensors were placed between the surface of the skin and the prostheses and bra to measure the changes in both temperature and humidity data in a microclimate environment while the participants performed physical activity. The results showed that Com D demonstrated better thermal and moisture control, resulting in lower body temperature and lower humidity increment throughout the entire experiment. The study proved that the heat-reduction mastectomy bra and external breast prosthesis were effective in releasing the trapped heat and perspiration underneath the bra, and thus would provide a positive impact on clothing comfort and wearing experience for women who had undergone mastectomies.