The development and progression of oral cavity squamous cell carcinoma (OSCC) involve complex cellular mechanisms that contribute to the low five-year survival rate of approximately 20% among diagnosed patients. However, the biological processes essential to tumor progression are not completely understood. Therefore, detecting alterations in the salivary proteome may assist in elucidating the cellular mechanisms modulated in OSCC and improve the clinical prognosis of the disease. The proteomes of whole saliva and of salivary extracellular vesicles (EVs) from patients with OSCC and from healthy individuals were analyzed by LC-MS/MS and label-free protein quantification. Proteome data analysis was performed using statistical, machine-learning, and feature-selection methods with additional functional annotation. Biological processes related to immune responses, peptidase inhibitor activity, iron coordination, and protease binding were overrepresented in the group of differentially expressed proteins. Proteins related to the inflammatory system, transport of metals, and cellular growth and proliferation were identified in the proteome of salivary EVs. The proteomics data were robust and could classify OSCC with 90% accuracy. The saliva proteome analysis revealed that immune processes are related to the presence of OSCC and indicates that proteomics data can contribute to determining OSCC prognosis.
The continued explosion of Twitter data has opened doors for many applications, such as location-based advertisement and entertainment using smartphones. Unfortunately, only about 0.58 percent of tweets are geo-tagged to date. To tackle this location sparseness problem, this paper presents a methodical approach to increasing the number of geo-tagged tweets by predicting the fine-grained location of those tweets whose location can be inferred with high confidence. In order to predict the fine-grained location of tweets, we first build probabilistic models for locations using unstructured short messages tightly coupled with semantic locations. Based on the probabilistic models, we propose a three-step technique (Filtering-Ranking-Validating) for tweet location prediction. In the filtering step, we introduce text analysis techniques to filter out location-neutral tweets, i.e., those that may not be related to any location at all. In the ranking step, we utilize ranking techniques to select the best candidate location for a tweet. Finally, in the validating step, we develop a classification-based prediction validation method to verify that the predicted location is where the tweet was actually written. We conduct extensive experiments using tweets covering three months, and the results show that our approach can increase the number of geo-tagged tweets by a factor of 4.8 over the original Twitter data and place 34% of predicted tweets within 250 m of their actual location.
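The Filtering-Ranking-Validating pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: the toy training data, the unigram location models, and the score-margin threshold standing in for the classifier-based validation step are all assumptions made for illustration.

```python
from collections import Counter
import math

# Hypothetical training data: tweets known to originate at each location.
TRAINING = {
    "times_square": ["broadway show tonight", "crowds at the ball drop"],
    "central_park": ["jogging by the lake", "picnic on the great lawn"],
}

def build_models(training):
    """Per-location unigram word-frequency model with add-one smoothing."""
    models = {}
    for loc, tweets in training.items():
        words = Counter(w for t in tweets for w in t.split())
        models[loc] = (words, sum(words.values()), len(words))
    return models

def log_score(tweet, model):
    """Smoothed log-likelihood of the tweet under one location model."""
    words, total, vocab = model
    return sum(math.log((words[w] + 1) / (total + vocab + 1))
               for w in tweet.split())

def predict(tweet, models, margin=0.5):
    """Sketch of the three steps:
    1. Rank every candidate location by log-likelihood.
    2. Filter/validate: accept the top location only if it beats the
       runner-up by `margin` (a stand-in for the paper's text-analysis
       filter and classifier-based validation).
    3. Otherwise treat the tweet as location-neutral and return None.
    """
    scored = sorted(((log_score(tweet, m), loc)
                     for loc, m in models.items()), reverse=True)
    best, runner_up = scored[0], scored[1]
    if best[0] - runner_up[0] < margin:
        return None  # low confidence: leave the tweet un-geotagged
    return best[1]
```

A tweet mentioning location-specific vocabulary ("jogging near the lake") clears the margin and is assigned a location, while a generic tweet ("hello world") scores almost equally under every model and is filtered out.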
Dividing web pages into fragments has been shown to provide significant benefits for both content generation and caching. In order for a web site to use fragment-based content generation, however, good methods are needed for dividing web pages into fragments. Manual fragmentation of web pages is expensive, error-prone, and does not scale. This paper proposes a novel scheme to automatically detect and flag fragments that are cost-effective cache units in web sites serving dynamic content. We consider fragments to be interesting if they are shared among multiple documents or if they have distinct lifetime or personalization characteristics. Our approach has three unique features. First, we propose a hierarchical, fragment-aware model of dynamic web pages and a data structure that is compact and effective for fragment detection. Second, we present an efficient algorithm to detect maximal fragments that are shared among multiple documents. Third, we develop a practical algorithm that effectively detects fragments based on their lifetime and personalization characteristics. We evaluate the proposed scheme through a series of experiments, showing the benefits and costs of the algorithms. We also study the impact of adopting the fragments detected by our system on disk space utilization and network bandwidth consumption.
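The core idea of detecting maximal fragments shared among multiple documents can be sketched as follows. This is a simplified illustration, not the paper's algorithm or data structure: pages are modeled as nested `(tag, children)` tuples, and a fragment is "shared" when an identical subtree appears verbatim in at least two pages.

```python
def subtrees(node, out):
    """Collect every subtree of a page tree (tuples are hashable)."""
    out.append(node)
    tag, children = node
    if isinstance(children, tuple):      # inner node: recurse into children
        for child in children:
            subtrees(child, out)

def shared_fragments(pages):
    """Return subtrees appearing in at least two distinct pages,
    keeping only maximal ones (not nested inside another shared one)."""
    seen = {}
    for i, page in enumerate(pages):
        nodes = []
        subtrees(page, nodes)
        for n in set(nodes):             # count each page at most once
            seen.setdefault(n, set()).add(i)
    shared = {n for n, pgs in seen.items() if len(pgs) >= 2}

    def contained_in(small, big):
        if small == big:
            return False
        nodes = []
        subtrees(big, nodes)
        return small in nodes

    # maximality: drop fragments contained in a larger shared fragment
    return [n for n in shared
            if not any(contained_in(n, other) for other in shared)]
```

For two pages that share a navigation sidebar but differ in their body content, the detector reports only the sidebar subtree: its individual links are also shared, but they are suppressed because they sit inside the larger shared fragment.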
WebCQ is a prototype system for large-scale Web information monitoring and delivery. It makes heavy use of the structure present in hypertext and the concept of continual queries. In this paper we discuss both the mechanisms that WebCQ uses to efficiently discover and detect changes to World Wide Web (the Web) pages, and the methods to notify users of interesting changes with personalized customization. The WebCQ system consists of four main components: a change detection robot that discovers and detects changes, a proxy cache service that reduces communication traffic to the original information servers, a personalized presentation tool that highlights changes detected by WebCQ sentinels, and a change notification service that delivers fresh information to the right users at the right time. A salient feature of our change detection robot is its ability to support various types of web page sentinels for detecting, presenting, and delivering interesting changes to web pages. This paper describes the WebCQ system with an emphasis on general issues in designing and engineering a large-scale information change monitoring system on the Web.
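The sentinel idea at the heart of such a change-monitoring system can be sketched in a few lines. This is not WebCQ's implementation: the `PageSentinel` class, its fingerprint-then-diff strategy, and the line-level change report are assumptions made for illustration of the general technique.

```python
import difflib
import hashlib

class PageSentinel:
    """Toy page sentinel: remember a fingerprint of the last-seen page,
    and when the page changes, report the changed lines (a simplified
    analogue of highlighting changes for personalized presentation)."""

    def __init__(self):
        self.last_text = None
        self.last_digest = None

    def check(self, page_text):
        """Return changed lines, or None if the page is unchanged."""
        digest = hashlib.sha256(page_text.encode()).hexdigest()
        if digest == self.last_digest:
            return None                  # cheap comparison: no re-diffing
        old = (self.last_text or "").splitlines()
        new = page_text.splitlines()
        # keep only added/removed lines, dropping the diff header lines
        diff = [line for line in difflib.unified_diff(old, new, lineterm="")
                if line.startswith(("+", "-"))
                and not line.startswith(("+++", "---"))]
        self.last_text, self.last_digest = page_text, digest
        return diff
```

The fingerprint comparison keeps repeated polling cheap: a full diff is computed only when the hash reveals that the page actually changed.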