Essential information is often conveyed in illustrations in biomedical publications. A clinician's decision to access the full text when searching for evidence in support of a clinical decision is frequently based solely on a short bibliographic reference. We seek to automatically augment these references with images from the article that may assist in finding evidence. The feasibility of automatically classifying images by usefulness (utility) in finding evidence was explored using supervised machine learning. We selected 2004-2005 issues of the British Journal of Oral and Maxillofacial Surgery, manually annotating 743 images by utility and modality (radiological, photo, etc.). Image data, figure captions, and paragraphs surrounding figure discussions in text were used in classification. Automatic image classification achieved 84.3% accuracy using image captions for modality and 76.6% accuracy combining captions and image data for utility. Our results indicate that automatic augmentation of bibliographic references with relevant images is feasible.
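As a rough illustration of caption-based modality classification, the sketch below trains a small multinomial naive Bayes model on bag-of-words caption features. It is not the classifier used in the study, which the abstract does not name; the class `CaptionNaiveBayes`, the helper `tokenize`, and the four training captions are hypothetical and exist only to make the example run.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(caption):
    """Lowercase a caption and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", caption.lower())


class CaptionNaiveBayes:
    """Multinomial naive Bayes over bag-of-words caption features (illustrative)."""

    def fit(self, captions, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for caption, label in zip(captions, labels):
            tokens = tokenize(caption)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, caption):
        tokens = tokenize(caption)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Laplace (add-one) smoothing over the training vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical captions, invented for this example only.
TRAIN = [
    ("Panoramic radiograph showing the mandibular fracture", "radiological"),
    ("Axial CT scan of the maxillary sinus", "radiological"),
    ("Intraoperative photograph of the exposed flap", "photo"),
    ("Clinical photograph showing facial swelling", "photo"),
]

model = CaptionNaiveBayes().fit([c for c, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("Postoperative radiograph of the mandible"))  # radiological
```

A utility classifier along the lines described in the abstract would add image-derived features to the caption features; that part is omitted here because the abstract does not specify how the image data were represented.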
Multi-sensor data fusion has been an area of intense recent research and development activity. The concept has been applied to numerous fields, and new applications are being explored constantly. The multi-sensor based Collaborative Click Fraud Detection and Prevention (CCFDP) system can be viewed as a problem of evidence fusion. In this paper we detail the multilevel data fusion mechanism used in CCFDP for real-time click fraud detection and prevention. Prevention mechanisms are based on blocking suspicious traffic by IP, referrer, city, country, ISP, etc. Our system maintains an online database of these suspicious parameters. We tested the system with real-world data from an actual ad campaign; the results show that the use of multilevel data fusion improves the quality of click fraud analysis.
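To make the idea of fusing evidence and then blocking by parameter concrete, here is a minimal sketch. The abstract does not state which fusion rule CCFDP uses, so the noisy-OR combination below is a stand-in chosen for simplicity. The names `fuse_noisy_or`, `Blocklist`, and `handle_click`, the 0.8 threshold, and the example scores are all assumptions, and the in-memory dictionary only imitates the online database of suspicious parameters.

```python
from dataclasses import dataclass, field


def fuse_noisy_or(scores):
    """Combine per-source suspicion scores in [0, 1] with a noisy-OR rule.

    Assumed stand-in for the fusion step: a click counts as clean only if
    every source independently considers it clean.
    """
    not_fraud = 1.0
    for s in scores:
        not_fraud *= 1.0 - s
    return 1.0 - not_fraud


@dataclass
class Blocklist:
    """In-memory stand-in for a database of suspicious parameters."""
    entries: dict = field(default_factory=dict)  # e.g. {"ip": {"203.0.113.7"}}

    def add(self, kind, value):
        self.entries.setdefault(kind, set()).add(value)

    def is_blocked(self, click):
        # A click is blocked if any of its parameters is on a list.
        return any(click.get(kind) in values
                   for kind, values in self.entries.items())


def handle_click(click, source_scores, blocklist, threshold=0.8):
    """Block known-bad traffic, otherwise fuse the evidence and decide."""
    if blocklist.is_blocked(click):
        return "blocked"
    if fuse_noisy_or(source_scores) >= threshold:
        blocklist.add("ip", click["ip"])  # remember the offender by IP
        return "flagged"
    return "allowed"


blocklist = Blocklist()
click = {"ip": "203.0.113.7", "country": "XX"}
print(handle_click(click, [0.5, 0.6, 0.3], blocklist))  # flagged
print(handle_click(click, [0.0], blocklist))            # blocked
print(handle_click({"ip": "198.51.100.2"}, [0.1, 0.2], blocklist))  # allowed
```

The same `Blocklist.add` call could record a referrer, city, country, or ISP instead of an IP; which parameter to block on is a policy choice that this sketch does not try to settle.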
Emara, Wael, "A submodular optimization framework for never-ending learning: semi-supervised, online, and active learning." (2012 Elmaghraby for all his effort to provide a healthy research environment in the computer science department. I also would like to express my deep gratitude to my parents and sister for all the support they provided me through the years. The revolution in information technology and the explosion in the use of computing devices in people's everyday activities have forever changed the perspective of the data mining and machine learning fields. The enormous amounts of easily accessible, information-rich data are pushing the data analysis community in general towards a paradigm shift. In the new paradigm, data comes in the form of a stream of billions of records received every day. The dynamic nature of the data and its sheer size make it impossible to use the traditional notion of offline learning, where the whole data set is accessible at any point in time. Moreover, no amount of human resources is enough to get expert feedback on the data. In this work we have developed a unified optimization-based learning framework that addresses many of the challenges mentioned above. Specifically, we developed a Never-Ending Learning framework which combines incremental/online, semi-supervised, and active learning under a unified optimization framework. The established framework is based on the class of submodular optimization methods. At the core of this work we provide a novel formulation of the Semi-Supervised Support Vector Machine (S3VM) in terms of submodular set functions. The new formulation overcomes the non-convexity issues of the S3VM and provides a state-of-the-art solution that is orders of magnitude faster than the cutting-edge algorithms in the literature. Next, we provide a stream summarization technique via exemplar selection.
This technique makes it possible to keep a fixed-size exemplar representation of a data stream that can be used by any label propagation based semi-supervised learning technique. The compact data stream representation allows a wide range of algorithms to be extended to the incremental/online learning scenario. Under the same optimization framework, we provide an active learning algorithm that constitutes the feedback between the learning machine and an oracle. Finally, the developed Never-Ending Learning framework is essentially transductive in nature. Therefore, our last contribution is an inductive incremental learning technique for incremental training of SVM using the properties of local kernels. Through this work we demonstrated the importance and wide applicability of the proposed methodologies.
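Exemplar selection under a submodular objective can be sketched with the standard greedy algorithm on a facility-location function, where every data point is credited with its similarity to the closest chosen exemplar. This is a generic batch illustration, not the dissertation's streaming algorithm, which the abstract does not spell out; the Gaussian `similarity`, the function names, and the six sample points are assumptions made for the example.

```python
import math


def similarity(a, b):
    """Gaussian similarity between two points (non-negative by construction)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))


def coverage(points, exemplars):
    """Facility-location objective: each point scores its best exemplar."""
    if not exemplars:
        return 0.0
    return sum(max(similarity(p, points[j]) for j in exemplars)
               for p in points)


def greedy_exemplars(points, k):
    """Pick up to k exemplar indices by greedy marginal-gain maximization."""
    chosen = []
    for _ in range(min(k, len(points))):
        base = coverage(points, chosen)
        best_gain, best_idx = -1.0, None
        for j in range(len(points)):
            if j in chosen:
                continue
            gain = coverage(points, chosen + [j]) - base
            if gain > best_gain:
                best_gain, best_idx = gain, j
        chosen.append(best_idx)
    return chosen


# Hypothetical data: two tight clusters, around (0, 0) and (5, 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(sorted(greedy_exemplars(points, k=2)))  # [0, 3]: one exemplar per cluster
```

Because the facility-location function is monotone and submodular when similarities are non-negative, the greedy choice is guaranteed to reach at least a (1 - 1/e) fraction of the best achievable objective value for a fixed exemplar budget k. A streaming variant would have to update the exemplar set as records arrive instead of rescanning all points as this sketch does.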