Melanoma is one of ten most common cancers in the US. Early detection is crucial for survival, but often the cancer is diagnosed in the fatal stage. Deep learning has the potential to improve cancer detection rates, but its applicability to melanoma detection is compromised by the limitations of the available skin lesion data bases, which are small, heavily imbalanced, and contain images with occlusions. We build deep-learning-based tools for data purification and augmentation to counter-act these limitations. The developed tools can be utilized in a deep learning system for lesion classification and we show how to build such system. The system heavily relies on the processing unit for removing image occlusions and the data generation unit, based on generative adversarial networks, for populating scarce lesion classes, or equivalently creating virtual patients with pre-defined types of lesions. We empirically verify our approach and show that incorporating these two units into melanoma detection system results in the superior performance over common baselines.
This paper focuses on understanding how the generalization error scales with the amount of the training data for deep neural networks (DNNs). Existing techniques in statistical learning theory require a computation of capacity measures, such as VC dimension, to provably bound this error. It is however unclear how to extend these measures to DNNs and therefore the existing analyses are applicable to simple neural networks, which are not used in practice, e.g., linear or shallow (at most two-layer) ones or otherwise multi-layer perceptrons. Moreover many theoretical error bounds are not empirically verifiable. In this paper we derive estimates of the generalization error that hold for deep networks and do not rely on unattainable capacity measures. The enabling technique in our approach hinges on two major assumptions: i) the network achieves zero training error, ii) the probability of making an error on a test point is proportional to the distance between this point and its nearest training point in the feature space and at certain maximal distance (that we call radius) it saturates. Based on these assumptions we estimate the generalization error of DNNs. The obtained estimate scales as O 1 δN 1/d , where N is the size of the training data, and is parameterized by two quantities, the effective dimensionality of the data as perceived by the network (d) and the aforementioned radius (δ), both of which we find empirically. We show that our estimates match with the experimentallyobtained behavior of the error on multiple learning tasks using benchmark data-sets and realistic models. Estimating training data requirements is essential for deployment of safety critical applications such as autonomous driving, medical diagnostics etc. Furthermore, collecting and annotating training data requires a huge amount of financial, computational and human resources. Our empirical estimates will help to efficiently allocate resources.
Securing enterprise networks presents challenges in terms of both their size and distributed structure. Data required to detect and characterize malicious activities may be diffused and may be located across network and endpoint devices.Further, cyberrelevant data routinely exceeds total available storage, bandwidth, and analysis capability, often by several orders of magnitude. Real-time detection of threats within or across very large enterprise networks is not simply an issue of scale, but also a challenge due to the variable nature of malicious activities and their presentations. The system seeks to develop a hierarchy of cyber reasoning layers to detect malicious behavior, characterize novel attack vectors and present an analyst with a contextualized human-readable output from a series of machine learning models. We developed machine learning algorithms for scalable throughput and improved recall for our Multi-Resolution Joint Optimization for Enterprise Security and Forensics (ESAFE) solution.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.