Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
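The nested-loop idea described above can be sketched in a few lines. The following is a minimal illustration of the pruning rule, not the authors' exact implementation: an example is abandoned as soon as its distance to its k-th nearest neighbor found so far falls below the score of the weakest current top-n outlier, since it can no longer qualify. All parameter values and the Euclidean distance are assumptions for the sketch.

```python
import heapq
import random

def top_outliers(data, k=3, n=2):
    """Nested-loop distance-based outlier search with a simple pruning rule
    (a sketch of the idea, not the paper's exact implementation)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    random.shuffle(data)   # random order is what makes pruning effective
    cutoff = 0.0           # score of the weakest current top-n outlier
    top = []               # min-heap of (score, point)
    for i, x in enumerate(data):
        neighbors = []     # max-heap (negated) of the k nearest distances so far
        pruned = False
        for j, y in enumerate(data):
            if i == j:
                continue
            d = dist(x, y)
            if len(neighbors) < k:
                heapq.heappush(neighbors, -d)
            elif d < -neighbors[0]:
                heapq.heapreplace(neighbors, -d)
            if len(neighbors) == k and -neighbors[0] < cutoff:
                pruned = True   # cannot beat the weakest top outlier: stop early
                break
        if not pruned and len(neighbors) == k:
            score = -neighbors[0]        # distance to k-th nearest neighbor
            heapq.heappush(top, (score, x))
            if len(top) > n:
                heapq.heappop(top)
            if len(top) == n:
                cutoff = top[0][0]
    return sorted(top, reverse=True)     # (score, point), strongest first
```

Because most examples are inliers, their inner loops usually terminate after a few distance computations once the cutoff is established, which is the intuition behind the near linear scaling the abstract reports.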
Integrated Systems Health Management includes fault detection, fault diagnosis (or fault isolation), and fault prognosis. We define prognosis to be detecting the precursors of a failure, and predicting how much time remains before a likely failure. Algorithms that use the data-driven approach to prognosis learn models directly from the data, rather than using a hand-built model based on human expertise. This paper surveys past work in the data-driven approach to prognosis. It also includes related work in data-driven fault detection and diagnosis, and in model-based diagnosis and prognosis, particularly as applied to space systems.
Modern space propulsion and exploration system designs are becoming increasingly sophisticated and complex. Determining the health state of these systems using traditional methods is becoming more difficult as the number of sensors and component interactions grows. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. The Inductive Monitoring System (IMS) is a data-driven system health monitoring software tool that has been successfully applied to several aerospace applications. IMS uses a data mining technique called clustering to analyze archived system data and characterize normal interactions between parameters. This characterization, or model, of nominal operation is stored in a knowledge base that can be used for real-time system monitoring or for analysis of archived events. Ongoing and developing IMS space operations applications include International Space Station flight control, satellite vehicle system health management, launch vehicle ground operations, and fleet supportability. As a common thread of discussion this paper will employ the evolution of the IMS data-driven technique as related to several Integrated Systems Health Management (ISHM) elements. Thematically, the projects listed will be used as case studies. The maturation of IMS via projects where it has been deployed, or is currently being integrated to aid in fault detection will be described. The paper will also explain how IMS can be used to complement a suite of other ISHM tools, providing initial fault detection support for diagnosis and recovery.
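The cluster-then-monitor pattern described above can be illustrated with a small sketch. This is a deliberate simplification, not the actual IMS algorithm: nominal parameter vectors are greedily grouped into axis-aligned clusters (per-parameter bounds), and monitoring reports a new vector's distance to the nearest cluster, with zero meaning the vector falls inside known-normal behavior. The `eps` merging tolerance and all data values are assumptions for the sketch.

```python
def build_knowledge_base(nominal, eps=0.5):
    """Sketch of an IMS-style knowledge base (a simplification, not the
    actual IMS clustering algorithm): greedily group nominal parameter
    vectors into clusters stored as per-parameter (low, high) bounds."""
    boxes = []
    for x in nominal:
        for i, (lo, hi) in enumerate(boxes):
            # grow an existing cluster if the vector lies within tolerance
            if all(l - eps <= v <= h + eps for v, l, h in zip(x, lo, hi)):
                boxes[i] = ([min(l, v) for l, v in zip(lo, x)],
                            [max(h, v) for h, v in zip(hi, x)])
                break
        else:
            boxes.append((list(x), list(x)))  # start a new cluster
    return boxes

def deviation(x, boxes):
    """Monitoring: distance from a new vector to the nearest nominal
    cluster; 0.0 means the vector matches known-normal behavior."""
    best = float("inf")
    for lo, hi in boxes:
        gap = sum(max(0.0, l - v, v - h) ** 2 for v, l, h in zip(x, lo, hi))
        best = min(best, gap ** 0.5)
    return best
```

In this framing, the knowledge base built offline from archived nominal data is what enables both real-time monitoring and after-the-fact analysis of archived events.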
This paper describes the initial results of applying two machine-learning-based unsupervised anomaly detection algorithms, Orca and GritBot, to data from two rocket propulsion testbeds. The first testbed uses historical data from the Space Shuttle Main Engine. The second testbed uses data from an experimental rocket engine test stand located at NASA Stennis Space Center. The paper describes four candidate anomalies detected by the two algorithms.
Gradient-based numerical optimization of complex engineering designs offers the promise of rapidly producing better designs. However, such methods generally assume that the objective function and constraint functions are continuous, smooth, and defined everywhere. Unfortunately, realistic simulators tend to violate these assumptions, making optimization unreliable. Several decisions that need to be made in setting up an optimization, such as the choice of a starting prototype and the choice of a formulation of the search space, can make a difference in the reliability of the optimization. Machine learning can improve gradient-based methods by making these choices based on the results of previous optimizations. This paper demonstrates this idea by using machine learning for four parts of the optimization setup problem: selecting a starting prototype from a database of prototypes, synthesizing a new starting prototype, predicting which design goals are achievable, and selecting a formulation of the search space. We use standard tree-induction algorithms (C4.5 and CART). We present results in two realistic engineering domains: racing yachts and supersonic aircraft. Our experimental results show that using inductive learning to make setup decisions improves both the speed and the reliability of design optimization.
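The "learn setup decisions from past optimizations" idea can be sketched with a toy tree inducer. The following is a minimal CART-style decision stump, not C4.5 or CART themselves, and the feature names, values, and prototype labels are all hypothetical: past setups are described by design-goal features, labeled with the starting prototype that led to a successful optimization, and the induced split then recommends a prototype for new goals.

```python
def induce_stump(X, y):
    """Minimal CART-style decision stump (a sketch, not C4.5/CART):
    pick the single feature/threshold split that best separates labels,
    measured by misclassification count under majority-label prediction."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            err = sum(lab != max(set(side), key=side.count)
                      for side in (left, right) for lab in side)
            if best is None or err < best[0]:
                best = (err, f, t,
                        max(set(left), key=left.count),
                        max(set(right), key=right.count))
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

# Toy setup history (hypothetical values): features are design goals
# [target_speed, target_weight]; the label is the starting prototype
# that led to a successful optimization in a past run.
X = [[30, 5000], [32, 5200], [28, 4800],   # prototype "A" succeeded
     [45, 3000], [47, 3100], [44, 2900]]   # prototype "B" succeeded
y = ["A", "A", "A", "B", "B", "B"]
recommend = induce_stump(X, y)
```

Given new design goals, `recommend([31, 5100])` falls among the "A" cases, so the induced split recommends prototype "A" as the starting point rather than leaving the choice to the engineer.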