Space-based missions such as Kepler, and soon TESS, provide large datasets that must be analyzed efficiently and systematically. Recent work by Shallue & Vanderburg (2018) successfully used state-of-the-art deep learning models to automatically classify Kepler transit signals as either exoplanets or false positives; our application of their model yielded 95.8% accuracy and 95.5% average precision. Here we expand upon that work by incorporating additional scientific domain knowledge into the network architecture and input representations to significantly increase overall model performance to 97.5% accuracy and 98.0% average precision. Notably, we achieve 15-20% gains in recall for the lowest signal-to-noise transits that can correspond to rocky planets in the habitable zone. We input into the network centroid time-series information derived from Kepler data plus key stellar parameters taken from the Kepler DR25 and Gaia DR2 catalogues. We also implement data augmentation techniques to alleviate model over-fitting. These improvements allow us to drastically reduce the size of the model while still maintaining improved performance; smaller models generalize better, for example from Kepler to TESS data. This work illustrates the importance of including expert domain knowledge in even state-of-the-art deep learning models when applying them to scientific research problems that seek to identify weak signals in noisy data. This classification tool will be especially useful for upcoming space-based photometry missions focused on finding small planets, such as TESS and PLATO.
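The data-augmentation idea mentioned in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: the function name, the 0.5 noise scale, and the choice of transforms (time reversal of a roughly symmetric phase-folded transit, plus injected Gaussian noise) are all assumptions made for the sake of the example.

```python
import numpy as np

def augment_light_curve(flux, rng, noise_scale=0.5):
    """Label-preserving augmentations for a 1-D phase-folded light curve.

    A transit dip is approximately symmetric about mid-transit, so
    reversing the time axis keeps the label; adding Gaussian noise
    scaled to the curve's scatter discourages over-fitting to one
    noise realization.
    """
    out = flux.copy()
    if rng.random() < 0.5:
        out = out[::-1]  # time reversal (label-preserving for symmetric dips)
    sigma = np.std(out) * noise_scale
    return out + rng.normal(0.0, sigma, size=out.shape)

rng = np.random.default_rng(0)
# toy transit: a small Gaussian dip in normalized flux
flux = 1.0 - 0.01 * np.exp(-np.linspace(-3, 3, 201) ** 2)
aug = augment_light_curve(flux, rng)
```

Because both transforms preserve the class label, the augmented curves can simply be appended to the training set.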
The discovery of patterns that accurately discriminate one class label from another remains a challenging data mining task. Subgroup discovery (SD) is one of the frameworks that makes it possible to elicit such interesting patterns from labeled data. A question remains fairly open: how to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible? Existing approaches use beam search, sampling, and genetic algorithms to discover a pattern set that is non-redundant and of high quality w.r.t. a pattern quality measure. We argue that such approaches produce pattern sets that lack diversity: only a few high-quality, sufficiently distinct patterns are discovered. Our main contribution is to formally define pattern mining as a game and to solve it with Monte Carlo tree search (MCTS). It can be seen as an exhaustive search guided by random simulations which can be stopped early (limited budget) by virtue of its best-first search property. We show through a comprehensive set of experiments how MCTS enables the anytime discovery of a diverse pattern set of high quality. It outperforms other approaches when dealing with a large pattern search space and for different quality measures. Thanks to its genericity, our MCTS approach can be used for SD but also for many other pattern mining tasks.
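The search loop the abstract describes (best-first selection, random simulation, anytime stopping) can be sketched as a standard four-phase MCTS over a pattern lattice. Everything below is illustrative: the item universe, the roll-out probability, and the `quality` function are toy stand-ins for a real subgroup-discovery measure such as WRAcc, not the paper's implementation.

```python
import math
import random

# Patterns are frozensets of items; a child node adds one item.
ITEMS = list(range(6))

def quality(pattern):
    # deterministic toy quality measure, for illustration only
    return sum(math.sin(i + 1) for i in pattern) / (1 + len(pattern))

class Node:
    def __init__(self, pattern):
        self.pattern = pattern   # frozenset of items
        self.children = {}       # item added -> child Node
        self.visits = 0
        self.total = 0.0

def ucb(child, parent_visits, c=1.4):
    # upper confidence bound: drives the best-first selection
    if child.visits == 0:
        return float("inf")
    return (child.total / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def mcts(budget=500, seed=0):
    rng = random.Random(seed)
    root = Node(frozenset())
    best_score, best_pattern = float("-inf"), frozenset()
    for _ in range(budget):
        # 1. selection: descend by UCB while the node is fully expanded
        node, path = root, [root]
        while True:
            untried = [i for i in ITEMS
                       if i not in node.pattern and i not in node.children]
            if untried or not node.children:
                break
            node = max(node.children.values(),
                       key=lambda ch: ucb(ch, node.visits))
            path.append(node)
        # 2. expansion: grow the pattern by one new item
        if untried:
            item = rng.choice(untried)
            node.children[item] = Node(node.pattern | {item})
            node = node.children[item]
            path.append(node)
        # 3. simulation: random roll-out that extends the pattern
        pat = set(node.pattern)
        for i in rng.sample(ITEMS, len(ITEMS)):
            if i not in pat and rng.random() < 0.3:
                pat.add(i)
        reward = quality(pat)
        # anytime property: the best pattern so far is always available
        if reward > best_score:
            best_score, best_pattern = reward, frozenset(pat)
        # 4. back-propagation of the reward along the path
        for n in path:
            n.visits += 1
            n.total += reward
    return best_score, best_pattern

score, pattern = mcts()
```

Stopping the loop at any budget yields a valid (if less refined) result, which is the anytime behavior the abstract emphasizes; a diverse result set would additionally keep the top-k distinct patterns rather than a single best one.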
Aims. Accurately and rapidly classifying exoplanet candidates from transit surveys is a goal of growing importance as the data rates from space-based survey missions increase. This is especially true for NASA's TESS mission, which generates thousands of new candidates each month. Here we created the first deep learning model capable of classifying TESS planet candidates. Methods. We adapted the neural network model of Ansdell et al. (2018) to TESS data. We then trained and tested this updated model on 4 sectors of high-fidelity, pixel-level simulation data created using the Lilith simulator and processed using the full TESS SPOC pipeline. With the caveat that direct transfer of the model to real data will not perform as accurately, we also applied this model to four sectors of TESS candidates. Results. We find our model performs very well on our simulated data, with 97% average precision and 92% accuracy on planets in the 2-class model. This accuracy is also boosted by another ∼ 4% if planets found at the wrong periods are included. We also performed 3- and 4-class classification of planets, blended & target eclipsing binaries, and non-astrophysical false positives, which have slightly lower average precision and planet accuracies, but are useful for follow-up decisions. When applied to real TESS data, 61% of TCEs coincident with currently published TOIs are recovered as planets, 4% more are suggested to be EBs, and we propose a further 200 TCEs as planet candidates.
Today there is a strong interest in publishing set-valued data in a privacy-preserving manner. Such data associate individuals to sets of values (e.g., preferences, shopping items, symptoms, query logs). In addition, an individual can be associated with a sensitive label (e.g., marital status, religious or political conviction). Anonymizing such data implies ensuring that an adversary should not be able to (1) identify an individual's record, and (2) infer a sensitive label, if such exists. Existing research on this problem either perturbs the data, publishes them in disjoint groups disassociated from their sensitive labels, or generalizes their values by assuming the availability of a generalization hierarchy. In this paper, we propose a novel alternative. Our publication method also puts data in a generalized form, but does not require that published records form disjoint groups and does not assume a hierarchy either; instead, it employs generalized bitmaps and recasts data values in a nonreciprocal manner; formally, the bipartite graph from original to anonymized records does not have to be composed of disjoint complete subgraphs. We configure our schemes to provide popular privacy guarantees while resisting attacks proposed in recent research, and demonstrate experimentally that we gain a clear utility advantage over the previous state of the art.
Computing the similarity between sequences is a very important challenge for many different data mining tasks. There is a plethora of similarity measures for sequences in the literature, most of them designed for sequences of items. In this work, we study the problem of measuring the similarity between sequences of itemsets. We focus on the notion of common subsequences as a way to measure similarity between a pair of sequences composed of a list of itemsets. We present new combinatorial results for efficiently counting distinct and common subsequences. These theoretical results are the cornerstone of an effective dynamic programming approach to deal with this problem. In addition, we propose an approximate method to speed up the computation process for long sequences. We have applied our method to various data sets: healthcare trajectories, online handwritten characters, and synthetic data. Our results confirm that our measure of similarity produces competitive scores and indicate that our method is relevant for large-scale sequential data analysis.
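The quantity being counted above can be made concrete with a brute-force reference implementation. A subsequence of a sequence of itemsets picks increasing positions and a nonempty subset of each chosen itemset; the common count is the number of distinct subsequences shared by two sequences. The paper's dynamic program computes this efficiently; the exponential enumeration below, with hypothetical function names, only illustrates the definition on toy data.

```python
from itertools import combinations

def nonempty_subsets(itemset):
    # all nonempty subsets of an itemset, as frozensets
    items = sorted(itemset)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def distinct_subsequences(seq):
    subs = {()}  # start from the empty subsequence
    for itemset in seq:
        # any existing subsequence can be extended by choosing the
        # current position and a nonempty subset of its itemset
        extensions = {sub + (choice,)
                      for sub in subs
                      for choice in nonempty_subsets(itemset)}
        subs |= extensions
    return subs

def common_subsequence_count(a, b):
    # number of distinct subsequences common to both sequences
    return len(distinct_subsequences(a) & distinct_subsequences(b))

# toy sequences of itemsets, encoded as lists of frozensets
a = [frozenset({1}), frozenset({1, 2}), frozenset({3})]
b = [frozenset({1, 2}), frozenset({3})]
n = common_subsequence_count(a, b)
```

Set intersection on explicitly enumerated subsequences is exponential in the sequence length, which is exactly why a polynomial dynamic program over prefix pairs is the practical approach.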