Motivation: Graphical models are often employed to interpret patterns of correlations observed in data through a network of interactions between the variables. Recently, Ising/Potts models, also known as Markov random fields, have been productively applied to diverse problems in biology, including the prediction of structural contacts from protein sequence data and the description of neural activity patterns. However, inference of such models is a challenging computational problem that cannot be solved exactly. Here we describe the adaptive cluster expansion (ACE) method to quickly and accurately infer Ising or Potts models based on correlation data. ACE avoids overfitting by constructing a sparse network of interactions sufficient to reproduce the observed correlations within the statistical error expected from finite sampling. When convergence of the ACE algorithm is slow, we combine it with a Boltzmann machine learning (BML) algorithm. We illustrate this method on a variety of biological and artificial data sets and compare it to state-of-the-art approximate methods such as Gaussian and pseudo-likelihood inference. Results: We show that ACE accurately reproduces the true parameters of the underlying model when they are known, and yields accurate statistical descriptions of both biological and artificial data. Models inferred by ACE have substantially better statistical performance than those obtained from faster Gaussian and pseudo-likelihood methods, which precisely recover only the structure of the interaction network. Availability: The ACE source code, user manual, and tutorials with example data are freely available on GitHub at https://github.com/johnbarton/ACE.
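To make the inverse problem concrete, the sketch below (not the ACE algorithm itself; the function name and conventions are our own) exactly enumerates the states of a small Ising model in the {0,1} convention and computes the one- and two-point statistics that any inferred model must reproduce:

```python
import itertools
import math

def ising_stats(h, J):
    """Exact one- and two-point statistics of a small Ising model
    P(s) ~ exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j), with s_i in {0, 1}.

    Feasible only for small n: the sum runs over all 2^n states.
    """
    n = len(h)
    Z = 0.0                                  # partition function
    p1 = [0.0] * n                           # <s_i>
    p2 = [[0.0] * n for _ in range(n)]       # <s_i s_j>, i < j
    for s in itertools.product([0, 1], repeat=n):
        E = sum(h[i] * s[i] for i in range(n))
        E += sum(J[i][j] * s[i] * s[j]
                 for i in range(n) for j in range(i + 1, n))
        w = math.exp(E)
        Z += w
        for i in range(n):
            if s[i]:
                p1[i] += w
                for j in range(i + 1, n):
                    if s[j]:
                        p2[i][j] += w
    p1 = [x / Z for x in p1]
    p2 = [[x / Z for x in row] for row in p2]
    return p1, p2
```

The exponential cost of this exact enumeration is precisely what makes the inference problem hard and motivates approximations such as ACE's cluster expansion.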
This paper presents the machine learning architecture of the Snips Voice Platform, a software solution for performing Spoken Language Understanding on microprocessors typical of IoT devices. The embedded inference is fast and accurate while enforcing privacy by design, as no personal user data is ever collected. Focusing on Automatic Speech Recognition and Natural Language Understanding, we detail our approach to training high-performance machine learning models that are small enough to run in real time on small devices. Additionally, we describe a data generation procedure that provides sufficient, high-quality training data without compromising user privacy.
Footnotes:
1. https://www.voicebot.ai/2018/03/07/new-voicebot-report-says-nearly-20-u-sadults-smart-speakers/
2. (In French) https://www.cnil.fr/fr/enceintes-intelligentes-des-assistants-vocauxconnectes-votre-vie-privee
3. https://www.eugdpr.org/
The ACE source code, user manual, and tutorials with the example data and filtered correlations described herein are freely available on GitHub at https://github.com/johnbarton/ACE. Contacts: jpbarton@mit.edu, cocco@lps.ens.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
We propose a practical approach based on federated learning to solve out-of-domain issues with continuously running embedded speech-based models such as wake word detectors. We conduct an extensive empirical study of the federated averaging algorithm for the "Hey Snips" wake word based on a crowdsourced dataset that mimics a federation of wake word users. We empirically demonstrate that using an adaptive averaging strategy inspired by Adam in place of standard weighted model averaging substantially reduces the number of communication rounds required to reach our target performance. The associated upstream communication cost per user is estimated at 8 MB, which is reasonable in the context of smart home voice assistants. Additionally, the dataset used for these experiments is being open sourced with the aim of fostering further transparent research in the application of federated learning to speech data.
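A minimal sketch of the adaptive-averaging idea described above, under the assumption that the server treats the negative average client update as a pseudo-gradient and applies an Adam-style step to it; the function names and hyperparameter values are illustrative, not the paper's implementation:

```python
import numpy as np

def server_adam_update(w, pseudo_grad, m, v, t,
                       lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style server step on a pseudo-gradient (t starts at 1)."""
    m = b1 * m + (1 - b1) * pseudo_grad
    v = b2 * v + (1 - b2) * pseudo_grad ** 2
    m_hat = m / (1 - b1 ** t)        # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def federated_round(w, client_updates, m, v, t):
    """One communication round: client_updates[k] is the change in
    client k's weights after local training, averaged with equal weight
    here. The negative average update plays the role of a gradient."""
    avg_update = np.mean(client_updates, axis=0)
    return server_adam_update(w, -avg_update, m, v, t)
```

Compared with plain federated averaging (which would simply set `w += avg_update`), the adaptive step rescales each coordinate by its running update statistics, which is the mechanism the abstract credits for reducing the number of communication rounds.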
We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting, as opposed to recurrent networks that model long-term temporal dependencies using internal states. We propose a model inspired by the recent success of dilated convolutions in sequence modeling applications, allowing deeper architectures to be trained in resource-constrained configurations. Gated activations and residual connections are also added, following a configuration similar to WaveNet. In addition, we apply a custom target labeling that back-propagates the loss from specific frames of interest, yielding higher accuracy and requiring only detection of the end of the keyword. Our experimental results show that our model outperforms a max-pooling loss trained recurrent neural network using LSTM cells, with a significant decrease in the false rejection rate. The underlying dataset, "Hey Snips" utterances recorded by over 2.2K different speakers, has been made publicly available to establish an open reference for wake-word detection.
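The building blocks named above can be sketched as follows. This is an illustrative numpy reference implementation of a dilated causal 1-D convolution and a WaveNet-style gated residual block, not the paper's model; shapes, names, and the absence of learned biases are our simplifying assumptions:

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """Causal 1-D convolution over a (T, C_in) sequence.

    w has shape (K, C_in, C_out). Output frame t only sees inputs at
    t, t - dilation, ..., t - (K-1)*dilation (zero-padded on the left),
    so no future frames leak into the prediction."""
    T, C_in = x.shape
    K, _, C_out = w.shape
    pad = (K - 1) * dilation
    xp = np.vstack([np.zeros((pad, C_in)), x])
    y = np.zeros((T, C_out))
    for t in range(T):
        for k in range(K):
            # tap k looks k*dilation frames into the past
            y[t] += xp[t + pad - k * dilation] @ w[K - 1 - k]
    return y

def gated_residual_block(x, w_f, w_g, dilation):
    """tanh/sigmoid gated activation with a residual connection,
    as popularized by WaveNet (requires C_out == C_in)."""
    f = dilated_causal_conv(x, w_f, dilation)
    g = dilated_causal_conv(x, w_g, dilation)
    return x + np.tanh(f) * (1.0 / (1.0 + np.exp(-g)))
```

Stacking such blocks with dilations 1, 2, 4, ... grows the receptive field exponentially with depth while keeping the layer stateless, which is what makes the approach attractive against recurrent models on constrained hardware.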
We consider the problem of performing Spoken Language Understanding (SLU) on small devices typical of IoT applications. Our contribution is two-fold. First, we outline the design of an embedded, private-by-design SLU system and show that it has performance on-par with cloud-based commercial solutions. Second, we release the datasets used in our experiments in the interest of reproducibility and in the hope that they can prove useful to the community.
Coevolution of residues in contact imposes strong statistical constraints on the sequence variability between homologous proteins. Direct-Coupling Analysis (DCA), a global statistical inference method, successfully models this variability across homologous protein families to infer structural information about proteins. For each residue pair, DCA infers a 21 × 21 matrix describing the coevolutionary coupling for each pair of amino acids (or gaps). For residue-residue contact prediction, these matrices are mapped onto simple scalar parameters, so the full information they contain is lost. Here, we perform a detailed spectral analysis of the coupling matrices resulting from 70 protein families to show that they contain quantitative information about the physico-chemical properties of amino-acid interactions. Results for protein families are corroborated by the analysis of synthetic data from lattice-protein models, which emphasizes the critical effect of sampling quality and regularization on the biochemical features of the statistical coupling matrices.
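The spectral analysis mentioned above can be sketched generically: gauge-fix a q × q coupling matrix, symmetrize it, and extract its dominant eigenmodes, whose eigenvectors rank amino acids by their contribution to the coevolutionary signal. This is a hedged illustration assuming the common zero-sum gauge, not the paper's exact pipeline:

```python
import numpy as np

def top_coupling_modes(J_ij, k=2):
    """Return the k largest-magnitude eigenvalues and eigenvectors of a
    q x q coupling matrix after zero-sum gauge fixing and symmetrization."""
    # zero-sum gauge: subtract row, column, and global means so that
    # rows and columns of the gauged matrix sum to zero
    Jg = (J_ij
          - J_ij.mean(axis=0, keepdims=True)
          - J_ij.mean(axis=1, keepdims=True)
          + J_ij.mean())
    Js = 0.5 * (Jg + Jg.T)              # symmetric part
    vals, vecs = np.linalg.eigh(Js)
    order = np.argsort(-np.abs(vals))[:k]  # rank modes by |eigenvalue|
    return vals[order], vecs[:, order]
```

A coupling matrix dominated by one or two such modes is effectively low-rank, which is the kind of structure a spectral analysis can relate to physico-chemical interaction properties.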
No abstract