2022
DOI: 10.1007/s10994-022-06163-2
Extracting automata from recurrent neural networks using queries and counterexamples (extended version)

Cited by 16 publications (23 citation statements)
References 41 publications
“…To generate a state diagram from an RNN model, we develop a method that clusters semantically related hidden states of the RNN model into an abstract state. Our work is inspired by the model-based analysis of stateful RNNs [15,16,47,51,61]. These works apply various techniques to extract interpretable state transition…”
Section: Design and Implementation, 5.1 State Abstraction
confidence: 99%
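The state-abstraction step described in this excerpt can be sketched as follows. This is a minimal illustration, not the cited works' implementation: the "RNN" is a fixed random tanh recurrence standing in for a trained network, and the clustering is a plain k-means written out by hand; all function names and dimensions are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_hidden_states(sequences, hidden_dim=8):
    # Stand-in for a trained RNN over a binary alphabet: run a fixed
    # random tanh recurrence and collect every hidden state visited.
    W = rng.normal(size=(hidden_dim, hidden_dim)) * 0.5
    U = rng.normal(size=(2, hidden_dim)) * 0.5
    states = []
    for seq in sequences:
        h = np.zeros(hidden_dim)
        for sym in seq:
            h = np.tanh(h @ W + U[sym])
            states.append(h.copy())
    return np.array(states)

def kmeans(X, k, iters=50):
    # Minimal k-means: each cluster of semantically related hidden
    # states becomes one abstract state of the extracted diagram.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

sequences = [rng.integers(0, 2, size=10).tolist() for _ in range(20)]
H = rnn_hidden_states(sequences)           # (200, 8) hidden states
labels = kmeans(H, k=4)                    # abstract state per step
```

Transitions between consecutive cluster labels along each input sequence then give the edges of the abstract state diagram.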
“…On a more applied note, the MQ+EQ model has recently been used for recurrent and binarized neural networks (Weiss et al., 2018, 2019; Okudono et al., 2020; Shih et al., 2019), and interpretability (Camacho and McIlraith, 2019). It is also worth noting that the MQ learning model has been criticized by the applied machine learning community, as labels can be queried in the whole input space, irrespective of the distribution that generates the data.…”
Section: Learning With Queries
confidence: 99%
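The MQ+EQ (membership query + equivalence query) interaction mentioned here can be sketched with a toy teacher. In this illustration the network is replaced by a simple black-box target language ("even number of 1s"), and the equivalence query is approximated by exhaustive comparison on short words; both choices are assumptions for the example, not the cited papers' constructions.

```python
import itertools

def target(word):
    # Black box standing in for the RNN: accepts words with an
    # even number of 1s.
    return word.count("1") % 2 == 0

def membership_query(word):
    # MQ: the learner may label any word, anywhere in input space
    # (the source of the distribution-free criticism quoted above).
    return target(word)

def equivalence_query(hypothesis, alphabet="01", max_len=8):
    # Approximate EQ: compare hypothesis and target on all words up
    # to max_len and return a counterexample where they disagree.
    for n in range(max_len + 1):
        for tup in itertools.product(alphabet, repeat=n):
            w = "".join(tup)
            if hypothesis(w) != target(w):
                return w
    return None  # hypotheses agree on all words up to max_len

# A wrong hypothesis ("accept everything") is refuted by a counterexample:
cex = equivalence_query(lambda w: True)  # returns "1"
```

The returned counterexample is what drives refinement in L*-style extraction: `membership_query(cex)` is `False`, so the learner updates its hypothesis automaton and queries again.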
“…Another possibility is a hypothesis class again consisting of a large key-value memory, but now coupled with a small RASP program [WGY21] or other textual program. Yet another possibility is a target class consisting of a large logical circuit where a significant fraction of the nodes correspond to "human-understandable" concepts (e.g., words from a dictionary, or concepts encoded by a more trusted LLM).…”
Section: Extensions and Future Directions
confidence: 99%