Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.
Exploring Connections Between Active Learning and Model Extraction
Nov. 20, 2019 -
Paper Link -
Tags: Model-Extraction
Summary
Formalized model extraction and possible defense strategies. Drew parallels between model extraction and the established area of active learning. Looked into Machine Learning-as-a-Service (MLaaS) and the associated query costs of reverse engineering a model.
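To make the active-learning parallel concrete, here is a toy sketch of my own (not code from the paper): extracting a 1-D threshold classifier from label-only query access, the way an attacker would query an MLaaS endpoint. Binary search, the classic active-learning strategy for threshold functions, recovers the threshold to precision eps in roughly log2(1/eps) queries, exponentially fewer than passive random sampling.

```python
# Toy illustration (my own, hypothetical): model extraction as active learning.
# The "black box" is a 1-D threshold classifier f(x) = 1 if x >= t else 0,
# exposed only through label queries, like an MLaaS prediction API.

def make_black_box(t):
    """Simulates an MLaaS endpoint: label-only query access, hidden threshold t."""
    return lambda x: 1 if x >= t else 0

def extract_threshold(query, lo=0.0, hi=1.0, eps=1e-6):
    """Binary-search extraction: maintains an interval known to contain t."""
    queries = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if query(mid):        # mid labeled positive, so the threshold is <= mid
            hi = mid
        else:                 # mid labeled negative, so the threshold is > mid
            lo = mid
        queries += 1
    return (lo + hi) / 2, queries

oracle = make_black_box(0.3517)
t_hat, n = extract_threshold(oracle)
print(t_hat, n)  # t_hat within 1e-6 of 0.3517, using about 20 queries
```

The query count is the "cost of reverse engineering" in miniature: the attacker's budget is the number of API calls, and a smarter query strategy shrinks it dramatically.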
Notes
Section 2.1 defines passive learning; Section 2.2 defines active learning.
In PAC (probably approximately correct) learning, the learner has access to labeled samples drawn from the data distribution and uses them to generate a hypothesis \(\hat{f}\) such that the expected loss over that distribution is low.
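Writing out the standard PAC guarantee (my paraphrase of the usual definition, not a formula from these notes): with probability at least \(1 - \delta\) over the draw of the training samples, the learned \(\hat{f}\) has expected loss at most \(\epsilon\),

```latex
\Pr\Big[\, \mathbb{E}_{x \sim \mathcal{D}}\big[\ell(\hat{f}(x),\, f(x))\big] \le \epsilon \,\Big] \ge 1 - \delta
```

where \(\mathcal{D}\) is the data distribution, \(f\) the target function, and \(\ell\) the loss.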
The active learning version of PAC selects items from the dataset.
Stream-Based Sampling: samples arrive sequentially; once you pass on a sample, you cannot go back to it.
Pool-Based Sampling: select samples from a pool of unlabeled candidates (generally more effective, since the learner can compare all candidates before querying).
Query Synthesis Active Learning: like pool-based sampling, but the learner can synthesize queries independent of the given dataset distribution (most realistic setting for model extraction).
Active learning needs to be able to evaluate the usefulness of an unlabeled instance.
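A standard way to score usefulness is uncertainty sampling: query the pool item the current model is least sure about. This is a generic active-learning heuristic, not code from the paper; the toy model and pool values below are my own.

```python
# Hedged sketch of pool-based uncertainty sampling (standard heuristic,
# not the paper's code): score each unlabeled pool item by the entropy of
# the model's predicted label distribution, query the highest-entropy item.
import math

def entropy(p):
    """Binary entropy of predicted positive-class probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def most_informative(pool, predict_proba):
    """Return the pool item the current model is most uncertain about."""
    return max(pool, key=lambda x: entropy(predict_proba(x)))

# Toy 1-D model: predicted probability rises with x; uncertainty peaks
# near the decision boundary at x = 0.5.
predict = lambda x: 1 / (1 + math.exp(-(x - 0.5) * 10))
pool = [0.1, 0.45, 0.9]
print(most_informative(pool, predict))  # 0.45, the point nearest the boundary
```

Points far from the boundary get confident predictions (entropy near 0) and are skipped, which is exactly why pool-based strategies need fewer labels than random sampling.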
Interesting References
Modifies input features to cause misclassification by DNNs (adversarial examples) - LINK
Looks at mislabeling training data, changing rewards in the case of reinforcement learning, and modifying the sampling mechanism (data poisoning paper) - LINK
Citation: Chandrasekaran, Varun, et al. "Exploring connections between active learning and model extraction." 29th USENIX Security Symposium (USENIX Security 20). 2020.