Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.
Fawkes: Protecting Privacy against Unauthorized Deep Learning Models
July 23, 2020 -
Paper Link -
Tags: Adversarial, Data-Poisoning, Perturbation
Summary
Created a novel targeted clean data poisoning attack using perturbation ("cloaking") to move User samples towards a Target class. Their system consists of a User, who does not want to be recognized by facial recognition software, and a Target, who helps the User avoid recognition. In feature space, the Target should be far away from the User. Their framework then adds a perturbation to each User photo in the training dataset (a different perturbation for each photo) to move the decision boundary for the User closer to the Target. When live, un-perturbed User samples are later fed into the facial recognition system, the User is classified as someone else, since their real feature vectors are far from the cloaked, Target-like features the model learned for their label.
Notes
- Constraints:
- Can alter the images of the User
- Has access to the facial recognition model or another model capable of facial recognition.
- Has access to the full dataset (to measure feature-vector distance from the User). The Target does not need to be in the dataset; the Target is just used as a goal to move the User's decision boundary towards
- Does not want to decrease the overall accuracy of the model
- Fawkes
- Choose a Target class T: compute the centroid of every class in feature space and pick the one furthest from the User's centroid (Equation 3; see the sketch after this list).
- Use the following formula to calculate per-image cloaks: \( \min_{\delta} \, \mathrm{Dist}(\Phi(x_T), \Phi(x \oplus \delta(x, x_T))) + \lambda \cdot \max(|\delta(x, x_T)| - \rho, 0) \)
- \( \delta \) is the perturbation filter. It is given the User image and the Target image
- \( \Phi \) is the facial recognition feature generator
- \( \lambda \) is how important the amount of perturbation is
- \( \rho \) is the maximum acceptable amount of perturbation. If \( \rho \) were \( \infty \), the optimal cloaked image would simply be identical to the target image.
- Each cloak is generated by selecting a random image in the Target set.
- Tanh is used to keep the pixels within [0, 255], similar to the clipping done by most papers.
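
A minimal sketch of the two steps above (target-class selection and per-image cloak optimization), assuming images are PyTorch tensors scaled to [0, 1] and `feature_extractor` plays the role of \( \Phi \). The perturbation budget below uses a simple mean-absolute-difference stand-in rather than the paper's image-similarity metric, and all hyperparameter values are placeholders; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def choose_target_class(user_images, candidate_sets, feature_extractor):
    """Pick the candidate class whose feature centroid is furthest from the User's centroid."""
    with torch.no_grad():
        user_centroid = feature_extractor(user_images).mean(dim=0)
        dists = [torch.norm(feature_extractor(imgs).mean(dim=0) - user_centroid)
                 for imgs in candidate_sets]
    return int(torch.stack(dists).argmax())

def cloak_image(x, x_target, feature_extractor, rho=0.05, lam=100.0, steps=500, lr=0.01):
    """Optimize a per-image cloak: minimize the feature distance to the target image,
    plus a penalty whenever the perturbation exceeds the budget rho."""
    with torch.no_grad():
        target_feat = feature_extractor(x_target.unsqueeze(0))
    # tanh change of variables keeps the cloaked image inside the valid pixel range
    w = torch.zeros_like(x, requires_grad=True)
    x_atanh = torch.atanh((2 * x - 1).clamp(-0.999, 0.999))
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_cloaked = 0.5 * (torch.tanh(x_atanh + w) + 1)                # stays in [0, 1]
        delta = x_cloaked - x
        feat_dist = F.mse_loss(feature_extractor(x_cloaked.unsqueeze(0)), target_feat)
        budget_penalty = torch.clamp(delta.abs().mean() - rho, min=0)  # stand-in for the paper's metric
        loss = feat_dist + lam * budget_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_cloaked.detach()
```

In the full pipeline, `cloak_image` would be run once per User photo, each time pairing it with a randomly chosen image from the Target class.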
- Key Findings
- "Cloaking is highly effective when users share a feature extractor with the tracker", i.e. white-box attack
- Generating cloaks with a robust feature extractor limits the loss of protection when the attack is not white-box. A robust feature extractor is one trained on adversarial images; the authors used the PGD attack to generate the adversarial samples (see the PGD sketch after this list).
- Figure 6, when the number of labels is greater than 5, the results are promising.
- Figure 7, using the same model as the tracker is more effective than using a different feature extractor (facial recognition model)
- Figure 8, robust models make the attack easier
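
For reference, a rough sketch of the PGD attack mentioned above, which is used to generate the adversarial examples for training a robust feature extractor. `model`, `loss_fn`, and the epsilon/step values are placeholders, not details from the paper.

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD: repeatedly step in the signed-gradient direction of the loss,
    then project back into the epsilon-ball around the original image."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                   # ascend the loss
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)      # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                             # keep valid pixel values
    return x_adv.detach()
```

In adversarial training, examples produced this way replace or augment the clean training batch.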
- Sybil Accounts
- A Sybil account is a separate account the User creates under another identity, filled with images perturbed to be similar in feature space to the User (see the sketch after this list)
- It serves as an additional label close to the real User in feature space, so the User's real (uncloaked) photos can be classified as the Sybil instead
- Figure 10, protection success rate plummets when a small percentage (15% or so) is uncloaked
- Figure 11, a single sybil account keeps protection success rate high, with two sybil accounts having additional success
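
As I read it, Sybil images could be built with the same machinery as the cloaks, just aimed at the User's original features instead of a distant Target. This is my interpretation, reusing the hypothetical `cloak_image()` from the earlier sketch, not the authors' exact procedure.

```python
def make_sybil_images(candidate_images, user_original_images, feature_extractor):
    """Perturb images of another (fake) identity toward the User's original,
    uncloaked photos in feature space, so leaked real photos of the User land
    near the Sybil label instead of the User's label."""
    sybil_images = []
    for i, x in enumerate(candidate_images):
        # treat one of the User's real photos as the "target" for this candidate image
        x_user = user_original_images[i % len(user_original_images)]
        sybil_images.append(cloak_image(x, x_user, feature_extractor))
    return sybil_images
```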
- Attempted Countermeasures
- Blurring: decreased overall accuracy but not protection success rate
- Gaussian Noise: same as blurring
- JPEG Compression: both accuracy and protection success rate decrease (a sketch of these transforms follows this list)
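
A quick sketch of the three image transformations, in case we want to reproduce these countermeasure experiments. The blur radius, noise sigma, and JPEG quality are illustrative values, not taken from the paper.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def blur(img: Image.Image, radius: float = 2.0) -> Image.Image:
    """Gaussian blur of the input image."""
    return img.filter(ImageFilter.GaussianBlur(radius))

def add_gaussian_noise(img: Image.Image, sigma: float = 10.0) -> Image.Image:
    """Add zero-mean Gaussian noise in pixel space, then clip back to [0, 255]."""
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

def jpeg_compress(img: Image.Image, quality: int = 30) -> Image.Image:
    """Round-trip through lossy JPEG compression at the given quality."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```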
Interesting References
Analysis
- This is very similar to a standard perturbation evasion attack, but is marketed as a data poisoning attack instead.
- The attack is VERY dependent on poisoning EVERY User sample in the training dataset. Even when fewer than 20% of the User's samples are un-poisoned, the protection success rate drops below 50%. In the authors' words: "a tracker who obtains a large number of uncloaked images of the user can compromise the effectiveness of Fawkes"
- The Sybil case helps significantly when some percentage of the dataset is un-poisoned, as shown in Figure 11. However, the Sybil attack requires adding an additional label to the dataset, one that is close to the User in feature space. They did not mention this, but I believe the User will likely be labeled as the Sybil account. I just don't like it.
- Works better when there are a lot of labels (65 to 10,575 classes), but when there are few labels, the decision boundary is harder to move
Ideas
- We did the same thing: "Normal classes should have only one feature cluster. To do this, the tracker could run a 2-means clustering on each class's feature space, flagging classes with two distinct centroids as potentially cloaked". "...To reduce the probability of detection by this method, the user can choose a target class that does not create such a large feature space separation." (A minimal detection sketch follows this list.)
- Possible defense strategy for us: Increase the number of labels and samples. This way, it is more likely that benign labels have multiple centroids
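
A minimal sketch of the 2-means detection quoted above, assuming the tracker already has per-class feature vectors. The distance threshold and the input format are my assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def flag_cloaked_classes(features_by_class, threshold):
    """features_by_class: dict mapping label -> (n_samples, feat_dim) feature array.
    Flags classes whose two k-means centroids are unusually far apart."""
    flagged = []
    for label, feats in features_by_class.items():
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(feats)
        c0, c1 = km.cluster_centers_
        if np.linalg.norm(c0 - c1) > threshold:  # two distinct clusters => potentially cloaked
            flagged.append(label)
    return flagged
```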
Citation: Shan, Shawn, et al. "Fawkes: Protecting Privacy against Unauthorized Deep Learning Models." 29th USENIX Security Symposium (USENIX Security 20), 2020.