Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.
Fawkes: Protecting Privacy against Unauthorized Deep Learning Models
July 23, 2020 -
Paper Link -
Tags: Adversarial, Data-Poisoning, Perturbation
Summary
Created a novel targeted clean data poisoning attack using perturbation ("cloaking") to move User samples towards a Target class. Their system consists of a User, who does not want to be recognized by facial recognition software, and a Target, who helps the User avoid recognition. In feature space, the Target should be far away from the User. Their framework then adds a perturbation to each User photo in the training dataset (a different perturbation for each photo) to move the decision boundary for the User closer to the Target. When live, un-perturbed User samples are later fed into the facial recognition system, the User is classified as someone else, since their real feature vectors are far from the cloaked, Target-like features the model learned for their label.
Notes
- Constraints:
- Can alter the images of the User
- Has access to the facial recognition model or another model capable of facial recognition.
- Has access to the full dataset (to measure feature-vector distance from the User). The Target does not need to be in the dataset; the Target is just used as a goal to move the User's decision boundary towards
- Does not want to decrease the overall accuracy of the model
- Fawkes
- Choose a Target class T: compute the centroid of every class in feature space and pick the one furthest from the User's centroid (Equation 3; see the sketch after this list).
- Use the following formula to calculate per-image cloaks: \( \min_{\delta} \, \mathrm{Dist}(\Phi(x_T), \Phi(x \oplus \delta(x, x_T))) + \lambda \cdot \max(|\delta(x, x_T)| - \rho, 0) \)
- \( \delta \) is the perturbation filter. It is given the User image and the Target image
- \( \Phi \) is the facial recognition feature generator
- \( \lambda \) is how important the amount of perturbation is
- \( \rho \) is the maximum acceptable amount of perturbation. If \( \rho \) were \( \infty \), the optimal cloaked image would simply be identical to the target image.
- Each cloak is generated by selecting a random image in the Target set.
- Tanh is used to keep the pixels within [0, 255], similar to the clipping done by most papers.
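
A minimal sketch of the two steps above (target-class selection and per-image cloak optimization), assuming images are PyTorch tensors scaled to [0, 1] and `feature_extractor` plays the role of \( \Phi \). The perturbation budget below uses a simple mean-absolute-difference stand-in rather than the paper's image-similarity metric, and all hyperparameter values are placeholders; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def choose_target_class(user_images, candidate_sets, feature_extractor):
    """Pick the candidate class whose feature centroid is furthest from the User's centroid."""
    with torch.no_grad():
        user_centroid = feature_extractor(user_images).mean(dim=0)
        dists = [torch.norm(feature_extractor(imgs).mean(dim=0) - user_centroid)
                 for imgs in candidate_sets]
    return int(torch.stack(dists).argmax())

def cloak_image(x, x_target, feature_extractor, rho=0.05, lam=100.0, steps=500, lr=0.01):
    """Optimize a per-image cloak: minimize the feature distance to the target image,
    plus a penalty whenever the perturbation exceeds the budget rho."""
    with torch.no_grad():
        target_feat = feature_extractor(x_target.unsqueeze(0))
    # tanh change of variables keeps the cloaked image inside the valid pixel range
    w = torch.zeros_like(x, requires_grad=True)
    x_atanh = torch.atanh((2 * x - 1).clamp(-0.999, 0.999))
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_cloaked = 0.5 * (torch.tanh(x_atanh + w) + 1)                # stays in [0, 1]
        delta = x_cloaked - x
        feat_dist = F.mse_loss(feature_extractor(x_cloaked.unsqueeze(0)), target_feat)
        budget_penalty = torch.clamp(delta.abs().mean() - rho, min=0)  # stand-in for the paper's metric
        loss = feat_dist + lam * budget_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_cloaked.detach()
```

In the full pipeline, `cloak_image` would be run once per User photo, each time pairing it with a randomly chosen image from the Target class.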
- Key Findings
- "Cloaking is highly effective when users share a feature extractor with the tracker", i.e. white-box attack
- Generating cloaks with a robust feature extractor limits the loss of protection when the attack is not white-box. A robust feature extractor is one trained on adversarial images; the authors used the PGD attack to generate the adversarial samples (see the PGD sketch after this list).
- Figure 6, when the number of labels is greater than 5, the results are promising.
- Figure 7, using the same model as the tracker is more effective than using a different feature extractor (facial recognition model)
- Figure 8, robust models make the attack easier
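
For reference, a rough sketch of the PGD attack mentioned above, which is used to generate the adversarial examples for training a robust feature extractor. `model`, `loss_fn`, and the epsilon/step values are placeholders, not details from the paper.

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD: repeatedly step in the signed-gradient direction of the loss,
    then project back into the epsilon-ball around the original image."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                   # ascend the loss
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)      # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                             # keep valid pixel values
    return x_adv.detach()
```

In adversarial training, examples produced this way replace or augment the clean training batch.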
- Sybil Accounts
- A Sybil account is a separate account the User creates under another identity, filled with images perturbed to be similar in feature space to the User (see the sketch after this list)
- It serves as an additional label close to the real User in feature space, so the User's real (uncloaked) photos can be classified as the Sybil instead
- Figure 10, protection success rate plummets when a small percentage (15% or so) is uncloaked
- Figure 11, a single sybil account keeps protection success rate high, with two sybil accounts having additional success
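
As I read it, Sybil images could be built with the same machinery as the cloaks, just aimed at the User's original features instead of a distant Target. This is my interpretation, reusing the hypothetical `cloak_image()` from the earlier sketch, not the authors' exact procedure.

```python
def make_sybil_images(candidate_images, user_original_images, feature_extractor):
    """Perturb images of another (fake) identity toward the User's original,
    uncloaked photos in feature space, so leaked real photos of the User land
    near the Sybil label instead of the User's label."""
    sybil_images = []
    for i, x in enumerate(candidate_images):
        # treat one of the User's real photos as the "target" for this candidate image
        x_user = user_original_images[i % len(user_original_images)]
        sybil_images.append(cloak_image(x, x_user, feature_extractor))
    return sybil_images
```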
- Attempted Countermeasures
- Blurring: decreased overall accuracy but not protection success rate
- Gaussian Noise: same as blurring
- JPEG Compression: both accuracy and protection success rate decrease (a sketch of these transforms follows this list)
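
A quick sketch of the three image transformations, in case we want to reproduce these countermeasure experiments. The blur radius, noise sigma, and JPEG quality are illustrative values, not taken from the paper.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def blur(img: Image.Image, radius: float = 2.0) -> Image.Image:
    """Gaussian blur of the input image."""
    return img.filter(ImageFilter.GaussianBlur(radius))

def add_gaussian_noise(img: Image.Image, sigma: float = 10.0) -> Image.Image:
    """Add zero-mean Gaussian noise in pixel space, then clip back to [0, 255]."""
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

def jpeg_compress(img: Image.Image, quality: int = 30) -> Image.Image:
    """Round-trip through lossy JPEG compression at the given quality."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```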
Interesting References
Analysis
- This is very similar to a standard perturbation evasion attack, but is marketed as a data poisoning attack instead.
- The attack is VERY dependent on poisoning EVERY User sample in the training dataset. Even when fewer than 20% of the User's samples are un-poisoned, the protection success rate drops below 50%. In the authors' words: "a tracker who obtains a large number of uncloaked images of the user can compromise the effectiveness of Fawkes"
- The Sybil case helps significantly when some percentage of the dataset is un-poisoned, as shown in Figure 11. However, the Sybil attack requires adding an additional label to the dataset, one that is close to the User in feature space. They did not mention this, but I believe the User will likely be labeled as the Sybil account. I just don't like it.
- Works better when there are a lot of labels (65 to 10,575 classes), but when there are few labels, the decision boundary is harder to move
Ideas
- We did the same thing: "Normal classes should have only one feature cluster. To do this, the tracker could run a 2-means clustering on each class's feature space, flagging classes with two distinct centroids as potentially cloaked". "...To reduce the probability of detection by this method, the user can choose a target class that does not create such a large feature space separation." (A minimal detection sketch follows this list.)
- Possible defense strategy for us: Increase the number of labels and samples. This way, it is more likely that benign labels have multiple centroids
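
A minimal sketch of the 2-means detection quoted above, assuming the tracker already has per-class feature vectors. The distance threshold and the input format are my assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def flag_cloaked_classes(features_by_class, threshold):
    """features_by_class: dict mapping label -> (n_samples, feat_dim) feature array.
    Flags classes whose two k-means centroids are unusually far apart."""
    flagged = []
    for label, feats in features_by_class.items():
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(feats)
        c0, c1 = km.cluster_centers_
        if np.linalg.norm(c0 - c1) > threshold:  # two distinct clusters => potentially cloaked
            flagged.append(label)
    return flagged
```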
Citation: Shan, Shawn, et al. "Fawkes: Protecting Privacy against Unauthorized Deep Learning Models." 29th USENIX Security Symposium (USENIX Security 20), 2020.