025 ICML 2018 Paper Review - A Model’s Robustness to Adversarial Examples Might Just Be an Illusion #

The 35th International Conference on Machine Learning (ICML 2018) was held in Stockholm, Sweden, from July 10th to 15th, 2018.

ICML has been held since 1980 and has a history of over 30 years. It is a top conference in the field of machine learning and artificial intelligence.

This year, ICML received a total of 2,473 submissions, a roughly 45% increase over last year. A total of 621 papers were accepted, for an acceptance rate of about 25%. In addition to the main conference, ICML also hosted 67 workshops and 9 tutorials.

In the following issues, I will select three papers from ICML 2018 for in-depth discussion.

Today, I will share with you the best paper of the conference, titled “Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples”.

Let’s briefly introduce the authors of this paper.

The first author, Anish Athalye, is a Ph.D. student at MIT, focusing on the security of machine learning algorithms. He has published three papers at this year’s ICML conference.

The second author, Nicholas Carlini, was at the time a Ph.D. student at the University of California, Berkeley, advised by David Wagner, and is known for his research on the security of machine learning.

The third author, David Wagner, is from the University of California, Berkeley. He is a professor in the Computer Science department and an expert in security research.

Background of the Paper #

The content of this paper may be unfamiliar to most people. To understand the main contributions of this paper, let’s first familiarize ourselves with the problem the paper aims to address.

Let’s consider supervised learning, a familiar scenario. In a typical supervised learning task we have a dataset in which each sample is represented by a set of features, and the algorithm learns a classifier. In image classification, for example, the classifier decides which class an image belongs to based on its input features.

What we have described so far is the normal use of a classifier. There is also a special, “adversarial” setting, in which the aim is to disrupt or bypass the classifier’s decisions by any means available.

One category of adversarial mechanism relies on “adversarial examples”. What are adversarial examples? They are data samples that are almost indistinguishable from normal samples yet lead to very different classification decisions. In the image-recognition example, an effective adversarial example would be an image that looks very much like a dog but convinces the classifier that it is a cat or some other animal. By feeding the classifier such near-identical samples, an attacker can systematically skew its decisions and thereby attack the classifier.

In addition to the concept of “adversarial examples,” let’s take a closer look at some basic patterns of attacking classifiers.

Generally speaking, attacks on classifiers come in two modes: “white-box attacks” and “black-box attacks.” In a white-box attack, the attacker has complete access to the internal details of the classifier, such as the architecture of the deep model and all of its weights. In a black-box attack, the attacker has no access to these internal details.

This paper focuses on white-box attacks. For each legitimate data point, the attacker tries to find the smallest perturbation that changes the classifier’s output. In simpler terms, the goal is to degrade the classifier’s accuracy while making minimal changes to the data.
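
To make this objective concrete, here is a minimal sketch (not the paper’s own method) of the classic one-step fast gradient sign method (FGSM) in PyTorch. The names `model`, `x`, and `y` are placeholders for a trained classifier, an input batch, and its labels; the defenses examined in this paper are attacked with stronger iterative variants of the same idea.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """One-step white-box attack: nudge each pixel in the direction
    that increases the classification loss, within an L-infinity budget."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move every pixel by epsilon in the sign of the loss gradient.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()
```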

In the context of complete white-box scenarios, there have been recent works aiming to make neural networks more robust in order to resist attacks from adversarial examples. However, the academic community has not yet provided a complete answer to this challenge.

Main Contributions of the Paper #

From the introduction above, we can see that there are currently some methods for defending against adversarial examples, which seem to provide some robustness protection for classifiers. An important contribution of this paper is to point out that these defense methods may only create an illusion caused by “obfuscated gradients”.

Obfuscated gradients are a special form of gradient masking. When a defense obfuscates gradients, iterative gradient-based attacks stop working, and the defender is left with the false impression that the defense has succeeded.

In this paper, the authors analyze the concept of gradient obfuscation and propose three types of gradient obfuscation: “shattered gradients,” “stochastic gradients,” and “vanishing/exploding gradients.”

For each of these three types of gradient obfuscation, the authors propose a corresponding attack strategy that lets the attacker bypass the obfuscation. They demonstrate these attacks against the adversarial-example defenses accepted at ICLR 2018: of the nine white-box defenses examined, seven relied on obfuscated gradients, and the new attacks circumvent six of them completely and one partially.

It is worth noting that this paper addresses gradient obfuscation that arises as a by-product of the methods used by the “defense side” during the defense process. There is also related work on deliberately breaking gradient descent, for example by making gradients point in the wrong direction.

Core Methods of the Paper #

First, let’s take a look at these three types of gradient obfuscation.

Shattered gradients refer to the situation where the defense is non-differentiable: the gradient is numerically unstable or simply does not exist, so gradient-based attacks have nothing useful to follow. Shattering is not necessarily intentional; it often happens when the defense introduces operations that appear differentiable but whose gradients do not actually optimize the true objective.
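
As a hypothetical illustration, not one of the defenses studied in the paper, consider a preprocessing step that quantizes input pixels to a handful of discrete levels. The rounding operation has zero gradient almost everywhere, so a naive gradient-based attack receives no useful signal even though the defense itself adds little real robustness.

```python
import torch

def quantize_defense(x, levels=8):
    """Illustrative preprocessing defense: round pixels in [0, 1] to a
    small number of discrete levels. torch.round has zero gradient
    almost everywhere, so gradients through this step are shattered."""
    return torch.round(x * (levels - 1)) / (levels - 1)

# A classifier wrapped as model(quantize_defense(x)) would return
# all-zero input gradients, stalling a naive gradient-based attack.
```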

Stochastic gradients arise from randomized defenses: either the network itself is randomized or the input is randomly transformed, so every gradient evaluation sees a different, random version of the computation.

Vanishing and exploding gradients are typically caused by defenses that evaluate a neural network over multiple iterations, for example by feeding the output of one iteration directly into the input of the next, which effectively creates an extremely deep computation.

As mentioned above, gradient obfuscation may be an unintentional by-product of a defense rather than something its designers set out to achieve. So how can an attacker tell whether a defense is genuinely effective or merely obfuscates gradients?

The authors summarize several warning signs; if any of them appears, gradient obfuscation has probably occurred.

The first sign is that one-step attacks perform better than iterative attacks. In the white-box setting, iterative attacks should be at least as strong as one-step attacks, so the opposite outcome usually indicates an anomaly.

The second sign is that black-box attacks perform better than white-box attacks. Since a white-box attacker has strictly more information, white-box attacks should in theory be stronger; the reverse outcome usually indicates something abnormal.

The third sign is that unbounded attacks fail to reach 100% success: with an unlimited perturbation budget, a white-box attack should be able to fool the classifier on every input. A closely related sign is that random search finds adversarial examples that gradient-based attacks fail to find.

So what methods do attackers have against each type of gradient obfuscation?

For shattered gradients, the authors propose Backward Pass Differentiable Approximation (BPDA). If you are interested, I recommend reading the paper for the details of this algorithm. In a nutshell, BPDA identifies the non-differentiable part of the defended network and, on the backward pass only, replaces it with a simple differentiable approximation, so that useful gradients can flow past the obstacle while the forward pass still uses the real defense.
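
Here is a minimal sketch of the BPDA idea in PyTorch for the common special case where the non-differentiable step is approximated by the identity on the backward pass. It reuses the hypothetical `quantize_defense` from the earlier sketch and illustrates the principle rather than the paper’s implementation.

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    """Run the real (non-differentiable) defense on the forward pass,
    but treat it as the identity when computing gradients."""

    @staticmethod
    def forward(ctx, x):
        # quantize_defense is the illustrative defense defined earlier.
        return quantize_defense(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Backward pass of the differentiable approximation g(x) ~= x.
        return grad_output

# An attacker then runs an ordinary iterative attack on
# model(BPDAIdentity.apply(x)) and obtains usable gradients.
```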

For stochastic gradients, the authors propose Expectation over Transformation (EOT). The key observation is that even though each individual transformation is random, the gradient of the expected loss over the distribution of transformations still reflects the true gradient. The attacker can therefore attack this expectation, estimating it by averaging over many sampled transformations.
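
Below is a minimal sketch of an EOT gradient estimate, assuming the randomized defense is modeled by a hypothetical `random_transform` function that is itself differentiable with respect to its input (random resizing and padding, for example). It illustrates the principle rather than the authors’ code.

```python
import torch
import torch.nn.functional as F

def eot_gradient(model, random_transform, x, y, n_samples=30):
    """Estimate the gradient of the expected loss over the defense's
    random transformations by averaging over sampled transformations."""
    x = x.clone().detach().requires_grad_(True)
    total_loss = 0.0
    for _ in range(n_samples):
        # Each call draws a fresh random transformation of the input.
        total_loss = total_loss + F.cross_entropy(model(random_transform(x)), y)
    (total_loss / n_samples).backward()
    return x.grad.detach()
```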

For vanishing or exploding gradients, the authors turn to reparameterization, an important technique in deep learning. Here the input is re-expressed through a differentiable change of variables, so that gradients no longer have to flow through the deep iterative computation that makes them vanish or explode.
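
As a rough sketch of this idea, suppose (purely for illustration) that the defended input can be written as x = generator(z) for some differentiable generator network. The attacker then optimizes over z instead of x, so gradients flow through a single differentiable mapping rather than through a deep iterative loop.

```python
import torch
import torch.nn.functional as F

def reparameterized_attack(model, generator, z_init, y, steps=100, lr=0.01):
    """Attack in the latent space: optimize z so that x = generator(z)
    is misclassified, sidestepping the deep iterative computation that
    makes gradients vanish or explode."""
    z = z_init.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        x = generator(z)
        # Maximize the classification loss (minimize its negative).
        loss = -F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    return generator(z).detach()
```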

Summary #

Today I have talked to you about the best paper at ICML this year.

Let’s recap the key points: First, the paper deals with a topic that may be unfamiliar, so we briefly introduced its background; second, we looked in detail at the three types of gradient obfuscation identified in the paper and at the attack techniques the authors propose to circumvent each of them.

Finally, I’ll leave you with a question to ponder: Why do we need to study whether deep learning models are robust and can withstand attacks? Does it have any practical significance?