A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Learning from Incorrectly Labeled Data

Section 3.2 of Ilyas et al. (2019) shows that training a model on only adversarial errors leads to non-trivial generalization on the original test set. We show that these experiments are a specific case of learning from errors.

6 April 2026 at 06:25 pm

Adversarial examples are inputs crafted specifically to mislead a machine learning model, and they have traditionally been treated as bugs or vulnerabilities. Ilyas et al. (2019) challenge this view. They argue that adversarial examples arise from features in the data that are genuinely predictive of the label but brittle and imperceptible to humans, so they are not merely bugs but features a model can learn from.

Adversarial examples are typically generated by adding small, often imperceptible perturbations to an input. The perturbed input is misclassified even though it looks unchanged to a human, and accuracy on such inputs drops sharply. Traditional defenses aim to improve robustness, most commonly through adversarial training, which augments the training data with adversarial examples.
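As a rough illustration of how such a perturbation can be computed, the sketch below takes one signed-gradient step (the fast gradient sign method) against a toy logistic model. The weights, the input, and `eps` are made-up values, and the single-step attack is a simplification chosen for brevity, not necessarily the procedure used in the paper.

```python
import math

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model sigmoid(w.x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_perturb(w, b, x, y, eps):
    """One signed-gradient step that increases the logistic loss on (x, y).

    For the logistic loss, d(loss)/dx = (p - y) * w, so the step is
    eps * sign((p - y) * w). Every coordinate changes by at most eps.
    """
    p = predict_proba(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

# Hypothetical toy model and input (illustrative values only).
w = [2.0, -3.0, 0.5]
b = 0.0
x = [0.3, 0.1, 0.2]   # score 0.4, so the model predicts class 1
y = 1                 # true label
x_adv = fgsm_perturb(w, b, x, y, eps=0.1)

print("clean p(class 1):", predict_proba(w, b, x))
print("adversarial p(class 1):", predict_proba(w, b, x_adv))
```

In this toy setup each coordinate moves by at most `eps`, yet the predicted class flips from 1 to 0.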

Ilyas et al. (2019) take a different approach. In Section 3.2 of their paper, they train a model solely on adversarial errors: inputs that have been adversarially perturbed and are paired with the incorrect label the perturbation induces. To a human this training set looks mislabeled, yet the resulting model achieves non-trivial generalization on the original test set. The finding suggests that adversarial perturbations carry real signal about the label they point to, rather than being purely detrimental.
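The construction of such a training set can be sketched as follows. The paper works with trained neural networks on image data; the linear model `W`, the fixed target rule `(y + 1) % num_classes`, the step size, and the three data points below are all stand-ins chosen to keep the example small, so this is a toy under stated assumptions rather than the authors' pipeline.

```python
def logits(W, x):
    """Class scores of a linear model: one dot product per row of W."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def predict(W, x):
    """Index of the highest-scoring class."""
    s = logits(W, x)
    return max(range(len(s)), key=lambda c: s[c])

def sign(v):
    return (v > 0) - (v < 0)

def perturb_toward(W, x, source, target, eps):
    """One signed step that raises the target logit relative to the source logit.

    For a linear model, the gradient of (logit[target] - logit[source])
    with respect to x is W[target] - W[source].
    """
    grad = [wt - ws for wt, ws in zip(W[target], W[source])]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def build_mislabeled_set(W, data, eps):
    """Return (x_adv, target) pairs: perturbed inputs paired with the target label."""
    num_classes = len(W)
    out = []
    for x, y in data:
        t = (y + 1) % num_classes  # deterministic target choice (an assumption)
        out.append((perturb_toward(W, x, y, t, eps), t))
    return out

# Hypothetical 3-class model over 2 features and a tiny dataset.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
data = [([0.6, 0.5], 0), ([-0.2, 0.15], 1), ([-0.05, -0.1], 2)]
mislabeled = build_mislabeled_set(W, data, eps=0.2)

for (x, y), (x_adv, t) in zip(data, mislabeled):
    print("true label", y, "-> training label", t, "model now predicts", predict(W, x_adv))
```

A new model would then be trained on `mislabeled` alone and evaluated on the original, correctly labeled test set. The step size here is large relative to the inputs, so the perturbations are not imperceptible in any meaningful sense; only the relabeling logic is the point.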

The authors argue that adversarial examples reflect real, predictive structure in the data rather than a defect of a particular model. Seen this way, training on adversarial errors amounts to learning from incorrectly labeled data, which connects it to the broader idea of learning from errors.

Learning from errors means training on incorrect predictions or labels and still obtaining a model that performs usefully. The idea is not new, and the Section 3.2 experiments of Ilyas et al. (2019) can be read as a specific case of it applied to adversarial examples. Models trained only on adversarial errors reach non-trivial accuracy on the original test set, which suggests that adversarial examples contain information about the underlying data distribution.
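One way to make "learning from errors" concrete is to keep only the inputs a first model (a "teacher") gets wrong and label them with its wrong predictions. The sketch below shows just that data-selection step; `teacher_predict`, its threshold, and the data points are hypothetical.

```python
def collect_errors(teacher_predict, data):
    """Keep only inputs the teacher misclassifies, labeled with the teacher's wrong guess."""
    errors = []
    for x, y in data:
        guess = teacher_predict(x)
        if guess != y:
            errors.append((x, guess))
    return errors

# Hypothetical teacher: thresholds a single feature with an off-center cut.
def teacher_predict(x):
    return 1 if x[0] > 0.3 else 0

# Toy labeled data whose true boundary sits at 0.5 (made-up values).
data = [([0.1], 0), ([0.35], 0), ([0.45], 0), ([0.6], 1), ([0.9], 1)]
error_set = collect_errors(teacher_predict, data)

print(error_set)
```

A second model would then be trained on `error_set` alone and evaluated on clean test data. This example is far too small to exhibit the generalization effect itself; it only shows how the training set is assembled.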

The key insight from Ilyas et al. (2019) is that adversarial perturbations are not random noise but structured changes that carry information a model can use. A model trained on adversarial examples picks up this structure, which is why it generalizes to the original test set even though its training labels look wrong.

This perspective shifts the focus from viewing adversarial examples only as threats to treating them as evidence about what models learn from data. It also highlights the importance of understanding the nature of adversarial examples and their relationship to the data.

The study by Ilyas et al. (2019) has significant implications for the field of machine learning. It challenges the traditional view of adversarial examples as bugs and encourages researchers to explore new ways of leveraging them for learning. By reframing adversarial examples as features, the paper opens up avenues for developing more robust and generalizable models.

Furthermore, this approach can have broader implications beyond adversarial examples. The concept of learning from errors, as demonstrated in the paper, can be applied to other scenarios where models encounter incorrect or noisy data. By harnessing these errors, models can potentially improve their performance and adaptability.

In conclusion, Ilyas et al. (2019) have changed how adversarial examples are understood in machine learning. By showing that training on adversarial errors leads to non-trivial generalization, they provide evidence that adversarial examples are not just bugs but features a model can learn from. This perspective challenges conventional wisdom and suggests a direction for improving model robustness and generalization. As research in this area continues, it remains to be seen how adversarial examples will be integrated into the learning process.

Source: Distill