Sparse Feature Attacks in Adversarial Learning

Zhizhou Yin, Fei Wang, Wei Liu*, Sanjay Chawla

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)

Abstract

Adversarial learning is the study of machine learning techniques deployed in non-benign environments. Example applications include classification for spam detection, network intrusion detection, and credit card scoring. In fact, as the use of machine learning grows in diverse application domains, the possibility of adversarial behavior is likely to increase. When adversarial learning is modelled in a game-theoretic setup, the standard assumption about the adversary (one player) is that it can change, at will, all of the features used by the classifier (the opposing player), paying a cost proportional to the size of the 'attack'. We refer to this form of adversarial behavior as a dense feature attack. However, the adversary's aim is not just to subvert a classifier but to transform the data in a way that keeps the spam effective. We demonstrate that an adversary can potentially achieve this objective by carrying out a sparse feature attack. We design an algorithm showing how a classifier can be constructed to be robust against sparse adversarial attacks. Our main insight is that sparse feature attacks are best defended against by classifiers that use ℓ1 regularizers.
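
To make the distinction concrete, the following is a minimal, illustrative sketch only, not the paper's Stackelberg algorithm or its experimental setup: it fits an ℓ1-regularized and an ℓ2-regularized logistic regression on synthetic data and applies a sparse feature attack that perturbs just the k most influential features of each 'spam' instance. The dataset, the sparse_attack helper, and the k and step parameters are all assumptions chosen for illustration, and the resulting evasion rates are not claimed to reproduce the paper's findings.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a spam dataset (an assumption for illustration).
X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           random_state=0)

# Defender's candidate classifiers: dense (l2) versus sparse (l1) weight vectors.
clf_l2 = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
clf_l1 = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)

def sparse_attack(clf, x, k=3, step=2.0):
    """Perturb only the k features with the largest |weight|, pushing the
    instance toward the benign side: a sparse feature attack (hypothetical helper)."""
    w = clf.coef_.ravel()
    idx = np.argsort(np.abs(w))[-k:]      # the k most influential features
    x_adv = x.copy()
    x_adv[idx] -= step * np.sign(w[idx])  # move against the classifier's weights
    return x_adv

# Attack the positive ("spam") instances and report how many now evade detection.
spam = X[y == 1][:200]
for name, clf in [("l2", clf_l2), ("l1", clf_l1)]:
    adv = np.array([sparse_attack(clf, x) for x in spam])
    evasion = np.mean(clf.predict(adv) == 0)
    print(f"{name}-regularized model: evasion rate under sparse attack = {evasion:.2f}")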

Original language: English
Pages (from-to): 1164-1177
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 30
Issue number: 6
DOIs
Publication status: Published - 1 Jun 2018

Keywords

  • Adversarial learning
  • Nash equilibrium
  • sparse modeling
  • Stackelberg game
  • ℓ1 regularizer
