Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/140135
Type: Thesis
Title: Towards Robust Deep Neural Networks: Query Efficient Black-Box Adversarial Attacks and Defences
Author: Vo, Quoc Viet
Issue Date: 2023
School/Discipline: School of Computer and Mathematical Sciences
Abstract: Deep neural networks (DNNs) have been recognised for their remarkable ability to achieve state-of-the-art performance across numerous machine learning tasks. However, DNN models are susceptible to attacks in the deployment phase, where Adversarial Examples (AEs) present significant threats. Generally, in the Computer Vision domain, adversarial examples are maliciously modified inputs that look similar to the original input and are constructed under white-box settings by adversaries with full knowledge of, and access to, a victim model. However, recent studies have shown that extracting information solely from the output of a machine learning model to craft adversarial perturbations against black-box models is a practical threat to real-world systems. This is significant because of the growing number of Machine Learning as a Service (MLaaS) providers, including Google, Microsoft and IBM, and the applications incorporating these models. Therefore, this dissertation studies the weaknesses of DNNs to attacks in black-box settings and seeks to develop mechanisms that can defend DNNs against these attacks.

Recognising that, in practice, an adversary may exploit only the classification decision (the predicted label) returned by a trained model's access interface, a setting known as a decision-based attack, the research in Chapter 3 first examines recent state-of-the-art decision-based attacks that employ approximate gradient estimation or random search methods. These attacks aim to discover adversarial instances constrained in an lp norm with p > 0, dubbed dense attacks. The research then develops a robust class of query-efficient attacks capable of avoiding entrapment in a local minimum and misdirection from the noisy gradients seen in gradient estimation methods. The proposed attack method, RAMBOATTACK, exploits the notion of Randomized Block Coordinate Descent to explore the hidden classifier manifold, targeting perturbations that manipulate only localised input features and thereby addressing the entrapment in local minima encountered by gradient estimation methods.

In contrast to dense attacks, recent studies have realised adversarial instances constrained in the l0 norm, dubbed sparse attacks, in white-box settings. This demonstrates that machine learning models are more vulnerable than we believe. However, such sparse attacks have not been well studied in the most challenging scenario, the decision-based setting. Furthermore, the sparse attacks' aim of minimising the number of perturbed pixels, measured by the l0 norm, leads to i) an NP-hard problem; and ii) a non-differentiable search space. Recognising the shortage of studies on sparse attacks in a decision-based setting and the challenges of an NP-hard problem and a non-differentiable search space, the research in Chapter 4 explores decision-based sparse attacks and develops an evolution-based algorithm, SPARSEEVO, to handle these challenges. The results of comprehensive experiments in this research show that SPARSEEVO requires significantly fewer model queries than the state-of-the-art sparse attack for both untargeted and targeted attacks. Importantly, the query-efficient SPARSEEVO, and decision-based attacks in general, raise new questions regarding the safety of deployed systems and pose new directions for studying and understanding the robustness of machine learning models.
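To make the label-only, non-differentiable search concrete, the following Python sketch shows a much simplified, (1+1)-style evolutionary loop for a targeted, decision-based sparse attack: a binary mask selects which pixels are copied from a target-class image, and mutations that revert perturbed pixels back to the source image are accepted only if the predicted label is unchanged, so the l0 cost only ever decreases. This is an illustrative sketch of the general idea, not the thesis's SPARSEEVO algorithm; the query_label stub, the mutation size and the query budget are placeholder assumptions.

import numpy as np

# Hypothetical decision-only interface: returns only the predicted label.
def query_label(image: np.ndarray) -> int:
    raise NotImplementedError("replace with a call to the victim model")

def evolve_sparse_attack(x_source, x_target_class, target_label,
                         flips_per_step=16, steps=2000, seed=0):
    """Simplified (1+1)-style evolutionary search for a sparse,
    decision-based targeted attack (illustrative only)."""
    rng = np.random.default_rng(seed)
    h, w = x_source.shape[:2]
    mask = np.ones((h, w), dtype=bool)           # start fully perturbed

    def compose(m):
        out = x_source.copy()
        out[m] = x_target_class[m]               # perturbed pixels come from the target-class image
        return out

    for _ in range(steps):
        ones = np.flatnonzero(mask)
        if ones.size == 0:
            break
        # Mutation: try to revert a few randomly chosen perturbed pixels.
        flip = rng.choice(ones, size=min(flips_per_step, ones.size), replace=False)
        candidate = mask.copy()
        candidate.flat[flip] = False
        # Selection: keep the sparser mask only if the decision is still the target label.
        if query_label(compose(candidate)) == target_label:
            mask = candidate
    return compose(mask), int(mask.sum())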
Extracting information solely from the confidence scores of a machine learning model can considerably reduce the query budget required to attack a victim model. But, as with sparse attacks in decision-based settings, constructing sparse adversarial attacks is non-trivial even when models serve confidence score information in response to queries, because of the resulting NP-hard problem and non-differentiable search space. To this end, the study in Chapter 5 develops BRUSLEATTACK, a new algorithm built upon a Bayesian framework for the problem, and evaluates it against Convolutional Neural Networks, Vision Transformers, recent Stylized ImageNet models, defence methods and Machine Learning as a Service (MLaaS) offerings exemplified by Google Cloud Vision. Through extensive experiments, the proposed attack achieves state-of-the-art attack success rates and query efficiency on standard computer vision tasks across various models.

Understanding and recognising the vulnerability of Deep Learning models to adversarial attacks in various black-box scenarios has compelled the exploration of mechanisms to defend Deep Learning models. Therefore, the research in Chapter 6 explores different defence approaches and proposes a more effective mechanism to defend against black-box attacks. In particular, the research aims to integrate uncertainty into model outputs to mislead black-box attacks by randomly selecting a single model, or a subset of well-trained models, to make predictions for query inputs. The uncertainty in the output scores returned over a sequence of queries hampers the attempts of attack algorithms to estimate gradients or to search for directions toward an adversarial example. Since the uncertainty in the output scores can be improved through the diversity of the model set, the research investigates different techniques to promote model diversity. Through comprehensive experiments, the research demonstrates that the Stein Variational Gradient Descent method with a novel sample loss objective encourages greater diversity than other techniques. Overall, both introducing uncertainty into the output scores and promoting diversity of the model set, as studied in this research, greatly enhance the defence capability against black-box attacks with minimal impact on model performance.
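As a rough illustration of the randomised-selection idea described above, the sketch below serves each query from a randomly chosen model (or small subset) out of a pool of trained classifiers and returns their averaged softmax scores, so repeated queries observe slightly different confidence scores. It is a simplification under stated assumptions, not the thesis's defence: the Model callable type, the class name and the default subset size are illustrative, and the Stein Variational Gradient Descent training that promotes diversity in the model set is omitted.

import random
from typing import Callable, Optional, Sequence

import numpy as np

# Assumed interface: a "model" is any callable mapping an input to a vector of logits.
Model = Callable[[np.ndarray], np.ndarray]

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

class RandomisedEnsembleDefence:
    """Answer each query with a randomly chosen model (or small subset) from a
    diverse, well-trained model set (illustrative sketch only)."""

    def __init__(self, models: Sequence[Model], subset_size: int = 1,
                 seed: Optional[int] = None):
        self.models = list(models)
        self.subset_size = min(subset_size, len(self.models))
        self.rng = random.Random(seed)

    def predict(self, x: np.ndarray) -> np.ndarray:
        # Randomly select which model(s) answer this particular query.
        chosen = self.rng.sample(self.models, self.subset_size)
        # Average the selected models' softmax scores. The per-query randomness
        # injects uncertainty into the returned scores, which disrupts the
        # gradient estimates and search directions used by black-box attacks.
        scores = [softmax(m(x)) for m in chosen]
        return np.mean(scores, axis=0)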
Advisor: Ranasinghe, Damith Chinthana
Abbasnejad, Ehsan
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 2023
Keywords: Machine learning robustness
Adversarial attacks and defences
Trustworthy machine learning
AI safety
Trustworthy artificial intelligence system
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File: Vo2023_PhD.pdf (45.91 MB, Adobe PDF)

