Please use this identifier to cite or link to this item:
|Title:||Differentially Private Data Analytics|
|School/Discipline:||School of Computer Science|
|Abstract:||With the emergence of smart devices and data-driven applications, personal data are being dramatically generated, gathered and used by modern systems for data analytics in a wide range of customised service applications. Despite the advantages of data analytics, potential risks arise that allow adversaries to infer individuals’ private information by using some auxiliary information. Therefore, it is crucial to develop new methods and techniques for privacy-preserving data analytics that ensure acceptable trade-offs between privacy and utility. Over the last decade, differential privacy (DP) has been viewed as a promising notion of privacy because it mathematically bounds the trade-off between privacy and utility against adversaries’ strong inference attacks (n−1 out of n items in the input). By exploring the latest results of differentially private data analytics, this thesis concentrates on four sub-topics: differentially private data aggregation with security guarantees, privacy budget guarantees in distributed systems, differentially private single-path publishing and differentially private k-means clustering. For differentially private data aggregation with security guarantees, we propose a twolayer data aggregation approach against semi-honest but colluding data aggregators, where DP noise is randomly injected. We address the problems of the collusion of data curators and data aggregators. The key idea of our approach is injecting DP noise randomly to prevent privacy disclosure from collusion, while maintaining a high degree of utility and splitting and sharing data pieces to guarantee security. The experimental evaluations over synthetic datasets confirm our mathematical analysis results and show that our approach achieves enhanced aggregation utility. For privacy budget guarantees in distributed systems, we study the parallel composition of privacy budget in differentially private data aggregation scenarios. We propose a new lower bound of the parallel composition of privacy budgets. We address two problems of the state-of-the-art: poor utility when using global sensitivity for both data curators and data aggregators and unknown privacy guarantees with conditions on the local sensitivities between data curators and data aggregators. The key is taking a property of the summation function: the local sensitivity of summation for a dataset is equal to the maximum summation of any sub-dataset. The experimental results over a real-life dataset support our theoretical results of the proposed lower bound of the unconditional parallel composition of privacy budget. For differentially private single-path publishing, we propose a graph-based single-path publishing approach with DP guarantees. We address two problems in existing work: information loss regarding the exact path for genuine users and no privacy guarantees for edges when adversaries have information about all vertices, yet one edge is missing. The main idea is adding fake vertices and edges into the real path by DP, and then hiding connections in the perturbed path in the topology of the published graph so that only the trusted path users with full knowledge about the map can recover the real path. The experimental evaluations of synthetic datasets corroborate the theoretical properties of our approach. For differentially private k-means clustering, we propose a convergent differentially private k-means algorithm that addresses the non-convergence problem of existing work. The key idea is that, at each iteration, we sample a centroid for the next iteration from a specially defined area in each cluster with a selected orientation to guarantee both convergence and convergence rates. Such an orientation is determined by either past centroid movements or past plus future centroid movements. Both mathematical and experimental evaluations show that, because of the convergence, our approach achieves enhanced clustering quality over the state-of-the-art of DP k-means clustering, while having the same DP guarantees.|
|Dissertation Note:||Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2021|
|Provenance:||This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals|
|Appears in Collections:||Research Theses|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.