Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/129873
Full metadata record
dc.contributor.advisor: Shen, Hong
dc.contributor.advisor: Sheng, Michael
dc.contributor.author: Lu, Zhigang
dc.date.issued: 2021
dc.identifier.uri: http://hdl.handle.net/2440/129873
dc.description.abstract: With the emergence of smart devices and data-driven applications, personal data are being generated, gathered and used at a dramatic scale by modern systems for data analytics in a wide range of customised service applications. Despite the advantages of data analytics, potential risks arise that allow adversaries to infer individuals' private information using auxiliary information. It is therefore crucial to develop new methods and techniques for privacy-preserving data analytics that ensure acceptable trade-offs between privacy and utility. Over the last decade, differential privacy (DP) has been viewed as a promising notion of privacy because it mathematically bounds the trade-off between privacy and utility against adversaries' strong inference attacks (knowledge of n−1 out of n items in the input). Building on the latest results in differentially private data analytics, this thesis concentrates on four sub-topics: differentially private data aggregation with security guarantees, privacy budget guarantees in distributed systems, differentially private single-path publishing, and differentially private k-means clustering.

For differentially private data aggregation with security guarantees, we propose a two-layer data aggregation approach against semi-honest but colluding data aggregators, where DP noise is injected at random. We address the problem of collusion between data curators and data aggregators. The key idea of our approach is to inject DP noise randomly, preventing privacy disclosure under collusion while maintaining a high degree of utility, and to split and share data pieces to guarantee security. Experimental evaluations over synthetic datasets confirm our mathematical analysis and show that our approach achieves enhanced aggregation utility.

For privacy budget guarantees in distributed systems, we study the parallel composition of privacy budgets in differentially private data aggregation scenarios and propose a new lower bound for this composition. We address two problems in the state-of-the-art: poor utility when global sensitivity is used for both data curators and data aggregators, and unknown privacy guarantees when conditions are imposed on the local sensitivities between data curators and data aggregators. The key is a property of the summation function: the local sensitivity of the summation over a dataset equals the maximum summation over any of its sub-datasets. Experimental results over a real-life dataset support our theoretical results on the proposed lower bound of the unconditional parallel composition of privacy budgets.

For differentially private single-path publishing, we propose a graph-based single-path publishing approach with DP guarantees. We address two problems in existing work: information loss regarding the exact path for genuine users, and the lack of privacy guarantees for edges when adversaries have information about all vertices but one edge is missing. The main idea is to add fake vertices and edges to the real path under DP, and then hide the connections of the perturbed path in the topology of the published graph, so that only trusted path users with full knowledge of the map can recover the real path. Experimental evaluations over synthetic datasets corroborate the theoretical properties of our approach.

For differentially private k-means clustering, we propose a convergent differentially private k-means algorithm that addresses the non-convergence problem of existing work. The key idea is that, at each iteration, we sample the centroid for the next iteration from a specially defined area in each cluster with a selected orientation, guaranteeing both convergence and the convergence rate. The orientation is determined by either past centroid movements or past plus future centroid movements. Both mathematical and experimental evaluations show that, owing to this convergence, our approach achieves better clustering quality than the state-of-the-art in DP k-means clustering while providing the same DP guarantees.
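The abstract above repeatedly relies on injecting calibrated DP noise into aggregate statistics. As a minimal illustrative sketch of that building block, and not the thesis's actual two-layer aggregation protocol, the following shows the standard Laplace mechanism for an ε-DP sum, assuming each contributed value lies in a known range [0, B] (so the sensitivity of the sum is B); the function names and parameters are chosen here for illustration only.

```python
import math
import random


def laplace_noise(scale: float) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def dp_sum(values, epsilon: float, upper_bound: float) -> float:
    """Release the sum of `values` with epsilon-DP via the Laplace mechanism.

    Each value is clipped to [0, upper_bound], so the sensitivity of the
    sum (the change from adding or removing one record) is upper_bound,
    and Laplace noise with scale upper_bound / epsilon suffices.
    """
    clipped = [min(max(v, 0.0), upper_bound) for v in values]
    return sum(clipped) + laplace_noise(upper_bound / epsilon)
```

A smaller ε gives a larger noise scale and hence stronger privacy but lower utility, which is exactly the privacy–utility trade-off the abstract describes.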
dc.language.iso: en
dc.subject: Differential privacy
dc.subject: data analytics
dc.subject: data aggregation
dc.subject: trajectory publishing
dc.subject: clustering
dc.title: Differentially Private Data Analytics
dc.type: Thesis
dc.contributor.school: School of Computer Science
dc.provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
dc.description.dissertation: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2021
Appears in Collections:Research Theses

Files in This Item:
File: Lu2021_PhD.pdf (2.31 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.