Priyanka Nanayakkara, Hyeok Kim, Yifan Wu, Ali Sarvghad, Narges Mahyar, Gerome Miklau, Jessica Hullman
IEEE Symposium on Security & Privacy (SP)
We propose the Measure-Observe-Remeasure paradigm for exploratory analysis under differential privacy. In this paradigm, the analyst queries the private database for an initial measurement, observes estimates and errors, and remeasures if they require better accuracy based on their analysis goals. We instantiate the paradigm in an interactive visualization interface, which allows analysts to observe noisy estimates (B) and remeasure (E) as needed until they reach the total remeasure (privacy loss) budget (D).
Differential privacy (DP) has the potential to enable privacy-preserving analysis on sensitive data, but requires analysts to judiciously spend a limited “privacy loss budget” ϵ across queries. Analysts conducting exploratory analyses do not, however, know all queries in advance and seldom have DP expertise. Thus, they are limited in their ability to specify ϵ allotments across queries prior to an analysis. To support analysts in spending ϵ efficiently, we propose a new interactive analysis paradigm, Measure-Observe-Remeasure, where analysts “measure” the database with a limited amount of ϵ, observe estimates and their errors, and remeasure with more ϵ as needed.
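The measure, observe, remeasure loop described above can be illustrated with a minimal sketch. It is not the paper's mechanism or interface. It assumes a single count query with sensitivity 1, Laplace noise, and basic sequential composition (each remeasurement's ϵ is added to the total spent). The names `RemeasureSession` and `explore`, the doubling schedule, and the 95% error margin are all hypothetical choices made for illustration.

```python
import math
import random


class RemeasureSession:
    """Tracks a total privacy loss budget over repeated measurements of one count.

    Assumption: Laplace mechanism on a sensitivity-1 count, with epsilon
    summed across measurements (basic sequential composition).
    """

    def __init__(self, true_count, total_epsilon, seed=None):
        self._true_count = true_count  # sensitive value, never shown to the analyst
        self.total_epsilon = total_epsilon
        self.spent = 0.0
        self._rng = random.Random(seed)

    def measure(self, epsilon):
        """Spend `epsilon`; return (noisy estimate, 95% error margin)."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("total privacy loss budget exceeded")
        self.spent += epsilon
        scale = 1.0 / epsilon  # Laplace scale b = sensitivity / epsilon
        # The difference of two exponentials with mean b is Laplace(0, b).
        noise = self._rng.expovariate(epsilon) - self._rng.expovariate(epsilon)
        # For Laplace(0, b), P(|noise| > t) = exp(-t / b); solve for 0.05.
        margin = scale * math.log(1 / 0.05)
        return self._true_count + noise, margin


def explore(session, target_margin, first_epsilon=0.1):
    """Measure, observe the error margin, and remeasure with more epsilon.

    Hypothetical schedule: double epsilon each round until the margin is
    acceptable or the total budget is used up.
    """
    epsilon = first_epsilon
    history = []
    while True:
        remaining = session.total_epsilon - session.spent
        if remaining <= 1e-12:
            break  # budget exhausted
        epsilon = min(epsilon, remaining)
        estimate, margin = session.measure(epsilon)  # measure
        history.append((epsilon, estimate, margin))  # observe
        if margin <= target_margin:
            break  # accurate enough for the analysis goal
        epsilon *= 2  # remeasure with more epsilon
    return history


session = RemeasureSession(true_count=1200, total_epsilon=1.0, seed=0)
for eps, est, margin in explore(session, target_margin=10.0):
    print(f"epsilon={eps:.2f}  estimate={est:.1f}  +/- {margin:.1f}")
print(f"spent {session.spent:.2f} of {session.total_epsilon:.2f}")
```

In this sketch the analyst stops as soon as the error margin meets their goal, so any unspent ϵ remains available for later queries. Each remeasurement here is a fresh noisy draw; combining earlier draws into a single estimate is possible but omitted for brevity.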
We instantiate the paradigm in an interactive visualization interface that allows analysts to spend increasing amounts of ϵ under a total budget. To observe how analysts interact with the Measure-Observe-Remeasure paradigm via the interface, we conduct a user study that compares the utility of participants' ϵ allocations and the findings they draw from sensitive data to the allocations and findings expected of a rational agent facing the same decision task. We find that participants are able to use the workflow relatively successfully, including using budget allocation strategies that capture over half of the available utility stemming from ϵ allocation. Their loss in performance relative to a rational agent appears to be driven more by an inability to access information and report it than by an inability to allocate ϵ.