Privacy-Preserving Data Analysis for Scientific Discovery

Project Summary

Organizations frequently decline to share data because they consider it sensitive. For example, laws or regulations may prohibit sharing because of personal privacy or national security concerns, or the organization that owns the data may regard it as a proprietary trade secret. In any case, the data cannot or will not be released in raw form, so alternative approaches are needed if it is to be shared at all.

Today, such data is often not shared at all. When it is shared, those processing or analyzing it are frequently required to work in highly secured, non-networked environments designed to prevent the data from being exfiltrated, whether physically from a building or over a network. These restrictions hinder much research. Sometimes data is instead shared after “anonymization,” in which values are typically either masked or generalized. Unfortunately, these techniques have repeatedly been shown to fail: an attacker typically merges external information containing identities with quasi-identifiers contained in the dataset in order to re-identify “anonymized” records.
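The re-identification failure described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the field names (zip, birth_year, sex), the records, and the helper link_records are invented for illustration and do not come from any real dataset or from this project.

```python
def link_records(anonymized, external, quasi_identifiers):
    """Map the index of each anonymized row to a name when its
    quasi-identifiers match exactly one record in the external data."""
    # Index the external (identified) records by their quasi-identifier values.
    index = {}
    for person in external:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    matches = {}
    for i, row in enumerate(anonymized):
        key = tuple(row[q] for q in quasi_identifiers)
        names = index.get(key, [])
        # Only an unambiguous match counts as a re-identification.
        if len(names) == 1:
            matches[i] = names[0]
    return matches


# Hypothetical quasi-identifiers shared by both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

# "Anonymized" release: names removed, quasi-identifiers left in place.
anonymized = [
    {"zip": "94720", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "94704", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]

# External data that carries identities (for example, a public roll).
external = [
    {"name": "Alice Example", "zip": "94720", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip": "94704", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "zip": "94704", "birth_year": 1990, "sex": "F"},
]

reidentified = link_records(anonymized, external, QUASI_IDENTIFIERS)
for row_index, name in reidentified.items():
    print(name, "->", anonymized[row_index]["diagnosis"])
```

In this toy data both released rows are linked back to a name even though no name was ever published, which is why masking direct identifiers alone is not sufficient.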

This project aims to develop a method that combines a variety of hardware and software approaches with privacy-preserving technologies, such as differential privacy, for the scientific analysis of sensitive data. The goal is to give the owner of a sensitive dataset significantly greater confidence that the data will not be exposed or altered, and to reduce the data center's liability exposure to assertions of security negligence or insider attacks by providing an environment in which even the data center itself cannot access the raw data, all without significant negative impacts on usability or performance. The environment we envision is both secure and usable and includes protections against “insiders” such as system administrators. It leverages techniques that are relatively new and are only now becoming practically useful for these purposes.
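For readers unfamiliar with differential privacy, the following is a minimal sketch of one textbook technique, the Laplace mechanism applied to a counting query. It is not this project's method; the function names, the example records, and the choice of epsilon are all assumptions made for illustration, and the floating-point sampling used here is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records; only the noisy count would ever be released.
records = [{"age": age} for age in (23, 35, 41, 52, 67, 29, 48)]
noisy = private_count(records, lambda r: r["age"] >= 40, epsilon=0.5)
print(f"noisy count of records with age >= 40: {noisy:.2f}")
```

A smaller epsilon adds more noise and gives stronger privacy at the cost of accuracy; answering many queries consumes privacy budget cumulatively, which this sketch does not track.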

This project is supported by Berkeley Lab Contractor Supported Research funding.

Principal Investigator:

Sean Peisert (PI; LBNL)

Publications resulting from this project:

None yet.

More information is available on other Berkeley Lab R&D projects focusing on cybersecurity in general, as well as specifically on cybersecurity for scientific research.