This seed project sought to define methods for determining what data can be sanitized, and how. Traditional techniques often either make data unusable for research or operational purposes or fail to sanitize the data completely. Our data sanitization work therefore built on past techniques by also using an “open world” assumption. We asked three questions. First, which relationships between data fields would have to be established in order to reveal certain information? Second, which associations would have to be protected in order to conceal that information? Finally, given the policy constraints of the different stakeholders, can a dataset be sanitized in a way that satisfies all of their policies, or would one or more policies have to be compromised? At LBNL, this project was led by Sean Peisert and was funded by the Institute for Information Infrastructure Protection (I3P).
More information is available at the UC Davis data sanitization project web site.