Toward a Hardware/Software Co-Design Framework for Ensuring the Integrity of Exascale Scientific Data

Principal Investigator:

Sean Peisert (PI)

Senior Personnel:

Venkatesh Akella (UC Davis / LBNL Faculty Scientist)
Jason Lowe-Power (UC Davis / LBNL Faculty Scientist)

LBNL-Affiliated Graduate Students:

Ayaz Akram (UC Davis / LBNL)

Project Alumni:

Bogdan Copos (LBNL/UC Davis; Ph.D. 2017) → SRI International → Google
Prof. Hein Meling (LBNL/University of Stavanger)
Amir Teshome Wonjiga (LBNL/INRIA Rennes; Ph.D. 2019)
Reinhard Gentz (LBNL)
Anna Giannakou (LBNL)

Scientific data today is at risk because of how it is collected, stored, and analyzed across highly disparate computing systems. Integrity can be lost for many reasons: bugs in existing computational pipelines, network events, user error, and unintentional system effects, as well as intentional attacks by outsiders (e.g., scientific competitors), by insiders (e.g., disgruntled employees), or through the hardware/software supply chain, potentially without any trace of the modification. Given these gaps and shortcomings in existing HPC solutions, how can we make claims about the integrity of scientific data as it traverses open, international networks and passes through instruments and systems with widely varying reliability and provenance?

We believe that solving the problems described above requires future HPC hardware and software to be co-designed, with security and scientific computing integrity concepts designed and built into as much of the stack as possible from the outset. Given the risk of data loss due to both software and hardware, this co-design should take into account hardware elements, operating systems, compilers, application software, and the networking stack, and should extend to the ways in which software developers write software and users interact with systems, since both can affect scientific computing integrity. However, before laying out a research roadmap to design and construct such an architecture, we believe that several important aspects first need to be understood more clearly.

This project takes a broad look at several aspects of security and scientific integrity in HPC systems. Using several case studies as exemplars, and working closely with both domain scientists and facility staff, we propose to test and validate several initial concepts in existing scientific computing workflows at NERSC, a DOE HPC facility, and to analyze those models to better characterize integrity-related computational behavior.

Early work on this project focused on a range of activities, including identifying misuse of computing systems and leveraging blockchains for scientific computing. More recent work has focused on developing trustworthy scientific computing architectures.

For more on the current work, see Data Enclaves for Scientific Computing.

This project is supported by the Advanced Scientific Computing Research (ASCR) program of the US Department of Energy’s Office of Science.

Press regarding this project:

Berkeley Lab Cybersecurity Specialist Highlights Data Sharing Benefits, Challenges at NAS Meeting — Dec. 4, 2018

CRD’s Peisert to Discuss Data Sharing at National Academies’ COSEMPUP Meeting — Nov. 5, 2018

Lab Experts Help Coordinate ISC18, World’s First, Largest Computing Conference - June 21, 2018

Into the Medical Science DMZ (Science Node) March 23, 2018

Berkeley Lab Researchers Contribute to Making Blockchains Even More Robust — January 30, 2018

ESnet’s Science DMZ Design Could Help Transfer, Protect Medical Research Data (Science Node) — October 16, 2017

Berkeley Lab’s cybersecurity expert Sean Peisert discusses challenges & opportunities of securing HPC — Aug. 24, 2017

HPC security article in Communications of the ACM

Video accompanying HPC security article on Vimeo

Publications resulting from this project:

Ayaz Akram, Venkatesh Akella, Sean Peisert, and Jason Lowe-Power, “Enabling Design Space Exploration for RISC-V Secure Compute Environments,” Proceedings of the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV), co-located with ISCA 2021, June 17, 2021.

Sean Peisert, “Trustworthy Scientific Computing,” Communications of the ACM (CACM), 64(5), pp. 18–21, May 2021.

Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, and Sean Peisert, “Performance Analysis of Scientific Computing Workloads on General Purpose TEEs,” Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 17–21, 2021.

Ayaz Akram, “Trusted Execution for High-Performance Computing,” Proceedings of the 15th EuroSys Doctoral Workshop (EuroDW), 2021. video

Ayaz Akram, “Architectures for Secure High-Performance Computing,” Proceedings of the Young Architect Workshop (YArch) held in conjunction with the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2021. video

Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, and Sean Peisert, “Performance Analysis of Scientific Computing Workloads on Trusted Execution Environments,” arXiv preprint arXiv:2010.13216, 25 Oct 2020.

Ross Gegan, Christina Mao, Dipak Ghosal, Matt Bishop, and Sean Peisert, “Anomaly Detection for Science DMZs Using System Performance Data,” Proceedings of the 2020 IEEE International Conference on Computing, Networking and Communications (ICNC 2020), Big Island, HI, February 17–20, 2020.

Amir Teshome Wonjiga, Louis Rilling, Christine Morin, and Sean Peisert, “Blockchain as a Trusted Component in Cloud SLA Verification,” Proceedings of the International Workshop on Cloud, IoT and Fog Security (CIFS), co-located with the 12th IEEE/ACM International Conference on Utility and Cloud Computing (UCC), Auckland, New Zealand, December 2–5, 2019.

Amir Teshome Wonjiga, User-Centric Security Monitoring in Cloud Environments. PhD dissertation, Inria Rennes – Bretagne Atlantique, May 2019. (Dissertation Advisors: Christine Morin and Louis Rilling)

Anna Giannakou, Daniel Gunter, and Sean Peisert, “Flowzilla: A Methodology for Detecting Data Transfer Anomalies in Research Networks,” Proceedings of the 5th Innovate the Network for Data-Intensive Science (INDIS) Workshop, Dallas, TX, November 11, 2018.

Sean Peisert, Eli Dart, William K. Barnett, James Cuff, Robert L. Grossman, Edward Balas, Ari Berman, Anurag Shankar, and Brian Tierney, “The Medical Science DMZ: A Network Design Pattern for Data-Intensive Medical Science,” Journal of the American Medical Informatics Association (JAMIA), 25(3):267–274, March 2018.

Sean Peisert, “Security in High-Performance Computing Environments,” Communications of the ACM (CACM), 60(9):72–80, September 2017.

Bogdan Copos, Modeling Systems Using Side Channel Information. PhD dissertation, University of California, Davis, 2017. (Dissertation Advisor: Sean Peisert)

Sean Peisert, William K. Barnett, Eli Dart, James Cuff, Robert L. Grossman, Edward Balas, Ari Berman, Anurag Shankar, and Brian Tierney, “The Medical Science DMZ,” Journal of the American Medical Informatics Association (JAMIA), 23(6), Nov. 1, 2016.

Software resulting from this project:

Blockchain Based Remote Data Integrity Checking Tool

Presentations:

Sean Peisert, “Securing Edge-to-Center Computing with Trustworthy Data Domains,” 2022 AFRL/AFOSR/DOE Energy Cost of Information Workshop, February 18, 2022.

Venkatesh Akella and Sean Peisert, “Usable Computer Security and Privacy to Enable Data Sharing for Scientific Research,” Trusted Computing Center of Excellence (TCCOE) Summit, February 1–3, 2022.

Keynote: “Usable Computer Security and Privacy to Enable Data Sharing for Scientific Research,” Second International Silicon Valley Cybersecurity Conference (SVCC), December 3, 2021.

Sean Peisert, “Advancing Cybersecurity as an Enabling Capability in High-Performance Computing Environments,” HPC User Forum, Sept. 7–9, 2021.

Sean Peisert, “Cyber Privacy and Security Risks During the Pandemic” (panel with Bart Preneel, KU Leuven; Kritika Bhardwaj, NLU Delhi; Margaret Bourdeaux, Harvard/Berkman Klein; Susan Landau, Tufts; and Smitha Prasad, NLU Delhi), Hewlett Foundation event hosted by the Fletcher School at Tufts University and the Centre for Communication Governance (CCG) at National Law University, Delhi, December 17, 2020.

Sean Peisert, “Fragility, Interdependence, and Tradeoffs — Cybersecurity and Privacy Lessons from the Pandemic,” Federal Cybersecurity R&D Interagency Working Group (CSIA IWG), NITRD, December 3, 2020.

Sean Peisert, “Scientific Computing and Sensitive Data,” DataLab Health Data Science and Systems Research and Learning Cluster, University of California, Davis, October 2, 2020.

Sean Peisert, “Privacy-Preserving Data Analysis in Scientific Computing Environments,” White House Office of Science & Technology Policy Workshop, Eisenhower Executive Office Building, Washington, D.C., Jan. 31, 2020.

Sean Peisert, “Privacy-Preserving Data Analysis for Energy Delivery Systems and Scientific Discovery,” Western Area Power Administration (WAPA), Golden, CO, November 5, 2019.

Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, and Sean Peisert, “Using Trusted Execution Environments on High Performance Computing Platforms,” Open-Source Enclaves Workshop (OSEW 2019), Berkeley, CA, July 25, 2019.

Sean Peisert, “Usable Computer Security and Privacy to Enable and Encourage Data Sharing for Scientific Research,” National Academies of Sciences, Engineering, and Medicine Committee on Science, Engineering, Medicine, and Public Policy (COSEMPUP) Meeting, Washington, D.C., November 8, 2018.

Sean Peisert, “Cybersecurity Challenges and Opportunities in High-Performance Computing Environments,” International Supercomputing Conference (ISC), Frankfurt, Germany, June 26, 2018.

Sean Peisert, “Keynote: Cybersecurity for HPC Systems: State of the Art and Looking to the Future,” High-Performance Computing Security Workshop, National Institute of Standards and Technology (NIST), Gaithersburg, MD, March 28, 2018.

Sean Peisert, “Security in High Performance Computing Environments,” Computing Sciences/NERSC Security Seminar, Lawrence Berkeley National Laboratory, October 5, 2017.

Sean Peisert, “Security and Privacy in Data-Intensive, High-Performance Computing Contexts,” Berkeley Institute for Data Science (BIDS), University of California, Berkeley, October 2, 2017.

Lee Beausoleil (NSA), David Lombard (Intel), Angelos Keromytis (DARPA), Sean Peisert (LBNL), “Panel: HPC Monitoring,” NSCI: High-Performance Computing Security Workshop, National Institute of Standards and Technology (NIST), Gaithersburg, MD, September 30, 2016.

Sean Peisert, “Security Expert on Why HPC Matters: Cybersecurity for HPC Systems: Challenges and Opportunities,” NSCI: High-Performance Computing Security Workshop, National Institute of Standards and Technology (NIST), Gaithersburg, MD, September 29, 2016.

Other Resources:

Ayaz Akram, Setting up Trusted HPC System in the Cloud, November 19, 2020.

More information is available on other Berkeley Lab R&D projects focusing on cybersecurity in general, as well as specifically on cybersecurity for scientific and high-performance computing.