What is Tigres?

Template Interfaces for Agile Parallel Data-Intensive Science

Tigres provides a programming library to compose and execute large-scale data-intensive scientific workflows from desktops to supercomputers. DOE User Facilities and large science collaborations increasingly generate data sets so large that it is no longer practical to download them to a desktop for analysis. They are instead stored at centralized compute and storage resources such as high performance computing (HPC) centers. Analyzing this data requires the ability to run at these facilities, but with current technologies, scaling an analysis to an HPC center and to a large data set is difficult even for experts. Tigres addresses the challenge of enabling collaborative analysis of DOE Science data through a new concept of reusable "templates" that let scientists easily compose, run and manage collaborative computational tasks. These templates define common computation patterns used in analyzing a data set.

Tigres is inspired by the success of the MapReduce model. When the MapReduce model emerged from the Internet search space, it provided a radically simpler paradigm for parallel analysis by certain classes of applications. The simplicity of the API and analysis model enabled many applications to quickly script powerful, scalable analyses. Similarly, Tigres provides abstractions that support a wide array of common scientific application computational patterns. The goals of Tigres are to:

  1. Provide a template abstraction that captures the core set of fundamental workflow patterns, allowing users to compose collaborative workflow scripts in a programming language of their choice.
  2. Provide a hybrid execution mechanism for the templates that enables users to prototype their analysis workflows on desktops and seamlessly adapt them to run in production environments at scale.
  3. Provide programmatic interfaces that will allow automated and user-provided provenance tracking.
  4. Provide interfaces to capture execution state and allow users to understand complex parallel faults encountered during execution of the workflow.
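To make the template idea concrete, the sketch below shows how two common workflow patterns, a sequence (chained tasks) and a parallel fan-out, can be expressed as reusable functions. This is a minimal illustration of the concept using only the Python standard library; the function names `sequence` and `parallel` and their signatures are illustrative assumptions for this sketch, not the actual Tigres API.

```python
# Illustrative sketch of workflow "templates" as reusable patterns.
# NOTE: these are hypothetical helpers for explanation, not the Tigres API.
from concurrent.futures import ThreadPoolExecutor

def sequence(tasks, data):
    """Sequence template: run tasks in order, feeding each task's
    output into the next one."""
    for task in tasks:
        data = task(data)
    return data

def parallel(task, inputs, max_workers=4):
    """Parallel template: apply one task to many independent inputs
    concurrently, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, inputs))
```

With templates like these, an analysis script is just ordinary code: `sequence([clean, transform], raw_data)` chains two steps, while `parallel(transform, files)` fans one step out over many inputs. The same script can run on a laptop or, with a different executor behind the template, on an HPC batch system.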