Concepts in Tigres

Tigres provides an API for composing, executing, and monitoring workflows. It is built around an abstraction called templates, which capture common execution patterns. This section gives an overview of the following topics:

Templates

The Tigres template API lets users create workflows programmatically, using templates as the building blocks. Templates are composed of individual tasks, which are the user-supplied units of work to be executed. A Tigres workflow is a Python program (e.g. my_program.py) that uses the Tigres template API to build and execute a workflow (a Tigres program). There are four basic Tigres template functions: sequence, parallel, merge and split. A Tigres program can contain one or more of these template functions.

Figure: Execution behavior of the four Tigres templates: a) Sequence, b) Parallel, c) Split, d) Merge.

Core API

As mentioned previously, a Tigres program is a Python program that uses the Tigres template API to build a workflow. The templates are the core of the Tigres API. Any or all of the four Tigres templates can be used in a single program. A Task is the most basic unit of execution and can be defined as a Python function internal to the Tigres program or as a separate executable (e.g. wget, my_c_program). In the figure above, arrows show the flow of execution. The inputs to each Task execution can be statically defined or retrieved from the results of previously executed Tasks or Templates.
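The execution flow of the four templates can be illustrated with a small toy model in plain Python. This is a sketch only and does not use the Tigres API: the function names and signatures below are hypothetical. It assumes that split means one task feeding a set of parallel tasks and that merge means a set of parallel tasks feeding one task, which this page does not spell out.

```python
from concurrent.futures import ThreadPoolExecutor


def sequence(tasks, value):
    """Run tasks one after another, feeding each result to the next task."""
    for task in tasks:
        value = task(value)
    return value


def parallel(tasks, values):
    """Run task i on input i concurrently; results keep the task order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, value) for task, value in zip(tasks, values)]
        return [future.result() for future in futures]


def split(split_task, value, tasks):
    """Run one task, then hand its result to every parallel task (assumed semantics)."""
    result = split_task(value)
    return parallel(tasks, [result] * len(tasks))


def merge(tasks, values, merge_task):
    """Run parallel tasks, then pass the list of results to one task (assumed semantics)."""
    return merge_task(parallel(tasks, values))


def double(x):
    return x * 2


def increment(x):
    return x + 1


seq_result = sequence([double, increment], 5)        # double, then increment
par_result = parallel([double, increment], [5, 5])   # both tasks run independently
split_result = split(double, 5, [increment, double])
merge_result = merge([double, increment], [1, 2], sum)
```

In this toy model, inputs are either given statically (the literal values) or taken from the result of an earlier task, mirroring the two input sources described above.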

A Glossary of Tigres Concepts

Task
The atomic unit of execution. (see tigres.Task)
Task Array
A collection of tasks. (see tigres.TaskArray)
Templates
Patterns of execution from a combination of tasks. (see Library Reference)
Input Types
The characteristic of the task that defines the types of its inputs. (see tigres.InputTypes)
Input Values
The values used in a task execution. (see tigres.InputValues)
Input Array
A collection of input values for a number of tasks. (see tigres.InputArray)

Each template function takes at minimum two named collections: a Task Array and an Input Array. The Task Array is an ordered collection of tasks to be executed together, in sequence or in parallel depending on the execution flow of the particular template. The Input Array is a collection of Input Values and defines the inputs for each Task in the corresponding Task Array.

The Task, the atomic unit of execution in Tigres, has a collection of Input Types that specifies the types of inputs the task may take. A task’s Input Values are an ordered list of task inputs that are passed to the task during execution. They are not included in the task definition, which allows for task reuse and late binding of data elements to the Tigres program execution. The data model in Tigres is described in greater detail in Section :ref:tigres-data-model-label.
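The separation between a task definition and its input values can be sketched in plain Python. The `ToyTask` class below is hypothetical and is not the tigres.Task class; it only illustrates the idea that a task declares its input types once and receives concrete values at execution time.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ToyTask:
    """Hypothetical stand-in for a task: a name, an implementation, and input types."""
    name: str
    func: Callable
    input_types: Sequence[type]

    def run(self, input_values):
        """Check the late-bound values against the declared types, then execute."""
        if len(input_values) != len(self.input_types):
            raise ValueError(f"{self.name}: expected {len(self.input_types)} inputs")
        for value, expected in zip(input_values, self.input_types):
            if not isinstance(value, expected):
                raise TypeError(f"{self.name}: {value!r} is not {expected.__name__}")
        return self.func(*input_values)


# The task is defined once, with no data attached to it.
scale = ToyTask("scale", lambda x, factor: x * factor, (float, int))

# A toy "task array" reuses the same task; the toy "input array" supplies
# one ordered collection of input values per task.
toy_task_array = [scale, scale]
toy_input_array = [(1.5, 2), (2.0, 3)]

results = [task.run(values) for task, values in zip(toy_task_array, toy_input_array)]
```

Because the values live outside the task definition, the same `scale` task is reused with different data on each execution.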

Monitoring API

The monitoring API has functions to:

  • create monitoring information
  • find and view monitoring information

Monitoring information includes both template-execution information that Tigres generates automatically and arbitrary user-provided information. All of it, automatic and user-provided alike, is semi-structured: it is broken into name/value pairs, but only a few of the names and values are predefined. In general, the monitoring follows the Logging Best Practices that arose from the NetLogger project.
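To make "semi-structured name/value pairs" concrete, here is a minimal sketch of writing, parsing, and searching such records in plain Python. It does not use the Tigres monitoring API. The helper names are hypothetical, and the predefined field names `ts`, `event`, and `level` are assumptions modeled on the name=value logging style; Tigres may use different names.

```python
import shlex
from datetime import datetime, timezone


def format_event(event, level="Info", ts=None, **fields):
    """Render one record as space-separated name=value pairs."""
    ts = ts or datetime.now(timezone.utc)
    # A few predefined names (assumed here), followed by arbitrary user fields.
    pairs = {"ts": ts.isoformat(), "event": event, "level": level}
    pairs.update(fields)
    return " ".join(
        f"{name}={shlex.quote(str(value))}" for name, value in pairs.items()
    )


def parse_event(line):
    """Turn a name=value line back into a dict of strings."""
    record = {}
    for token in shlex.split(line):
        name, _, value = token.partition("=")
        record[name] = value
    return record


def find_events(lines, **criteria):
    """Return the parsed records whose fields match every criterion."""
    records = (parse_event(line) for line in lines)
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


log = [
    format_event("toy.task.start", task="scale"),
    format_event("toy.task.end", task="scale", note="first run"),
]
matches = find_events(log, event="toy.task.end")
```

Values that contain spaces are quoted on output, so each line can be split back into its pairs without ambiguity.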

Execution Environments

Tigres can be executed in several different environments from batch queues to local threads and processes. By using the appropriate execution engine, a Tigres program can be executed on a single node or deployed without additional infrastructure to department clusters and batch processing queues on supercomputers. A program is written once and only the execution engine is changed at run time. This allows users to easily scale from development (desktop) to production (department clusters and HPC centers).

Tigres currently supports five execution engines.

  • Local Threads - Tigres runs tasks as threads on one machine.
  • Local Processes - Tigres runs tasks as processes on one machine.
  • Distributed Processes - Tigres distributes tasks as processes across a cluster of machines.
  • Sun Grid Engine - Tigres submits tasks as Sun Grid Engine jobs. This mode is used on HPC resources (e.g., NERSC) where a private instance of MySGE is run as a glide-in.
  • SLURM - Tigres submits tasks to a SLURM job manager.
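The "write once, change only the engine at run time" idea can be sketched with the Python standard library. This is an illustration, not how Tigres selects its engines: the `TOY_ENGINES` table, the `run_parallel` helper, and the `TOY_ENGINE` environment variable are all hypothetical, and only the two local engines are modeled.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical engine table: the program logic never names an executor directly.
TOY_ENGINES = {
    "threads": ThreadPoolExecutor,     # tasks run as threads on one machine
    "processes": ProcessPoolExecutor,  # tasks run as processes on one machine
}


def run_parallel(func, inputs, engine="threads"):
    """Apply func to every input using the chosen engine; results keep input order."""
    executor_cls = TOY_ENGINES[engine]
    with executor_cls() as pool:
        return list(pool.map(func, inputs))


def square(x):
    # Defined at module level so that a process-based engine can pickle it.
    return x * x


if __name__ == "__main__":
    # The engine is picked at run time; the workflow code above is unchanged.
    chosen = os.environ.get("TOY_ENGINE", "threads")
    print(run_parallel(square, range(5), chosen))
```

Switching from threads to processes requires no change to `square` or to the call site, which is the property that lets the same program move from a desktop to a larger resource.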