Concepts in Tigres
Tigres provides an API for composing, executing, and monitoring workflows using an abstraction called templates, which capture common execution patterns. This section gives an overview of the following topics: templates, the core API, the monitoring API, and execution environments.
Templates
The Tigres template API allows one to programmatically create workflows using templates as the building blocks. These templates are composed of individual tasks, which are the units of work that the end user needs to have executed. A Tigres workflow is a Python program (e.g. my_program.py) that uses the Tigres template API to build and execute a workflow (a Tigres program). There are four basic Tigres template functions: sequence, parallel, merge and split. A Tigres program can contain one or more of these template functions.
Execution behavior of the four Tigres templates
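The four execution patterns can be pictured with a small, self-contained Python sketch. It does not use the Tigres API: the helper names (run_sequence, run_parallel, run_split, run_merge) are hypothetical, and the semantics shown are assumptions made for illustration. Here a sequence chains each result into the next task, a split fans one result out to parallel tasks, and a merge gathers parallel results into one task.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequence(tasks, first_input):
    """Run tasks one after another, feeding each result to the next task."""
    value = first_input
    for task in tasks:
        value = task(value)
    return value


def run_parallel(tasks, inputs):
    """Run each task on its own input concurrently; results keep task order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, arg) for task, arg in zip(tasks, inputs)]
        return [future.result() for future in futures]


def run_split(split_task, split_input, tasks):
    """Run one task, then hand its result to every task of a parallel group."""
    shared = split_task(split_input)
    return run_parallel(tasks, [shared] * len(tasks))


def run_merge(tasks, inputs, merge_task):
    """Run a parallel group, then pass the list of results to a single task."""
    return merge_task(run_parallel(tasks, inputs))


def double(x):
    return x * 2


def increment(x):
    return x + 1


sequence_result = run_sequence([double, increment], 5)    # (5 * 2) + 1
parallel_result = run_parallel([double, increment], [5, 5])
split_result = run_split(double, 5, [increment, double])  # both tasks receive 10
merge_result = run_merge([double, increment], [5, 5], sum)
```

In this sketch the tasks are plain Python callables; in a real Tigres program the template functions would take the task and input collections described in the Core API section.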
Core API
As mentioned previously, a Tigres program is a Python program that uses the Tigres template API to build a workflow. The templates are the core of the Tigres API, and any or all of the four can be used in a single program. A Task is the most basic unit of execution and can be defined as a Python function internal to the Tigres program or as a separate executable (e.g. wget, my_c_program). The figure above demonstrates the flow of execution with arrows. The inputs to each Task execution can be statically defined or retrieved from the results of previously executed Tasks or Templates.
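The idea that a task is either a Python function or a separate executable can be sketched as follows. This is an illustration only, not the Tigres implementation: run_task is a hypothetical helper, and the Python interpreter (sys.executable) stands in for an external program such as wget or my_c_program so that the example runs anywhere.

```python
import subprocess
import sys


def run_task(implementation, inputs):
    """Run a task given either as a Python callable or as an executable path."""
    if callable(implementation):
        return implementation(*inputs)
    # Anything else is treated as an executable; inputs become its arguments.
    completed = subprocess.run(
        [implementation, *map(str, inputs)],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def add(a, b):
    return a + b


# A task backed by a Python function, with statically defined inputs.
total = run_task(add, [2, 3])

# A task backed by an executable (the interpreter is a stand-in here).
echoed = run_task(sys.executable, ["-c", "print('hello from a subprocess')"])

# Inputs can also be taken from the result of a previously executed task.
chained = run_task(add, [total, 10])
```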
A Glossary of Tigres Concepts
- Task: The atomic unit of execution (see tigres.Task).
- Task Array: A collection of tasks (see tigres.TaskArray).
- Templates: Patterns of execution from a combination of tasks (see Library Reference).
- Input Types: The characteristic of the task that defines the type of the inputs (see tigres.InputTypes).
- Input Values: The values used in a task execution (see tigres.InputValues).
- Input Array: A collection of input values for a number of tasks (see tigres.InputArray).
Each template function minimally takes two named collections: a Task Array and an Input Array. The Task Array is an ordered collection of tasks to be executed together, in sequence or in parallel depending on the execution flow of the particular template. The Input Array defines the inputs for each Task in the corresponding Task Array and is a collection of Input Values.
The Task, the atomic unit of execution in Tigres, has a collection of Input Types that specifies the types of input a task may take. A task’s Input Values are an ordered list of task inputs that are passed to the task during execution. They are not included in the task definition, which allows for task reuse and late binding of data elements to the Tigres program execution. The data model in Tigres is described in greater detail in the Tigres data model section.
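The separation between a task definition and its input values can be sketched in plain Python. The classes and functions below (SketchTask, run_in_order) are hypothetical stand-ins, not the tigres.Task or tigres.InputArray classes, and the type check is an assumption about how declared input types might be enforced.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SketchTask:
    """A task definition: a name, an implementation, and declared input types."""
    name: str
    func: Callable
    input_types: Sequence[type]

    def run(self, input_values):
        """Bind input values at execution time and check them against the types."""
        if len(input_values) != len(self.input_types):
            raise TypeError(f"{self.name}: expected {len(self.input_types)} inputs")
        for value, expected in zip(input_values, self.input_types):
            if not isinstance(value, expected):
                raise TypeError(f"{self.name}: {value!r} is not {expected.__name__}")
        return self.func(*input_values)


def run_in_order(task_array, input_array):
    """Pair each task with the input values at the same position and run it."""
    if len(task_array) != len(input_array):
        raise ValueError("task array and input array must have the same length")
    return [task.run(values) for task, values in zip(task_array, input_array)]


def scale(value, factor):
    return value * factor


scale_task = SketchTask("scale", scale, [float, int])

# The same task definition is reused with two different sets of input values.
results = run_in_order([scale_task, scale_task], [[2.5, 3], [1.0, 4]])
```

Because the values are supplied only when run is called, the same task definition can appear several times in a task array with different inputs, which is the reuse and late binding described above.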
Monitoring API
The monitoring API has functions to:
- create monitoring information
- find and view monitoring information.
The monitoring information contains both information about template execution that is automatically generated by Tigres, and arbitrary user-provided information. All the monitoring information, both automatic and user-provided, is semi-structured, meaning it is broken into name/value pairs but only a few of the names and values are pre-defined. In general, the monitoring follows the Logging Best Practices that arose from the NetLogger project.
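As an illustration of semi-structured name/value logging, the sketch below writes and searches records in a style modeled on the NetLogger conventions. It is not the Tigres monitoring API: format_record, parse_record and find are hypothetical helpers, and the choice of ts, event and level as the pre-defined names is an assumption. Values that contain double quotes are not handled.

```python
import shlex
from datetime import datetime, timezone


def _quote(value):
    """Wrap values containing spaces in double quotes so they stay one field."""
    text = str(value)
    return f'"{text}"' if " " in text else text


def format_record(event, level="Info", ts=None, **pairs):
    """Build one log line: a few fixed names first, then arbitrary user pairs."""
    if ts is None:
        ts = datetime.now(timezone.utc)
    fields = {
        "ts": ts.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "event": event,
        "level": level,
    }
    fields.update(pairs)
    return " ".join(f"{name}={_quote(value)}" for name, value in fields.items())


def parse_record(line):
    """Split a log line back into a dictionary of name/value pairs."""
    return dict(token.split("=", 1) for token in shlex.split(line))


def find(lines, **criteria):
    """Return the parsed records whose fields match every given criterion."""
    records = [parse_record(line) for line in lines]
    return [
        record for record in records
        if all(record.get(name) == str(value) for name, value in criteria.items())
    ]


when = datetime(2024, 1, 1, tzinfo=timezone.utc)
log = [
    format_record("task.start", ts=when, task="align", note="first run"),
    format_record("task.end", ts=when, task="align", status=0),
]
matches = find(log, event="task.end", task="align")
```

Only three names are fixed in this sketch; everything else is free-form, which is what makes the records semi-structured.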
Execution Environments
Tigres can be executed in several different environments from batch queues to local threads and processes. By using the appropriate execution engine, a Tigres program can be executed on a single node or deployed without additional infrastructure to department clusters and batch processing queues on supercomputers. A program is written once and only the execution engine is changed at run time. This allows users to easily scale from development (desktop) to production (department clusters and HPC centers).
Tigres currently supports five execution engines.
- Local Threads - Tigres runs tasks as threads on one machine.
- Local Processes - Tigres runs tasks as processes on one machine.
- Distributed Processes - Tigres distributes tasks as processes across a cluster of machines.
- Sun Grid Engine - Tigres submits tasks as Sun Grid Engine jobs. This mode is used on HPC resources (e.g., NERSC) where a private instance of MySGE is run as a glide-in.
- SLURM - Tigres submits tasks to a SLURM job manager.
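The principle of writing a program once and choosing the engine at run time can be sketched with the Python standard library. This is not how Tigres selects its engines: the ENGINES table and run_tasks function are hypothetical, and only the two local engines (threads and processes) are modeled.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical engine names mapped to standard-library executors.
ENGINES = {
    "thread": ThreadPoolExecutor,
    "process": ProcessPoolExecutor,
}


def run_tasks(func, inputs, engine="thread"):
    """Apply func to every input using the engine chosen at run time."""
    try:
        pool_cls = ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    with pool_cls() as pool:
        # map preserves input order regardless of which engine runs the work.
        return list(pool.map(func, inputs))


def square(x):
    return x * x


squares = run_tasks(square, range(5), engine="thread")
```

The workflow code (square and the call to run_tasks) stays the same when the engine argument changes. When the "process" engine is used, the call should sit under an if __name__ == "__main__": guard so that worker processes can import the module safely.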