.. _tigresoverview:

Concepts in Tigres
******************

Tigres provides an API for composing, executing and monitoring workflows using an
abstraction called *templates*, which captures common execution patterns.
This section of the documentation gives an overview of the following topics:

.. contents::
   :depth: 1
   :local:
   :backlinks: top

Templates
=========

The Tigres template API allows one to programmatically create `workflows` using
`templates` as the building blocks. These `templates` are composed of individual
`tasks`, which are the units of work supplied by the end user that need to be executed.
A Tigres `workflow` is a Python program (e.g. :code:`my_program.py`) that uses the
Tigres template API to build and execute a workflow `(Tigres program)`. There are four
basic Tigres template functions: :code:`sequence`, :code:`parallel`, :code:`merge`
and :code:`split`. A Tigres program can contain one or more of these template functions.

.. centered:: :strong:`Execution behavior of the four Tigres templates`

.. image:: _static/images/templates.png
   :alt: The four Tigres templates a) Sequence b) Parallel c) Split d) Merge
   :align: center
   :width: 100%

Core API
========

As mentioned previously, a Tigres program is a Python program that uses the
Tigres template API to build a workflow. The templates are the core of the Tigres API.
Any or all of the four Tigres templates can be used in a single program.
A :code:`Task` is the most basic unit of execution and can be defined as a Python function
internal to the Tigres program or as a separate executable
(e.g. :code:`wget`, :code:`my_c_program`). The figure above shows the flow of execution
with arrows. The inputs to each :code:`Task` execution can be statically defined or
retrieved from the results of previously executed *Tasks* or *Templates*.

.. centered:: :strong:`A Glossary of Tigres Concepts`

.. glossary::

   Task
      The atomic unit of execution. (see :class:`tigres.Task`)

   Task Array
      A collection of tasks. (see :class:`tigres.TaskArray`)

   Templates
      Patterns of execution from a combination of tasks. (see :ref:`apireference`)

   Input Types
      The characteristic of the task that defines the type of the inputs.
      (see :class:`tigres.InputTypes`)

   Input Values
      The values used in a task execution. (see :class:`tigres.InputValues`)

   Input Array
      A collection of input values for a number of tasks. (see :class:`tigres.InputArray`)

Each *template* function minimally takes two named collections: a *Task Array* and an
*Input Array*. The *Task Array* is an ordered collection of tasks to be executed together,
in sequence or in parallel depending on the execution flow of the particular template.
The *Input Array* is a collection of *Input Values* and defines the inputs for each *Task*
in the corresponding *Task Array*. The *Task*, the atomic unit of execution in Tigres,
has a collection of *Input Types* that specifies the types of inputs a task may take.
A task's *Input Values* are an ordered list of task inputs that is passed to the task
during execution. They are not included in the task definition, which allows for
task reuse and late binding of data elements to the Tigres program execution.
The data model in Tigres is described in greater detail in
Section :ref:`tigres-data-model-label`.

Monitoring API
==============

The monitoring API has functions to:

* create monitoring information
* find and view monitoring information

The monitoring information contains both information about template execution that is
automatically generated by Tigres and arbitrary user-provided information.
All the monitoring information, both automatic and user-provided, is *semi-structured*,
meaning it is broken into name/value pairs but only a few of the names and values
are pre-defined. In general, the monitoring follows the `Logging Best Practices`_
that arose from the NetLogger_ project.

.. _`Logging Best Practices`: https://docs.google.com/a/lbl.gov/document/d/1oeW_l_YgQbR-C_7R2cKl6eYBT5N4WSMbvz0AT6hYDvA/edit
.. _`NetLogger`: http://netlogger.lbl.gov/

Execution Environments
======================

Tigres can be executed in several different environments, from local threads and
processes to batch queues. By using the appropriate `execution engine`, a Tigres program
can be executed on a single node or deployed, without additional infrastructure,
to department clusters and batch processing queues on supercomputers.
A program is written once and only the execution engine is changed at run time.
This allows users to easily scale from development (desktop) to production
(department clusters and HPC centers).

Tigres currently supports five execution engines:

* Local Threads - Tigres runs tasks as threads on one machine.
* Local Processes - Tigres runs tasks as processes on one machine.
* Distributed Processes - Tigres distributes tasks as processes across a cluster of machines.
* Sun Grid Engine - Tigres submits tasks as Sun Grid Engine jobs. This mode is used on
  HPC resources (e.g., NERSC) where a private instance of MySGE is run as a glide-in.
* SLURM - Tigres submits tasks to a SLURM job manager.
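
The idea of writing a program once and choosing the execution engine at run time can be
illustrated with a minimal sketch in plain Python. It does not use the Tigres API:
:code:`run_parallel`, :code:`ENGINES`, the task functions and the :code:`DEMO_ENGINE`
environment variable are hypothetical names introduced only for this illustration, and
only two local engines (threads and processes) are modeled, using the standard library's
:code:`concurrent.futures`.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical stand-ins for execution engines; these are NOT Tigres identifiers.
ENGINES = {
    "local_thread": ThreadPoolExecutor,
    "local_process": ProcessPoolExecutor,
}


def square(x):
    """A task: a module-level function, so a process pool can pickle it."""
    return x * x


def add(x, y):
    """A second task that takes two input values."""
    return x + y


def run_parallel(task_array, input_array, engine="local_thread"):
    """Run task i with input values i and return results in task order."""
    if len(task_array) != len(input_array):
        raise ValueError("need one set of input values per task")
    # Executors require at least one worker, even for an empty task array.
    workers = max(1, len(task_array))
    with ENGINES[engine](max_workers=workers) as pool:
        futures = [
            pool.submit(task, *values)
            for task, values in zip(task_array, input_array)
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # The task array and input array stay the same; only the engine name,
    # read from the environment at run time, changes.
    tasks = [square, square, add]
    inputs = [(2,), (3,), (4, 5)]
    engine = os.environ.get("DEMO_ENGINE", "local_thread")
    print(engine, run_parallel(tasks, inputs, engine=engine))
```

In this sketch the input values are kept separate from the task definitions, mirroring the
late binding described in the Core API section: the same :code:`square` task is reused with
different inputs, and setting :code:`DEMO_ENGINE=local_process` switches the engine without
editing the program.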