HPC Tutorial

Scaling up a Tigres workflow to an HPC system like those at NERSC is straightforward. This tutorial demonstrates how to run a Tigres workflow at NERSC and assumes that you have a NERSC account. It walks you through setting up a Python environment for execution and submitting a Simple Linux Utility for Resource Management (SLURM) script to both of the NERSC systems: Edison and Cori.

Python Environment

The following steps detail how to set up a Python environment for running Tigres workflows on NERSC resources. The instructions are written for Edison but should work on other similarly configured NERSC resources.

The following set of instructions assumes:

  • You have a NERSC account
  • You want to submit jobs to your default NERSC repo

  1. Get Latest Tigres Release

    Download the Tigres 0.1.1 source distribution from the Tigres Bitbucket repository, then copy it to your NERSC home directory:

    $ scp tigres-0.1.1.tar.gz [user_name]@dtn01.nersc.gov:./
    
  2. Get Tutorials Archive

    Download the Tigres 0.1.1 tutorials archive from the Tigres Bitbucket repository, then copy it to your NERSC home directory:

    $ scp tigres-0.1.1-tutorials.zip [user_name]@dtn01.nersc.gov:./
    

    Log in to NERSC and unzip the tutorials in your home directory:

    $ ssh [user_name]@edison.nersc.gov
    $ unzip tigres-0.1.1-tutorials.zip
    $ cd tigres-0.1.1-tutorials
    
  3. Setup Script

    From the tigres-0.1.1-tutorials directory, run set_env.sh. The setup script prepares the Tigres Python environment. After it runs, there will be a new virtualenv environment, env<NERSC_HOST>, suffixed with the NERSC host name:

    $ ./set_env.sh
    
  4. Install Tigres

    There is one final step. Tigres must now be installed into the virtualenv environment that was set up in the previous step (an optional import check is sketched below):

    $ module load python
    $ source env$NERSC_HOST/bin/activate
    (envedison)$ pip install --no-index $HOME/tigres-0.1.1.tar.gz
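
    To confirm the install worked, you can optionally try importing the package from the activated virtualenv. This quick check is not one of the tutorial's official steps; it simply verifies that pip placed Tigres where the environment can see it:

    (envedison)$ python -c "import tigres"

    If the command exits silently (no ImportError), Tigres is installed and the environment is ready.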
    

Basic Statistics Example

Once the Tigres environment is set up in your NERSC home directory, you are ready to run the sample program, basic_statistics_by_column.py, which takes a delimited text file and computes basic statistics (total, mean, median, variance, standard deviation) on the specified columns. The Tigres program extracts the data and uses a parallel template for the statistical calculations.

This section walks you through the steps to download the sample data and submit the job to the SLURM queue. The program's general invocation form is sketched below.
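
For reference, the program takes four arguments: the Tigres execution plugin name, the data file, the field delimiter, and a comma-separated list of column indices. The concrete invocation below is copied from the provided SLURM script; the bracketed form is only a sketch of the general usage, not authoritative documentation:

    $ ./basic_statistics_by_column.py <execution_plugin> <data_file> <delimiter> <columns>
    $ ./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8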


  1. Get the Data

    This example uses a large dataset, household_power_consumption.txt, from the UC Irvine Machine Learning Repository (a quick way to inspect the file is sketched at the end of this example):

    $ wget https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip
    $ unzip household_power_consumption.zip
    
  2. Submit the Job

    A SLURM script, tigres_run.slurm, is provided in the tutorials archive. It is used to submit the Tigres program to the NERSC queue.

    Submit the job to SLURM:

    $ sbatch tigres_run.slurm
    Submitted batch job 1031378
    

    The job is now submitted and should run in the debug queue:

    $ squeue -u [user_name]
        JOBID        USER  ACCOUNT      NAME  PARTITION     QOS  NODES  TIME_LIMIT  TIME  ST           START_TIME
      1031378  <username>   <repo>  hconsume      debug  normal      2       30:00  1:12   R  2016-01-28T11:49:58
    
  3. Check the Results

    Once the SLURM job is finished, you may check the results in the output file:

    $ cat slurm-<jobid>.out
    /global/u2/v/vch/tigres-0.1.1-tutorials:/global/u2/v/vch/tigres-0.1.1-tutorials/envedison/bin:/global/u2/v/vch/tigres-0.1.1-tutorials/envedison/bin:/usr/common/usg/python/ipython/3.1.0/bin:/usr/common/usg/python/matplotlib/1.4.3/bin:/usr/common/usg/python/scipy/0.15.1/bin:/usr/common/usg/python/numpy/1.9.2/bin:/usr/common/usg/python/2.7.9/bin:/global/homes/v/vch/.pyenv/shims:/global/homes/v/vch/.pyenv/bin:/usr/common/usg/altd/2.0/bin:/usr/common/usg/bin:/usr/common/mss/bin:/usr/common/nsg/bin:/opt/slurm/default/bin:/opt/cray/mpt/7.3.1/gni/bin:/opt/cray/rca/1.0.0-2.0502.57212.2.56.ari/bin:/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/sbin:/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin:/opt/cray/dvs/2.5_0.9.0-1.0502.1958.2.55.ari/bin:/opt/cray/xpmem/0.1-2.0502.57015.1.15.ari/bin:/opt/cray/pmi/5.0.10-1.0000.11050.0.0.ari/bin:/opt/cray/ugni/6.0-1.0502.10245.9.9.ari/bin:/opt/cray/udreg/2.3.2-1.0502.9889.2.20.ari/bin:/opt/intel/composer_xe_2015.1.133/bin/intel64:/opt/cray/craype/2.5.1/bin:/opt/cray/switch/1.0-1.0502.57058.1.58.ari/bin:/opt/cray/eslogin/eswrap/1.1.0-1.020200.1130.0/bin:/usr/syscom/nsg/sbin:/usr/syscom/nsg/bin:/opt/modules/3.2.10.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/opt/cray/bin
    TIGRES_HOSTS nid00185,nid00186
    results - ('average_by_column', 'Global_intensity', 4.6277593105838)
    results - ('total_by_column', 'Global_intensity', 9483574.59999317)
    results - ('median_by_column', 'Global_intensity', 2.6)
    results - ('stdev_by_column', 'Global_intensity', 4.444396259786258)
    results - ('variance_by_column', 'Global_intensity', 19.752658114002077)
    results - ('average_by_column', 'Voltage', 240.83985797447758)
    results - ('total_by_column', 'Voltage', 493548304.1499374)
    results - ('median_by_column', 'Voltage', 241.01)
    results - ('stdev_by_column', 'Voltage', 3.23998667900864)
    results - ('variance_by_column', 'Voltage', 10.497513680153437)
    results - ('average_by_column', 'Global_reactive_power', 0.12371447630385488)
    results - ('total_by_column', 'Global_reactive_power', 253525.60199996372)
    results - ('median_by_column', 'Global_reactive_power', 0.1)
    results - ('stdev_by_column', 'Global_reactive_power', 0.11272197955071389)
    results - ('variance_by_column', 'Global_reactive_power', 0.012706244673831562)
    results - ('average_by_column', 'Global_active_power', 1.091615036500693)
    results - ('total_by_column', 'Global_active_power', 2237024.86200014)
    results - ('median_by_column', 'Global_active_power', 0.602)
    results - ('stdev_by_column', 'Global_active_power', 1.0572941610939552)
    results - ('variance_by_column', 'Global_active_power', 1.1178709430833706)
    results - ('average_by_column', 'Sub_metering_3', 6.45844735712055)
    results - ('total_by_column', 'Sub_metering_3', 13235167.0)
    results - ('median_by_column', 'Sub_metering_3', 1.0)
    results - ('stdev_by_column', 'Sub_metering_3', 8.437153908665618)
    results - ('variance_by_column', 'Sub_metering_3', 71.18556607851151)
    results - ('average_by_column', 'Sub_metering_2', 1.2985199679887571)
    results - ('total_by_column', 'Sub_metering_2', 2661031.0)
    results - ('median_by_column', 'Sub_metering_2', 0.0)
    results - ('stdev_by_column', 'Sub_metering_2', 5.822026473177329)
    results - ('variance_by_column', 'Sub_metering_2', 33.895992254377646)
    results - ('average_by_column', 'Sub_metering_1', 1.1219233096502186)
    results - ('total_by_column', 'Sub_metering_1', 2299135.0)
    results - ('median_by_column', 'Sub_metering_1', 0.0)
    results - ('stdev_by_column', 'Sub_metering_1', 6.153031089701269)
    results - ('variance_by_column', 'Sub_metering_1', 37.85979159083039)
    ./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8
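
Before submitting the job (or after the fact), you may want a quick look at the input file itself; plain shell tools are enough and nothing here is Tigres-specific. The file should begin with a semicolon-delimited header row naming the measurement columns (Global_active_power, Voltage, and so on, matching the names in the results above), which is why the program is invoked with the ';' delimiter:

    $ head -n 2 household_power_consumption.txt
    $ wc -l household_power_consumption.txt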
    

SLURM Job Script

The SLURM script, tigres_run.slurm, can be submitted on any NERSC system. It uses the Tigres distribute plugin to execute the basic statistics program on two nodes. Notes on testing the host-list conversion and on other execution plugins follow the listing.

  • Lines 3-7: Sets up the SLURM job (i.e., specifies the job name, partition, number of nodes, and time limit)
  • Lines 9-11: Loads the Tigres Python environment
  • Lines 13-17: Sets up environment variables and changes to the submit directory
  • Lines 19-22: Builds the comma-separated TIGRES_HOSTS list from the SLURM node list
  • Lines 30-33: Prepares the distribute plugin's special OTIGRES_* environment variables
  • Line 34: Executes the Tigres workflow
 1  #!/bin/sh
 2
 3  #SBATCH -p debug
 4  #SBATCH --ccm
 5  #SBATCH -N 2
 6  #SBATCH -t 00:30:00
 7  #SBATCH -J hconsume
 8
 9  # Load the tigres python environment
10  module load python
11  source env$NERSC_HOST/bin/activate
12
13  # Add the application code to the paths
14  export PYTHONPATH=$SLURM_SUBMIT_DIR:$PYTHONPATH
15  export PATH=$SLURM_SUBMIT_DIR:$PATH
16  echo $PATH
17  cd $SLURM_SUBMIT_DIR
18
19  # Determine the hosts available. Convert the compact host list to a
20  # comma separated list.
21  export TIGRES_HOSTS=`scontrol show hostname $SLURM_JOB_NODELIST | awk -vORS=, '{ print $1 }' | sed s'/.$//'`
22  echo "TIGRES_HOSTS ${TIGRES_HOSTS}"
23
24
25  # The workflow is executed with the Tigres Distribute plugin.
26  # Since the tasks will be executed across nodes, we need to
27  # define the environment with OTIGRES_* environment
28  # variables.  Also, TIGRES_HOSTS will list all
29  # the hosts used for this workflow
30  export OTIGRES_PATH=$PATH
31  export OTIGRES_PYTHONPATH=$SLURM_SUBMIT_DIR/env$NERSC_HOST/lib/python2.7/site-packages:$PYTHONPATH
32  export OTIGRES_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
33  export EXECUTION=EXECUTION_DISTRIBUTE_PROCESS
34  ./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8
35  echo "./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8"