HPC Tutorial

Scaling a Tigres workflow up to an HPC system like those at NERSC is straightforward. This tutorial demonstrates how to run a Tigres workflow at NERSC and assumes that you have a NERSC account. It walks you through setting up a Python environment for execution and submitting a Portable Batch System (PBS) script to each of the three NERSC systems: Edison, Hopper, and Carver.

Python Environment

The following steps detail how to set up a Python environment for running Tigres workflows on NERSC resources. The instructions are for Edison but should work on other similarly configured NERSC resources.

The following instructions assume:

  • You have a NERSC account
  • You want to submit jobs to your default NERSC repo
  1. Get Latest Tigres Release

    Download the Tigres 0.1.0 source distribution from the Tigres Bitbucket repository. Then copy it to your NERSC home directory:

    $ scp tigres-0.1.0.tar.gz dtn01.nersc.gov:./
    
  2. Get Tutorials Archive

    Download the Tigres 0.1.0 tutorials from the Tigres Bitbucket repository. Then copy them to your NERSC home directory:

    $ scp tigres-0.1.0-tutorials.zip dtn01.nersc.gov:./
    

    Log in to NERSC and unzip the tutorials in your home directory:

    $ ssh edison.nersc.gov
    $ unzip tigres-0.1.0-tutorials.zip
    $ cd tigres-0.1.0-tutorials
    
  3. Setup Script

    Change to the tutorial directory and run setup_env.sh. The setup script prepares the Tigres Python environment; after it runs, there will be a new virtualenv environment, env<NERSC_HOST>, suffixed with the NERSC host name:

    $ ./setup_env.sh
    
  4. Install Tigres

    There is one final step: Tigres must be installed into the virtualenv environment created in the previous step (a quick verification follows these steps).

    $ module load python
    $ source envedison/bin/activate
    (envedison)$ pip install --no-index $HOME/tigres-0.1.0.tar.gz
    
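Optionally, verify the install with a quick import check from inside the activated environment. This is a generic Python check, not a Tigres-specific command; no output means the package imported cleanly:

    (envedison)$ python -c "import tigres"
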

Basic Statistics Example

Once the Tigres environment is set up in your NERSC home directory, you are ready to run the sample program, basic_statistics_by_column.py, which takes a delimited text file and computes basic statistics (total, mean, median, variance, standard deviation) on the specified columns. The Tigres program extracts the data and uses a parallel template for the statistical calculations.
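
For reference, the program takes the execution mechanism, the input file, the field delimiter, and a comma-separated list of column indices. The batch script shown later invokes it like this:

    $ ./basic_statistics_by_column.py EXECUTION_LOCAL_THREAD household_power_consumption.txt ';' 2,3,4,5,6,7,8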

This section walks you through getting the sample data and submitting the job to the batch queue.


  1. Get the Data

    This example uses a large dataset, household_power_consumption.txt, from the UC Irvine Machine Learning Repository:

    $ wget https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip
    $ unzip household_power_consumption.zip
    
  2. Submit the Job

    A helper script, tigres_submit.sh, is provided in the tutorial archive. It uses tigres_run.pbs to submit the Tigres program to the batch queue.

    Run the script:

    $ ./tigres_submit.sh
    Running on edison with EXECUTION_LOCAL_THREAD
    1344091.edique02
    

    The job is now submitted and should run in the debug queue:

    $ qstat -ume
    
    edique02:
                                                                                      Req'd    Req'd       Elap
    Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory   Time    S   Time
    ----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
    1344091.edique02        me          debug    EnergyConsumptio    --      1     24    --   00:30:00 Q       --
    

Cluster Management with SGE

For longer workflows, or workflows where you would like to set up a glide-in/private cluster, we support the MySGE mechanism. At NERSC, MySGE can be used as the execution mechanism for Tigres workflows:

MySGE allows users to create a private Sun GridEngine cluster on large parallel systems like Hopper or Franklin. Once the cluster is started, users can submit serial jobs, array jobs, and other throughput-oriented workloads into the personal SGE scheduler. The jobs are then run within the user's private cluster. – [1]

  1. Setup MySGE

    Follow the instructions below to set up MySGE in your workspace. These instructions can be found at [1]:

    $ ssh edison.nersc.gov
    $ module load mysge
    $ mysge_init    (use all defaults)
    
  2. Submit the Job using MySGE

    Once you have set up MySGE, use the following command to submit basic_statistics_by_column.py with EXECUTION_SGE (tigres_run.pbs starts and stops the private cluster for you; a manual sketch of that lifecycle follows this list):

    $ ./tigres_submit.sh mysge
    Running on edison with EXECUTION_SGE
    1344091.edique02
    
  3. View job status

    This example uses 2 nodes for a total of 48 cores:

    $ qstat -ume
    
    edique01:
                                                                                      Req'd    Req'd       Elap
    Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory   Time    S   Time
    ----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
    7794778.edique01        vch         debug    EnergyConsumptio      0     2     48    --   00:30:00 R  00:01:04
    
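You do not need to drive the private cluster by hand; tigres_run.pbs (shown below) starts it before the workflow runs and stops it afterwards. Done manually on Edison, the same lifecycle would look roughly like this, assuming mysge_init wrote its settings to ~/.vpc.edison.sh (the file the batch script sources):

    $ module load mysge
    $ source ~/.vpc.edison.sh
    $ vpc_start -q ccm_queue -l mppwidth=48 -V
    $ vpc_stop
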

Batch Scripts

PBS Script

The PBS script, tigres_run.pbs, can be submitted on any NERSC system. This script

  • starts/stops MySGE if needed (lines 6-10, 25-27)
  • loads python (line 14)
  • activates the Tigres python environment (line 15)
  • runs the Tigres workflow (line 23):
 1  #!/bin/bash -l
 2  #PBS -V
 3
 4  cd $PBS_O_WORKDIR
 5
 6  if [ "${EXECUTION}" == "EXECUTION_SGE" ]; then
 7      module load mysge
 8      source ~/.vpc.${NERSC_HOST}.sh
 9      vpc_start -q ccm_queue -l mppwidth=48 -V
10      sleep 60
11
12  fi
13
14  module load python
15  source env${NERSC_HOST}/bin/activate
16
17  export TIGRES_HOSTS="`awk -vORS=, '{ print $1 }' $PBS_NODEFILE | sed 's/,$/\n/'`"
18  export OTIGRES_PATH=$PATH
19  export OTIGRES_PYTHONPATH=$PYTHONPATH
20  export OTIGRES_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
21
22  echo "./basic_statistics_by_column.py $EXECUTION household_power_consumption.txt ';' 2,3,4,5,6,7,8"
23  ./basic_statistics_by_column.py $EXECUTION household_power_consumption.txt ';' 2,3,4,5,6,7,8
24
25  if [ "${EXECUTION}" == "EXECUTION_SGE" ]; then
26      vpc_stop
27  fi

Lines 17-20 set up the environment variables expected by the EXECUTION_DISTRIBUTE_PROCESS execution mechanism. This mechanism is used on Carver but not on Edison or Hopper.
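
The TIGRES_HOSTS value on line 17 is built by joining the node names in $PBS_NODEFILE with commas. You can see what the awk/sed pipeline produces by running it on a hand-made node file (nodes.txt and the nid* names here are made up for illustration):

    $ printf 'nid00012\nnid00013\n' > nodes.txt
    $ awk -vORS=, '{ print $1 }' nodes.txt | sed 's/,$/\n/'
    nid00012,nid00013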

Job Submission Script

The job submission script, tigres_submit.sh

  • sets the number of nodes/cores (lines 3, 5-7)
  • chooses the Tigres execution mechanism (line 4)
  • prepares the command for running the Tigres workflow and submits the job (line 16)
  • if requested, sets the execution to EXECUTION_SGE and mppwidth to 48, i.e. two nodes (lines 10-13):
 1  #!/bin/bash
 2
 3  export NUMCORES="mppwidth=24"
 4  export EXECUTION="EXECUTION_LOCAL_THREAD"
 5  if [ "$NERSC_HOST" == "carver" ]; then
 6      export EXECUTION="EXECUTION_DISTRIBUTE_PROCESS"
 7      export NUMCORES="nodes=3:ppn=8"
 8  fi
 9
10  if [ "$1" == "mysge" ]; then
11      export EXECUTION="EXECUTION_SGE"
12      export NUMCORES="mppwidth=48"
13  fi
14
15  echo "Running on ${NERSC_HOST} with ${EXECUTION}"
16  qsub -V -NEnergyConsumption${NERSC_HOST} -l${NUMCORES},walltime=00:30:00 tigres_run.pbs
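
For example, on Edison with no arguments, the variables expand so that line 16 runs the equivalent of the following (this matches the default mppwidth=24 and job name seen in the earlier qstat output):

    qsub -V -NEnergyConsumptionedison -lmppwidth=24,walltime=00:30:00 tigres_run.pbs
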
[1] https://www.nersc.gov/users/software/workflow-software/mysge/