HPC Tutorial
Scaling a Tigres workflow up to an HPC system like those at NERSC is straightforward. This tutorial demonstrates how to run a Tigres workflow at NERSC and assumes that you have a NERSC account. It walks you through setting up a Python environment for execution and submitting a Simple Linux Utility for Resource Management (SLURM) script to both of the NERSC systems: Edison and Cori.
Python Environment
The following steps detail how to set up a Python environment for running Tigres workflows on NERSC resources. The instructions are for Edison but should work on other similarly configured NERSC systems.
These instructions assume:
- You have a NERSC account
- You want to submit jobs to your default NERSC repo
- Get the Latest Tigres Release
Download the Tigres 0.1.1 source distribution from the Tigres Bitbucket repository, then copy it to your NERSC home directory:
$ scp tigres-0.1.1.tar.gz [user_name]@dtn01.nersc.gov:./
- Get Tutorials Archive
Download the Tigres 0.1.1 tutorials from the Tigres Bitbucket repository, then copy them to your NERSC home directory:
$ scp tigres-0.1.1-tutorials.zip [user_name]@dtn01.nersc.gov:./
Log in to NERSC and unzip the tutorials in your home directory:
$ ssh [user_name]@edison.nersc.gov
$ unzip tigres-0.1.1-tutorials.zip
$ cd tigres-0.1.1-tutorials
- Setup Script
Change to the tutorial directory and run setup_env.sh. The setup script prepares the Tigres Python environment. After the script runs there will be a new virtualenv environment, env<NERSC_HOST>, suffixed with the NERSC host name:
$ ./setup_env.sh
- Install Tigres
There is one final step: Tigres must now be installed into the virtualenv environment that was set up in the previous step.
$ module load python
$ source env$NERSC_HOST/bin/activate
(envedison)$ pip install --no-index $HOME/tigres-0.1.1.tar.gz
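To confirm the install succeeded, you can check that the package is importable from the activated environment. This is a quick sanity check, not part of the tutorial archive; the helper name below is ours:

```python
import importlib.util

def is_installed(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# With the virtualenv activated, this should report True for "tigres".
print(is_installed("tigres"))
```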
Basic Statistics Example
Once the Tigres environment is set up in your NERSC home directory, you are ready to run the sample program, basic_statistics_by_column.py, which takes a delimited text file and performs basic statistics (total, mean, median, variance, standard deviation) on the specified columns. The Tigres program extracts the data and uses a parallel template for the statistical calculations.
This section walks you through the steps to get the sample data and submit the job to the job manager queue.
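The per-column statistics themselves are ordinary reductions over one field of each row. The following pure-Python sketch illustrates what one such calculation looks like; it is not the actual basic_statistics_by_column.py code, and it arbitrarily uses population variance (the sample program may define variance differently):

```python
import statistics

def column_stats(lines, delimiter, column):
    """Basic statistics for one column of delimited text rows.

    Skips the header line and any row whose field is missing
    ('?' in this dataset) or otherwise not numeric.
    """
    values = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        try:
            values.append(float(fields[column]))
        except (ValueError, IndexError):
            continue  # header line, missing value, or short row
    return {
        "total": sum(values),
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),
        "stdev": statistics.pstdev(values),
    }

# Usage against the real file would look like:
#   with open("household_power_consumption.txt") as f:
#       print(column_stats(f, ";", 4))
```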
- Get the Data
This example uses a large dataset, household_power_consumption.txt, from the UC Irvine Machine Learning Repository:
$ wget https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip
$ unzip household_power_consumption.zip
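The dataset is semicolon-delimited, and the column list passed to the sample program ("2,3,4,5,6,7,8") refers to zero-based positions in the file's header. A small sketch mapping indices to names; the header string below is what the UCI archive distributes, but verify it against your download:

```python
def describe_columns(header_line, delimiter=";"):
    """Map zero-based column indices to the column names in a header line."""
    return {i: name for i, name in enumerate(header_line.strip().split(delimiter))}

# Expected first line of household_power_consumption.txt (check your copy):
header = ("Date;Time;Global_active_power;Global_reactive_power;Voltage;"
          "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3")

cols = describe_columns(header)
for i in (2, 3, 4, 5, 6, 7, 8):
    print(i, cols[i])
```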
- Submit the Job
A SLURM script, tigres_run.slurm, is provided in the tutorial archive. This script is used to submit the Tigres program to the NERSC queue. Submit the job to SLURM:
$ sbatch tigres_run.slurm
Submitted batch job 1031378
The job is now submitted and should run in the debug queue:
$ squeue -u [username]
JOBID    USER        ACCOUNT  NAME      PARTITION  QOS     NODES  TIME_LIMIT  TIME  ST  START_TIME
1031378  <username>  <repo>   hconsume  debug      normal  2      30:00       1:12  R   2016-01-28T11:49:58
- Check the Results
Once the SLURM job is finished, you may check the results in the output file:
$ cat slurm_<job>.out
/global/u2/v/vch/tigres-0.1.1-tutorials:/global/u2/v/vch/tigres-0.1.1-tutorials/envedison/bin:/global/u2/v/vch/tigres-0.1.1-tutorials/envedison/bin:/usr/common/usg/python/ipython/3.1.0/bin:/usr/common/usg/python/matplotlib/1.4.3/bin:/usr/common/usg/python/scipy/0.15.1/bin:/usr/common/usg/python/numpy/1.9.2/bin:/usr/common/usg/python/2.7.9/bin:/global/homes/v/vch/.pyenv/shims:/global/homes/v/vch/.pyenv/bin:/usr/common/usg/altd/2.0/bin:/usr/common/usg/bin:/usr/common/mss/bin:/usr/common/nsg/bin:/opt/slurm/default/bin:/opt/cray/mpt/7.3.1/gni/bin:/opt/cray/rca/1.0.0-2.0502.57212.2.56.ari/bin:/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/sbin:/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin:/opt/cray/dvs/2.5_0.9.0-1.0502.1958.2.55.ari/bin:/opt/cray/xpmem/0.1-2.0502.57015.1.15.ari/bin:/opt/cray/pmi/5.0.10-1.0000.11050.0.0.ari/bin:/opt/cray/ugni/6.0-1.0502.10245.9.9.ari/bin:/opt/cray/udreg/2.3.2-1.0502.9889.2.20.ari/bin:/opt/intel/composer_xe_2015.1.133/bin/intel64:/opt/cray/craype/2.5.1/bin:/opt/cray/switch/1.0-1.0502.57058.1.58.ari/bin:/opt/cray/eslogin/eswrap/1.1.0-1.020200.1130.0/bin:/usr/syscom/nsg/sbin:/usr/syscom/nsg/bin:/opt/modules/3.2.10.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/opt/cray/bin
TIGRES_HOSTS nid00185,nid00186
results - ('average_by_column', 'Global_intensity', 4.6277593105838)
results - ('total_by_column', 'Global_intensity', 9483574.59999317)
results - ('median_by_column', 'Global_intensity', 2.6)
results - ('stdev_by_column', 'Global_intensity', 4.444396259786258)
results - ('variance_by_column', 'Global_intensity', 19.752658114002077)
results - ('average_by_column', 'Voltage', 240.83985797447758)
results - ('total_by_column', 'Voltage', 493548304.1499374)
results - ('median_by_column', 'Voltage', 241.01)
results - ('stdev_by_column', 'Voltage', 3.23998667900864)
results - ('variance_by_column', 'Voltage', 10.497513680153437)
results - ('average_by_column', 'Global_reactive_power', 0.12371447630385488)
results - ('total_by_column', 'Global_reactive_power', 253525.60199996372)
results - ('median_by_column', 'Global_reactive_power', 0.1)
results - ('stdev_by_column', 'Global_reactive_power', 0.11272197955071389)
results - ('variance_by_column', 'Global_reactive_power', 0.012706244673831562)
results - ('average_by_column', 'Global_active_power', 1.091615036500693)
results - ('total_by_column', 'Global_active_power', 2237024.86200014)
results - ('median_by_column', 'Global_active_power', 0.602)
results - ('stdev_by_column', 'Global_active_power', 1.0572941610939552)
results - ('variance_by_column', 'Global_active_power', 1.1178709430833706)
results - ('average_by_column', 'Sub_metering_3', 6.45844735712055)
results - ('total_by_column', 'Sub_metering_3', 13235167.0)
results - ('median_by_column', 'Sub_metering_3', 1.0)
results - ('stdev_by_column', 'Sub_metering_3', 8.437153908665618)
results - ('variance_by_column', 'Sub_metering_3', 71.18556607851151)
results - ('average_by_column', 'Sub_metering_2', 1.2985199679887571)
results - ('total_by_column', 'Sub_metering_2', 2661031.0)
results - ('median_by_column', 'Sub_metering_2', 0.0)
results - ('stdev_by_column', 'Sub_metering_2', 5.822026473177329)
results - ('variance_by_column', 'Sub_metering_2', 33.895992254377646)
results - ('average_by_column', 'Sub_metering_1', 1.1219233096502186)
results - ('total_by_column', 'Sub_metering_1', 2299135.0)
results - ('median_by_column', 'Sub_metering_1', 0.0)
results - ('stdev_by_column', 'Sub_metering_1', 6.153031089701269)
results - ('variance_by_column', 'Sub_metering_1', 37.85979159083039)
./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8
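Each "results -" line in the output contains a Python tuple literal, so the log can be post-processed with the standard library. A small sketch (our own helper, not part of the tutorial code), assuming the "results - (...)" line format shown above:

```python
import ast

def parse_results(log_text):
    """Collect (statistic, column, value) tuples from 'results - (...)'
    lines of the SLURM output, keyed by (column, statistic)."""
    results = {}
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("results - ("):
            stat, column, value = ast.literal_eval(line[len("results - "):])
            results[(column, stat)] = value
    return results

log = """TIGRES_HOSTS nid00185,nid00186
results - ('average_by_column', 'Voltage', 240.83985797447758)
results - ('median_by_column', 'Voltage', 241.01)"""
print(parse_results(log)[("Voltage", "median_by_column")])
```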
SLURM Job Script
The SLURM script, tigres_run.slurm, can be submitted on any NERSC system. This script uses the Tigres distribute plugin to execute the basic statistics script on two nodes.
- Lines 3-7: Sets up the SLURM job (i.e., specifies the job name, number of nodes, time limit, etc.)
- Lines 9-11: Loads the Tigres environment
- Lines 13-17: Sets up environment variables
- Lines 30-33: Prepares the distribute plugin's special environment variables
- Lines 34-35: Executes the Tigres workflow and echoes the command that was run
 1  #!/bin/sh
 2
 3  #SBATCH -p debug
 4  #SBATCH --ccm
 5  #SBATCH -N 2
 6  #SBATCH -t 00:30:00
 7  #SBATCH -J hconsume
 8
 9  # Load the tigres python environment
10  module load python
11  source env$NERSC_HOST/bin/activate
12
13  # Add the application code to the paths
14  export PYTHONPATH=$SLURM_SUBMIT_DIR:$PYTHONPATH
15  export PATH=$SLURM_SUBMIT_DIR:$PATH
16  echo $PATH
17  cd $SLURM_SUBMIT_DIR
18
19  # Determine the hosts available. Convert the compact host list to a
20  # comma separated list.
21  export TIGRES_HOSTS=`scontrol show hostname $SLURM_JOB_NODELIST | awk -vORS=, '{ print $1 }' | sed s'/.$//'`
22  echo "TIGRES_HOSTS ${TIGRES_HOSTS}"
23
24  # The workflow is executed with the Tigres Distribute plugin.
25  # Since the tasks will be executed across nodes, we need to
26  # define the environment with OTIGRES_* environment
27  # variables. Also, TIGRES_HOSTS will list all
28  # the hosts used for this workflow
29
30  export OTIGRES_PATH=$PATH
31  export OTIGRES_PYTHONPATH=$SLURM_SUBMIT_DIR/env$NERSC_HOST/lib/python2.7/site-packages:$PYTHONPATH
32  export OTIGRES_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
33  export EXECUTION=EXECUTION_DISTRIBUTE_PROCESS
34  ./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8
35  echo "./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8"
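The scontrol pipeline in the script turns SLURM's compact nodelist (e.g. nid00[185-186]) into the comma-separated TIGRES_HOSTS value. For illustration only, here is a pure-Python sketch of the same idea that handles just the simplest "prefix[start-end]" form; `scontrol show hostname` remains the reliable way to expand the full SLURM nodelist syntax:

```python
import re

def expand_nodelist(nodelist):
    """Expand a simple compact SLURM nodelist such as 'nid00[185-186]'
    into a comma-separated host list.

    Handles only a single 'prefix[start-end]' range; multiple ranges,
    comma lists, and other SLURM forms are not supported here.
    """
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", nodelist)
    if not m:
        return nodelist  # already a single host name
    prefix, start, end = m.group(1), m.group(2), m.group(3)
    width = len(start)  # preserve zero-padding, e.g. '185' stays 3 digits
    hosts = [f"{prefix}{i:0{width}d}" for i in range(int(start), int(end) + 1)]
    return ",".join(hosts)

print(expand_nodelist("nid00[185-186]"))  # nid00185,nid00186
```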