Scaling up a Tigres workflow to an HPC system such as those at NERSC is straightforward. This tutorial demonstrates how to run a Tigres workflow at NERSC and assumes that you have a NERSC account. It walks you through setting up a Python environment for execution and submitting a Simple Linux Utility for Resource Management (SLURM) script on either of two NERSC systems: Edison and Cori.
The following steps detail how to set up a Python environment for running Tigres workflows on NERSC resources. The instructions are for Edison but should work with other similarly configured NERSC resources.
The following set of instructions assume:
Download Tigres 0.2.0 source distribution here. Now copy this to your NERSC home directory:
$ scp tigres-0.2.0.tar.gz [user_name]@dtn01.nersc.gov:./
Download Tigres 0.2.0 tutorials here. Now copy this to your NERSC home directory:
$ scp tigres-0.2.0-tutorials.zip [user_name]@dtn01.nersc.gov:./
Log in to NERSC and unzip the tutorials in your home directory:
$ ssh [user_name]@edison.nersc.gov
$ unzip tigres-0.2.0-tutorials.zip
$ cd tigres-0.2.0-tutorials
Change to the tutorial directory and run setup_env.sh. The setup script prepares the Tigres Python environment. After running the script there will be a new virtualenv environment, env<NERSC_HOST>, whose name is suffixed with the NERSC host name:
$ ./set_env.sh
There is one final step. Tigres must now be installed into the virtualenv environment that was set up in the previous step.
$ module load python
$ source env$NERSC_HOST/bin/activate
(envedison)$ pip install --no-index $HOME/tigres-0.2.0.tar.gz
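Before moving on, it can help to confirm that the interpreter in use is the virtualenv's and that the package imports. The following is a minimal sketch, not part of the tutorial archive. The helper name `package_available` is hypothetical, and the import name `tigres` is assumed from the distribution name. It uses only calls that exist in both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib
import sys


def package_available(name):
    """Return True if the named package can be imported in this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# sys.prefix should point inside env$NERSC_HOST when the virtualenv is active.
print("interpreter prefix:", sys.prefix)
# 'tigres' is the assumed import name of the package installed above.
print("tigres importable:", package_available("tigres"))
```

If the second line reports False, the usual cause is that the virtualenv was not activated in the current shell before running pip.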
Once the Tigres environment is set up in your NERSC home directory, you are ready to run the sample program, basic_statistics_by_column.py, which takes a delimited text file and computes basic statistics (total, mean, median, variance, standard deviation) on the specified columns. The Tigres program extracts the data and uses a parallel template for the statistical calculations.
This section walks you through the steps to get the sample data and submit the job to the job manager queue.
This example uses a large dataset, household_power_consumption.txt, from the UC Irvine Machine Learning Repository:
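To make the five statistics concrete, here is a minimal plain-Python sketch of the per-column calculation. It is not the tutorial's basic_statistics_by_column.py and does not use the Tigres API. The function name `column_stats`, the inline sample data, the zero-based column index, the choice of population variance, and the skipping of non-numeric entries are all assumptions made for illustration. It targets Python 3, because the `statistics` module is not available in Python 2.7.

```python
import csv
import io
import statistics


def column_stats(lines, delimiter, column):
    """Return basic statistics for one zero-based column of delimited text."""
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader)  # skip the header row
    values = []
    for row in reader:
        try:
            values.append(float(row[column]))
        except (ValueError, IndexError):
            continue  # assumption: ignore missing or non-numeric entries
    return {
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # assumption: population (not sample) variance and standard deviation
        "variance": statistics.pvariance(values),
        "stdev": statistics.pstdev(values),
    }


# Hypothetical four-row sample in the same ';'-delimited layout.
sample = io.StringIO(
    "Date;Voltage\n"
    "1/1/2007;240.0\n"
    "2/1/2007;242.0\n"
    "3/1/2007;?\n"
    "4/1/2007;241.0\n"
)
stats = column_stats(sample, ";", 1)
print(stats)
```

In the Tigres program each of these calculations is a separate task inside a parallel template, so they can run concurrently rather than in one loop as sketched here.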
$ wget https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip
$ unzip household_power_consumption.zip
A SLURM script called tigres_run.slurm has been provided in the tutorial archive. This script is used to submit the Tigres program to the NERSC queue.
Submit the job to SLURM:
$ sbatch tigres_run.slurm
Submitted batch job 1031378
The job is now submitted and should run in the debug queue:
$ squeue -u [username]
JOBID USER ACCOUNT NAME PARTITION QOS NODES TIME_LIMIT TIME ST START_TIME
1031378 <username> <repo> hconsume debug normal 2 30:00 1:12 R 2016-01-28T11:49:58
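If you want to check the job state from a script rather than by eye, the squeue listing above can be read as a header row followed by one whitespace-separated row per job. The sketch below is illustrative only: the helper `parse_squeue` is hypothetical, it assumes no field contains spaces, and the state table lists only a few common SLURM state codes.

```python
def parse_squeue(text):
    """Turn whitespace-aligned squeue output into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# A few common SLURM job state codes (not an exhaustive list).
STATE_NAMES = {
    "PD": "pending",
    "R": "running",
    "CG": "completing",
    "CD": "completed",
}

# Sample text copied from the listing above.
sample = """JOBID USER ACCOUNT NAME PARTITION QOS NODES TIME_LIMIT TIME ST START_TIME
1031378 <username> <repo> hconsume debug normal 2 30:00 1:12 R 2016-01-28T11:49:58
"""

jobs = parse_squeue(sample)
for job in jobs:
    print(job["JOBID"], job["NAME"], STATE_NAMES.get(job["ST"], job["ST"]))
```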
Check the Results
Once the SLURM job is finished, you may check the results in the output file:
$ cat slurm-<job>.out
/global/u2/v/vch/tigres-0.1.1-tutorials:/global/u2/v/vch/tigres-0.1.1-tutorials/envedison/bin:/global/u2/v/vch/tigres-0.1.1-tutorials/envedison/bin:/usr/common/usg/python/ipython/3.1.0/bin:/usr/common/usg/python/matplotlib/1.4.3/bin:/usr/common/usg/python/scipy/0.15.1/bin:/usr/common/usg/python/numpy/1.9.2/bin:/usr/common/usg/python/2.7.9/bin:/global/homes/v/vch/.pyenv/shims:/global/homes/v/vch/.pyenv/bin:/usr/common/usg/altd/2.0/bin:/usr/common/usg/bin:/usr/common/mss/bin:/usr/common/nsg/bin:/opt/slurm/default/bin:/opt/cray/mpt/7.3.1/gni/bin:/opt/cray/rca/1.0.0-2.0502.57212.2.56.ari/bin:/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/sbin:/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin:/opt/cray/dvs/2.5_0.9.0-1.0502.1958.2.55.ari/bin:/opt/cray/xpmem/0.1-2.0502.57015.1.15.ari/bin:/opt/cray/pmi/5.0.10-1.0000.11050.0.0.ari/bin:/opt/cray/ugni/6.0-1.0502.10245.9.9.ari/bin:/opt/cray/udreg/2.3.2-1.0502.9889.2.20.ari/bin:/opt/intel/composer_xe_2015.1.133/bin/intel64:/opt/cray/craype/2.5.1/bin:/opt/cray/switch/1.0-1.0502.57058.1.58.ari/bin:/opt/cray/eslogin/eswrap/1.1.0-1.020200.1130.0/bin:/usr/syscom/nsg/sbin:/usr/syscom/nsg/bin:/opt/modules/3.2.10.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/opt/cray/bin
TIGRES_HOSTS nid00185,nid00186
results - ('average_by_column', 'Global_intensity', 4.6277593105838)
results - ('total_by_column', 'Global_intensity', 9483574.59999317)
results - ('median_by_column', 'Global_intensity', 2.6)
results - ('stdev_by_column', 'Global_intensity', 4.444396259786258)
results - ('variance_by_column', 'Global_intensity', 19.752658114002077)
results - ('average_by_column', 'Voltage', 240.83985797447758)
results - ('total_by_column', 'Voltage', 493548304.1499374)
results - ('median_by_column', 'Voltage', 241.01)
results - ('stdev_by_column', 'Voltage', 3.23998667900864)
results - ('variance_by_column', 'Voltage', 10.497513680153437)
results - ('average_by_column', 'Global_reactive_power', 0.12371447630385488)
results - ('total_by_column', 'Global_reactive_power', 253525.60199996372)
results - ('median_by_column', 'Global_reactive_power', 0.1)
results - ('stdev_by_column', 'Global_reactive_power', 0.11272197955071389)
results - ('variance_by_column', 'Global_reactive_power', 0.012706244673831562)
results - ('average_by_column', 'Global_active_power', 1.091615036500693)
results - ('total_by_column', 'Global_active_power', 2237024.86200014)
results - ('median_by_column', 'Global_active_power', 0.602)
results - ('stdev_by_column', 'Global_active_power', 1.0572941610939552)
results - ('variance_by_column', 'Global_active_power', 1.1178709430833706)
results - ('average_by_column', 'Sub_metering_3', 6.45844735712055)
results - ('total_by_column', 'Sub_metering_3', 13235167.0)
results - ('median_by_column', 'Sub_metering_3', 1.0)
results - ('stdev_by_column', 'Sub_metering_3', 8.437153908665618)
results - ('variance_by_column', 'Sub_metering_3', 71.18556607851151)
results - ('average_by_column', 'Sub_metering_2', 1.2985199679887571)
results - ('total_by_column', 'Sub_metering_2', 2661031.0)
results - ('median_by_column', 'Sub_metering_2', 0.0)
results - ('stdev_by_column', 'Sub_metering_2', 5.822026473177329)
results - ('variance_by_column', 'Sub_metering_2', 33.895992254377646)
results - ('average_by_column', 'Sub_metering_1', 1.1219233096502186)
results - ('total_by_column', 'Sub_metering_1', 2299135.0)
results - ('median_by_column', 'Sub_metering_1', 0.0)
results - ('stdev_by_column', 'Sub_metering_1', 6.153031089701269)
results - ('variance_by_column', 'Sub_metering_1', 37.85979159083039)
./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8
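Each `results - (...)` line in the output above is a printed Python tuple of the form (statistic, column, value), so the file can be collected into a per-column table for further use. The following is a minimal sketch under that assumption. The function name `parse_results` is hypothetical, and the sample text is a short excerpt of the output shown above.

```python
import ast


def parse_results(text):
    """Group 'results - (stat, column, value)' lines into {column: {stat: value}}."""
    prefix = "results - "
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skip the PATH, TIGRES_HOSTS and echo lines
        # literal_eval safely parses the printed tuple without executing code.
        name, column, value = ast.literal_eval(line[len(prefix):])
        table.setdefault(column, {})[name] = value
    return table


sample_output = """TIGRES_HOSTS nid00185,nid00186
results - ('average_by_column', 'Voltage', 240.83985797447758)
results - ('median_by_column', 'Voltage', 241.01)
results - ('median_by_column', 'Sub_metering_1', 0.0)
"""

table = parse_results(sample_output)
for column, column_stats_found in sorted(table.items()):
    print(column, column_stats_found)
```

To use it on a real run, read the SLURM output file into a string and pass it to `parse_results`.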
The SLURM script, tigres_run.slurm, can be submitted on any NERSC system. This script uses the Tigres distribute plugin to execute the basic statistics script on two nodes.
#!/bin/sh
#SBATCH -p debug
#SBATCH --ccm
#SBATCH -N 2
#SBATCH -t 00:30:00
#SBATCH -J hconsume
# Load the tigres python environment
module load python
source env$NERSC_HOST/bin/activate
# Add the application code to the paths
export PYTHONPATH=$SLURM_SUBMIT_DIR:$PYTHONPATH
export PATH=$SLURM_SUBMIT_DIR:$PATH
echo $PATH
cd $SLURM_SUBMIT_DIR
# Determine the hosts available. Convert the compact host list to a
# comma separated list.
export TIGRES_HOSTS=`scontrol show hostname $SLURM_JOB_NODELIST | awk -vORS=, '{ print $1 }' | sed s'/.$//'`
echo "TIGRES_HOSTS ${TIGRES_HOSTS}"
# The workflow is executed with the Tigres Distribute plugin.
# Since the tasks will be executed across nodes, we need to
# define the environment with OTIGRES_* environment
# variables. Also, TIGRES_HOSTS will list all
# the hosts used for this workflow
export OTIGRES_PATH=$PATH
export OTIGRES_PYTHONPATH=$SLURM_SUBMIT_DIR/env$NERSC_HOST/lib/python2.7/site-packages:$PYTHONPATH
export OTIGRES_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
export EXECUTION=EXECUTION_DISTRIBUTE_PROCESS
./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8
echo "./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8"
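The TIGRES_HOSTS line in the script is the least obvious step. `scontrol show hostname` expands the compact node list into one host name per line, awk joins the lines with commas, and sed removes the trailing comma. The sketch below reproduces only the joining step in Python to show the intended result. The function name `join_hosts` is hypothetical, and the input stands in for the expanded output of `scontrol`.

```python
def join_hosts(hostname_output):
    """Join one-hostname-per-line text into a comma-separated string."""
    hosts = [line.strip() for line in hostname_output.splitlines() if line.strip()]
    return ",".join(hosts)


# Stand-in for the expanded host list of the two-node job shown earlier.
expanded = "nid00185\nnid00186\n"
tigres_hosts = join_hosts(expanded)
print("TIGRES_HOSTS", tigres_hosts)  # prints: TIGRES_HOSTS nid00185,nid00186
```

Joining with a separator avoids the trailing comma that the shell pipeline has to strip afterwards.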