HPC Tutorial¶
Scaling up a Tigres workflow to an HPC system, such as those at NERSC, takes only a few steps. This tutorial demonstrates how to run a Tigres workflow at NERSC and assumes that you have a NERSC account. It walks you through setting up a Python environment for execution and submitting a Portable Batch System (PBS) script on each of the three NERSC systems: Edison, Hopper and Carver.
Python Environment¶
The following steps detail how to set up a Python environment for running Tigres workflows on NERSC resources. The instructions are written for Edison but should work on other similarly configured NERSC resources.
These instructions assume that:
- You have a NERSC account
- You want to submit jobs to your default NERSC repo
- Get Latest Tigres Release
Download the Tigres 0.1.0 source distribution from the Tigres Bitbucket repository, then copy it to your NERSC home directory:
$ scp tigres-0.1.0.tar.gz dtn01.nersc.gov:./
- Get Tutorials Archive
Download the Tigres 0.1.0 tutorials from the Tigres Bitbucket repository, then copy the archive to your NERSC home directory:
$ scp tigres-0.1.0-tutorials.zip dtn01.nersc.gov:./
Log in to NERSC and unzip the tutorials in your home directory:
$ ssh edison.nersc.gov
$ unzip tigres-0.1.0-tutorials.zip
$ cd tigres-0.1.0-tutorials
- Setup Script
Change to the tutorial directory and run setup_env.sh. The setup script prepares the Tigres Python environment. After the script runs, there will be a new virtualenv environment, env<NERSC_HOST>, whose name is suffixed with the NERSC host name:
$ ./set_env.sh
- Install Tigres
There is one final step. Tigres must now be installed into the virtualenv environment that was set up in the previous step.
$ module load python
$ source envedison/bin/activate
(envedison)$ pip install --no-index $HOME/tigres-0.1.0.tar.gz
Basic Statistics Example¶
Once the Tigres environment is set up in your NERSC home directory, you are ready to run the sample program, basic_statistics_by_column.py, which takes a delimited text file and computes basic statistics (total, mean, median, variance, standard deviation) on the specified columns. The Tigres program extracts the data and uses a parallel template for the statistical calculations.
This section walks you through getting the sample data and submitting the job to the job manager queue.
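To make the five statistics concrete, here is a minimal sequential sketch in plain shell and awk. It is not the Tigres program and does not use a parallel template. The helper name column_stats and the sample values are made up for illustration, the column number is 1-based (as in cut), and the use of population variance is an assumption. Missing values are not handled.

```shell
#!/bin/bash
# Made-up sample data in a ';'-delimited layout with one header row.
datafile=$(mktemp)
printf '%s\n' 'Date;Time;Power' 'd;t;2' 'd;t;4' 'd;t;4' 'd;t;4' \
    'd;t;5' 'd;t;5' 'd;t;7' 'd;t;9' > "$datafile"

# column_stats FILE DELIMITER COLUMN (hypothetical helper)
# Prints total, mean, median, variance and standard deviation of one column.
column_stats() {
    # Skip the header, extract the column, sort numerically for the median.
    tail -n +2 "$1" | cut -d "$2" -f "$3" | sort -n | awk '
        { v[NR] = $1; total += $1 }
        END {
            if (NR == 0) exit 1
            mean = total / NR
            for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
            if (NR % 2) median = v[(NR + 1) / 2]
            else median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            variance = ss / NR   # population variance (assumption)
            printf "total=%g mean=%g median=%g variance=%g stddev=%g\n",
                total, mean, median, variance, sqrt(variance)
        }'
}

stats=$(column_stats "$datafile" ';' 3)
echo "$stats"   # total=40 mean=5 median=4.5 variance=4 stddev=2
rm -f "$datafile"
```

Each column is independent of the others, which is what makes the per-column calculations a natural fit for a parallel template.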
- Get the Data
This example uses a large dataset, household_power_consumption.txt, from the UC Irvine Machine Learning Repository:
$ wget https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip
$ unzip household_power_consumption.zip
- Submit the Job
A helper script, tigres_submit.sh, is provided in the tutorial archive. This script uses tigres_run.pbs to submit the Tigres program to the workflow queue. Run the script:
$ ./tigres_submit.sh
Running on edison
1344091.edique02
The job is now submitted and should run in the debug queue:
$ qstat -ume

edique02:
                                                                                  Req'd  Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
1344091.edique02        me          debug    EnergyConsumptio --     1     24     --     00:30:00  Q --
Cluster Management with SGE¶
For longer workflows, or workflows where you would like to set up a glide-in (private) cluster, Tigres can run through the MySGE mechanism. At NERSC, MySGE can be used as the execution mechanism for Tigres workflows:
MySGE allows users to create a private Sun GridEngine cluster on large parallel systems like Hopper or Franklin. Once the cluster is started, users can submit serial jobs, array jobs, and other through-put oriented workloads into the personal SGE scheduler. The jobs are then run within the user private cluster. – [1]
- Setup MySGE
Follow the instructions below to set up MySGE in your workspace. These instructions can be found at [1]:
$ ssh edison.nersc.gov
$ module load mysge
$ mysge_init    (use all defaults)
- Submit the Job using MySGE
Once you have set up MySGE, you may use the following command to submit basic_statistics_by_column.py using EXECUTION_SGE with MySGE:
$ ./tigres_submit.sh mysge
Running on edison with EXECUTION_SGE
1344091.edique02
- View job status
This example uses 2 nodes for a total of 48 cores:
$ qstat -ume

edique01:
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
7794778.edique01        vch         debug    EnergyConsumptio 0      2     48     --     00:30:00  R 00:01:04
Batch Scripts¶
PBS Script¶
The PBS script, tigres_run.pbs, can be submitted on any NERSC system. This script
- starts MySGE before the workflow and stops it afterwards when EXECUTION is EXECUTION_SGE (the two if blocks)
- loads Python (module load python)
- activates the Tigres Python environment (source env${NERSC_HOST}/bin/activate)
- runs the Tigres workflow (./basic_statistics_by_column.py):

#!/bin/bash -l
#PBS -V

cd "$PBS_O_WORKDIR"

# Start a private MySGE cluster when the SGE execution mechanism is requested
if [ "${EXECUTION}" == "EXECUTION_SGE" ]; then
    module load mysge
    source ~/.vpc.${NERSC_HOST}.sh
    vpc_start -q ccm_queue -l mppwidth=48 -V
    sleep 60
fi

# Load Python and activate the Tigres virtualenv for this host
module load python
source env${NERSC_HOST}/bin/activate

# Environment variables expected by EXECUTION_DISTRIBUTE_PROCESS
export TIGRES_HOSTS="`awk -vORS=, '{ print $1 }' $PBS_NODEFILE | sed 's/,$/\n/'`"
export OTIGRES_PATH=$PATH
export OTIGRES_PYTHONPATH=$PYTHONPATH
export OTIGRES_LD_LIBRARY_PATH=$LD_LIBRARY_PATH

# Echo the command, then run the Tigres workflow
echo "./basic_statistics_by_column.py $EXECUTION household_power_consumption.txt ';' 2,3,4,5,6,7,8"
./basic_statistics_by_column.py $EXECUTION household_power_consumption.txt ';' 2,3,4,5,6,7,8

# Stop the private MySGE cluster if one was started
if [ "${EXECUTION}" == "EXECUTION_SGE" ]; then
    vpc_stop
fi

The four export lines set up the environment variables expected by the EXECUTION_DISTRIBUTE_PROCESS execution mechanism. This mechanism is used on Carver but not on Edison or Hopper.
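The TIGRES_HOSTS line in tigres_run.pbs turns the PBS node file into a comma-separated host list. The sketch below shows the same idea on a made-up node file so you can see the resulting value without submitting a job. The helper name hosts_from_nodefile and the host names are hypothetical. On Torque/PBS systems the node file typically repeats a host name once per allocated slot, and this sketch, like the original pipeline, keeps those repeats.

```shell
#!/bin/bash
# Made-up node file standing in for $PBS_NODEFILE (one host name per line).
nodefile=$(mktemp)
printf '%s\n' c0101 c0101 c0102 c0103 > "$nodefile"

# hosts_from_nodefile FILE (hypothetical helper)
# Joins the first field of every line with commas, with no trailing comma.
hosts_from_nodefile() {
    awk '{ printf "%s%s", (NR > 1 ? "," : ""), $1 } END { print "" }' "$1"
}

TIGRES_HOSTS=$(hosts_from_nodefile "$nodefile")
export TIGRES_HOSTS
echo "$TIGRES_HOSTS"   # c0101,c0101,c0102,c0103
rm -f "$nodefile"
```

Writing the separator before every field except the first avoids the trailing comma, so no second sed pass is needed.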
Job Submission Script¶
The job submission script, tigres_submit.sh
- sets the number of nodes/cores (NUMCORES)
- chooses the Tigres execution mechanism (EXECUTION)
- if invoked with the mysge argument, sets the execution to EXECUTION_SGE and mppwidth to 48 (2 nodes)
- prepares the command for running the Tigres workflow and submits the job (qsub):

#!/bin/bash

# Defaults: one node (24 cores) with local threads
export NUMCORES="mppwidth=24"
export EXECUTION="EXECUTION_LOCAL_THREAD"
# Carver uses distributed processes across 3 nodes with 8 cores each
if [ "$NERSC_HOST" == "carver" ]; then
    export EXECUTION="EXECUTION_DISTRIBUTE_PROCESS"
    export NUMCORES="nodes=3:ppn=8"
fi

# Optional first argument "mysge": run through MySGE on 48 cores
if [ "$1" == "mysge" ]; then
    export EXECUTION="EXECUTION_SGE"
    export NUMCORES="mppwidth=48"
fi

# -V passes the exported environment, including EXECUTION, to the job
echo "Running on ${NERSC_HOST} with ${EXECUTION}"
qsub -V -NEnergyConsumption${NERSC_HOST} -l${NUMCORES},walltime=00:30:00 tigres_run.pbs
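If you want to check which settings tigres_submit.sh would pick before using queue time, a dry run along the following lines may help. It mirrors the branching of the submission script but only prints the qsub command instead of running it. The function name build_qsub_command is hypothetical and not part of the tutorial archive.

```shell
#!/bin/bash
# build_qsub_command HOST [MODE] (hypothetical helper)
# Prints the execution mechanism and the qsub command that would be used.
build_qsub_command() {
    local host="$1" mode="$2"
    local numcores="mppwidth=24"
    local execution="EXECUTION_LOCAL_THREAD"
    if [ "$host" == "carver" ]; then
        execution="EXECUTION_DISTRIBUTE_PROCESS"
        numcores="nodes=3:ppn=8"
    fi
    if [ "$mode" == "mysge" ]; then
        execution="EXECUTION_SGE"
        numcores="mppwidth=48"
    fi
    echo "EXECUTION=${execution} qsub -V -NEnergyConsumption${host} -l${numcores},walltime=00:30:00 tigres_run.pbs"
}

build_qsub_command edison
build_qsub_command carver
build_qsub_command edison mysge
```

Because the mysge check comes last, it overrides the host-specific defaults, just as in the submission script.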
[1] https://www.nersc.gov/users/software/workflow-software/mysge/