.. _hpctutorial:

.. currentmodule:: tigres

HPC Tutorial
************

Scaling up a Tigres workflow to an HPC system like those at `NERSC `_ is straightforward. This tutorial demonstrates how to run a Tigres workflow at `NERSC `_ and assumes that you have a `NERSC account `_. It walks you through setting up a Python environment for execution and submitting a Simple Linux Utility for Resource Management (SLURM) script to two NERSC systems: `Edison `_ and `Cori `_.

.. contents::
   :depth: 1
   :local:
   :backlinks: top

Python Environment
==================

The following steps detail how to set up a Python environment for running Tigres workflows on NERSC resources. The instructions are for Edison but should work with other similarly configured NERSC resources.

These instructions assume:

* You have a NERSC account
* You want to submit jobs to your default NERSC repo

1. Get Latest Tigres Release

   `Download `_ the Tigres |release| source distribution from the `Tigres bitbucket repository `_. Now copy this to your NERSC home directory:

   .. parsed-literal::

      $ scp tigres-|release|.tar.gz \[user_name\]@dtn01.nersc.gov:./

2. Get Tutorials Archive

   `Download `_ the Tigres |release| tutorials from the `Tigres bitbucket repository `_. Now copy this to your NERSC home directory:

   .. parsed-literal::

      $ scp tigres-|release|-tutorials.zip \[user_name\]@dtn01.nersc.gov:./

   Log in to NERSC and unzip the tutorials in your home directory:

   .. parsed-literal::

      $ ssh \[user_name\]@edison.nersc.gov
      $ unzip tigres-|release|-tutorials.zip
      $ cd tigres-|release|-tutorials

3. Setup Script

   Change to the tutorial directory and run :code:`setup_env.sh`. The setup script prepares the Tigres Python environment. After the script runs, there is a new virtualenv environment whose name is :code:`env` suffixed with the NERSC host name (for example, :code:`envedison`)::

      $ ./set_env.sh

4. Install Tigres

   There is one final step.
   Tigres must now be installed into the virtualenv environment that was set up in the previous step.

   .. parsed-literal::

      $ module load python
      $ source env$NERSC_HOST/bin/activate
      (envedison)$ pip install --no-index $HOME/tigres-|release|.tar.gz

Basic Statistics Example
========================

Once the Tigres environment is set up in your NERSC home directory, you are ready to run the sample program, :download:`basic_statistics_by_column.py `, which takes a delimited text file and computes basic statistics (total, mean, median, variance, standard deviation) on the specified columns. The Tigres program extracts the data and uses a parallel template for the statistical calculations. This section walks you through getting the sample data, submitting the job to the job manager queue, and checking the results.

--------

1. Get the Data

   This example uses a large dataset, `household_power_consumption.txt `_, from the `UC Irvine Machine Learning repository `_::

      $ wget https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip
      $ unzip household_power_consumption.zip

2. Submit the Job

   A SLURM script called :download:`tigres_run.slurm ` has been provided in the tutorial archive. This script is used to submit the Tigres program to the NERSC queue. Submit the job to SLURM::

      $ sbatch tigres_run.slurm
      Submitted batch job 1031378

   The job is now submitted and should run in the `debug queue `_::

      $ squeue -u \[username\]
      JOBID    USER  ACCOUNT  NAME      PARTITION  QOS     NODES  TIME_LIMIT  TIME  ST  START_TIME
      1031378                 hconsume  debug      normal  2      30:00       1:12  R   2016-01-28T11:49:58

3. Check the Results

   Once the SLURM job is finished, you may check the results in the output file::

      $ cat slurm_.out
      /global/u2/v/vch/tigres-0.1.1-tutorials:/global/u2/v/vch/tigres-0.1.1-tutorials/envedison/bin:/global/u2/v/vch/tigres-0.1.1-tutorials/envedison/bin:/usr/common/usg/python/ipython/3.1.0/bin:/usr/common/usg/python/matplotlib/1.4.3/bin:/usr/common/usg/python/scipy/0.15.1/bin:/usr/common/usg/python/numpy/1.9.2/bin:/usr/common/usg/python/2.7.9/bin:/global/homes/v/vch/.pyenv/shims:/global/homes/v/vch/.pyenv/bin:/usr/common/usg/altd/2.0/bin:/usr/common/usg/bin:/usr/common/mss/bin:/usr/common/nsg/bin:/opt/slurm/default/bin:/opt/cray/mpt/7.3.1/gni/bin:/opt/cray/rca/1.0.0-2.0502.57212.2.56.ari/bin:/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/sbin:/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin:/opt/cray/dvs/2.5_0.9.0-1.0502.1958.2.55.ari/bin:/opt/cray/xpmem/0.1-2.0502.57015.1.15.ari/bin:/opt/cray/pmi/5.0.10-1.0000.11050.0.0.ari/bin:/opt/cray/ugni/6.0-1.0502.10245.9.9.ari/bin:/opt/cray/udreg/2.3.2-1.0502.9889.2.20.ari/bin:/opt/intel/composer_xe_2015.1.133/bin/intel64:/opt/cray/craype/2.5.1/bin:/opt/cray/switch/1.0-1.0502.57058.1.58.ari/bin:/opt/cray/eslogin/eswrap/1.1.0-1.020200.1130.0/bin:/usr/syscom/nsg/sbin:/usr/syscom/nsg/bin:/opt/modules/3.2.10.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/opt/cray/bin
      TIGRES_HOSTS nid00185,nid00186
      results - ('average_by_column', 'Global_intensity', 4.6277593105838)
      results - ('total_by_column', 'Global_intensity', 9483574.59999317)
      results - ('median_by_column', 'Global_intensity', 2.6)
      results - ('stdev_by_column', 'Global_intensity', 4.444396259786258)
      results - ('variance_by_column', 'Global_intensity', 19.752658114002077)
      results - ('average_by_column', 'Voltage', 240.83985797447758)
      results - ('total_by_column', 'Voltage', 493548304.1499374)
      results - ('median_by_column', 'Voltage', 241.01)
      results - ('stdev_by_column', 'Voltage', 3.23998667900864)
      results - ('variance_by_column', 'Voltage', 10.497513680153437)
      results - ('average_by_column', 'Global_reactive_power', 0.12371447630385488)
      results - ('total_by_column', 'Global_reactive_power', 253525.60199996372)
      results - ('median_by_column', 'Global_reactive_power', 0.1)
      results - ('stdev_by_column', 'Global_reactive_power', 0.11272197955071389)
      results - ('variance_by_column', 'Global_reactive_power', 0.012706244673831562)
      results - ('average_by_column', 'Global_active_power', 1.091615036500693)
      results - ('total_by_column', 'Global_active_power', 2237024.86200014)
      results - ('median_by_column', 'Global_active_power', 0.602)
      results - ('stdev_by_column', 'Global_active_power', 1.0572941610939552)
      results - ('variance_by_column', 'Global_active_power', 1.1178709430833706)
      results - ('average_by_column', 'Sub_metering_3', 6.45844735712055)
      results - ('total_by_column', 'Sub_metering_3', 13235167.0)
      results - ('median_by_column', 'Sub_metering_3', 1.0)
      results - ('stdev_by_column', 'Sub_metering_3', 8.437153908665618)
      results - ('variance_by_column', 'Sub_metering_3', 71.18556607851151)
      results - ('average_by_column', 'Sub_metering_2', 1.2985199679887571)
      results - ('total_by_column', 'Sub_metering_2', 2661031.0)
      results - ('median_by_column', 'Sub_metering_2', 0.0)
      results - ('stdev_by_column', 'Sub_metering_2', 5.822026473177329)
      results - ('variance_by_column', 'Sub_metering_2', 33.895992254377646)
      results - ('average_by_column', 'Sub_metering_1', 1.1219233096502186)
      results - ('total_by_column', 'Sub_metering_1', 2299135.0)
      results - ('median_by_column', 'Sub_metering_1', 0.0)
      results - ('stdev_by_column', 'Sub_metering_1', 6.153031089701269)
      results - ('variance_by_column', 'Sub_metering_1', 37.85979159083039)
      ./basic_statistics_by_column.py EXECUTION_DISTRIBUTE_PROCESS household_power_consumption.txt ';' 2,3,4,5,6,7,8

SLURM Job Script
================

The SLURM script, :download:`tigres_run.slurm `, can be submitted on any NERSC system.
This script uses the Tigres distribute plugin to execute the basic statistics script on two nodes.

* *Lines 3-7:* Sets up the SLURM job (i.e., specifies the name, number of cores, etc.)
* *Lines 9-11:* Loads the Tigres environment
* *Lines 13-17:* Sets up environment variables
* *Lines 30-34:* Prepares the special environment variables for the distribute plugin
* *Line 35:* Executes the Tigres workflow

.. literalinclude:: /_static/code/tigres_run.slurm
   :linenos:
   :language: bash
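
For orientation, the per-column statistics reported in the job output (total, mean, median, variance, standard deviation) can be approximated without Tigres in a few lines of standard-library Python. This is only a minimal serial sketch and not the contents of ``basic_statistics_by_column.py``: the function name ``column_statistics``, the zero-based column indices, the use of population variance, and the skipping of non-numeric entries are all assumptions made for illustration.

```python
import csv
import io
import statistics


def column_statistics(text, delimiter, columns):
    """Return {column_name: {statistic: value}} for the selected columns.

    Hypothetical helper: `columns` holds zero-based indices (an assumption).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # first row is assumed to hold the column names
    values = {index: [] for index in columns}
    for row in reader:
        for index in columns:
            try:
                values[index].append(float(row[index]))
            except (ValueError, IndexError):
                # Assumed handling: skip missing or non-numeric entries.
                continue
    results = {}
    for index, data in values.items():
        results[header[index]] = {
            "total": sum(data),
            "mean": statistics.mean(data),
            "median": statistics.median(data),
            # Population variance is an assumption; the tutorial does not
            # say which estimator the sample program uses.
            "variance": statistics.pvariance(data),
            "stdev": statistics.pstdev(data),
        }
    return results


# Tiny made-up sample in the same ';'-delimited layout as the tutorial data.
sample = (
    "Date;Voltage;Global_intensity\n"
    "1/1/2007;240.0;4.0\n"
    "1/1/2007;242.0;?\n"
    "1/1/2007;244.0;6.0\n"
)
stats = column_statistics(sample, ";", [1, 2])
for name, column_stats in stats.items():
    print(name, column_stats)
```

Each column is processed independently of the others, which is what makes this kind of calculation a natural fit for a parallel template.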