TORQUE hands-on
Prepared by Vangelis Koukis, Computing Systems Laboratory, ICCS-NTUA.

Introduction

This hands-on will guide you through basic examples of using the TORQUE scheduler. You will submit batch and interactive jobs, examine their environment, and destroy them.

Access

To complete this hands-on lab you need to be logged in to a TORQUE submit node, where the TORQUE client utilities are installed. We will use the Grid UI, part of TUC's Grid node, for this purpose, as it has been set up as a TORQUE submit node for the local compute cluster. You have to use ssh to log into the UI. If you use Windows and don't have an ssh client, you can download PuTTY, one of the most popular free clients. If you don't meet this requirement, please contact the trainers for assistance.

Download the examples

Log into the UI and download the tutorial application files. Unpack the file in your home directory:

tar xvf torque.tar

and change to the directory created:

cd torque

The package contains the example files used in the rest of this hands-on, including simple.pbs and task.sh.
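For reference, a minimal sketch of these setup steps on the UI, assuming the torque.tar archive is already in your home directory; the UI hostname below is only a placeholder, use the one you were given:

    ssh your_username@ui.grid.tuc.gr   # log in to the TORQUE submit node (the UI)
    tar xvf torque.tar                 # unpack the tutorial files in your home directory
    cd torque                          # change to the directory created
    ls                                 # list the files in the package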
Open a second window

Log into the UI in a second window, so you can issue two commands at once and notice the effect each one has on the other.

Submit a simple interactive job

Submit a simple interactive job and examine its environment, especially $PBS_ENVIRONMENT and the contents of $PBS_NODEFILE:

[vkoukis@ui ~]$ qsub -I -q tuc
qsub: waiting for job 1081.ce01.grid.tuc.gr to start
qsub: job 1081.ce01.grid.tuc.gr ready

[vkoukis@wn034 ~]$ echo $PBS_          (press Tab twice to list all the PBS variables)
$PBS_ENVIRONMENT  $PBS_JOBNAME    $PBS_NODENUM    $PBS_O_LANG     $PBS_O_PATH
$PBS_O_WORKDIR    $PBS_TASKNUM    $PBS_JOBCOOKIE  $PBS_MOMPORT    $PBS_O_HOME
$PBS_O_LOGNAME    $PBS_O_QUEUE    $PBS_QUEUE      $PBS_VERSION    $PBS_JOBID
$PBS_NODEFILE     $PBS_O_HOST     $PBS_O_MAIL     $PBS_O_SHELL    $PBS_SERVER
$PBS_VNODENUM
[vkoukis@wn034 ~]$ echo $PBS_JOBID
1081.ce01.grid.tuc.gr
[vkoukis@wn034 ~]$ echo $PBS_ENVIRONMENT
PBS_INTERACTIVE
[vkoukis@wn034 ~]$ cat $PBS_NODEFILE
wn034.grid.tuc.gr
[vkoukis@wn034 ~]$ echo $PBS_NODEFILE
/var/spool/pbs/aux//1081.ce01.grid.tuc.gr

Using ls and cd, verify that the Worker Node shares the same home directory structure with the UI. Log out of the job by pressing Ctrl-D, or by typing exit at the prompt.

Repeat the submission, this time providing a proper job name with the -N argument. Also repeat the submission requesting a larger number of nodes, and notice how the contents of $PBS_NODEFILE change. Don't request more than two nodes, otherwise your job may be blocked for a very long time before it has a chance to run. In the second window, use qstat with the -a and -f arguments to examine the execution queue.
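A hedged sketch of these variations; the job name and node count are only illustrative, and the job ID passed to qstat -f should be one reported by qstat -a.

In the first window:

    qsub -I -q tuc -N testjob -l nodes=2
    # once the job starts on a worker node:
    cat $PBS_NODEFILE                  # should now list two entries

In the second window:

    qstat -a                           # one line per job, with its state and queue
    qstat -f jobID_here                # full attribute listing for a single job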
Kill the currently running interactive job

In the second window, run qdel on the currently running job. Notice what happens. How long does it take for the job to really die?
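One way to answer this is to poll the job's state from the second window while the job is being killed. A minimal sketch, using the job ID from the transcript below only as an illustration:

    qdel 1081
    qstat 1081     # repeat a few times; the S (state) column moves from R (running)
                   # to E (exiting), and the job eventually leaves the queue
                   # (or shows C, completed, depending on the server configuration)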
[vkoukis@ui ~]$ qdel 1081
...
qsub: job 1081.ce01.grid.tuc.gr completed
[vkoukis@ui ~]$

Alter a currently running job

Run qalter on a currently running job and change its maximum wallclock time to something ridiculously small, such as 20 seconds. Observe what happens when TORQUE notices that the job can no longer continue running:

[vkoukis@ui ~]$ qalter -l walltime=00:00:20 jobID_here

Also try using qsig on a running job, with a signal such as SIGKILL:

[vkoukis@ui ~]$ qsig -s SIGKILL jobID_here

Submit a simple script

Make the file simple.pbs executable, then show its contents and run it directly at the command line. Notice the PBS-specific comments and how they are ignored. Then submit it as a batch job (no -I argument this time) using qsub. Remember to use the -k oe argument to keep both the standard output and standard error files.

[vkoukis@ui ~]$ chmod +x ./simple.pbs
[vkoukis@ui ~]$ cat simple.pbs
[vkoukis@ui ~]$ qsub -k oe -q tuc ./simple.pbs

Use cat on the standard output and error files to examine the output of the batch job.

Examine different node requirement specifications

Play with different node requirement specifications in qsub, preferably in an interactive job, and notice how the contents of $PBS_NODEFILE change every time:

[vkoukis@ui ~]$ qsub -I -q tuc -l nodes=1+2+3+4
qsub: waiting for job 1090.ce01.grid.tuc.gr to start
qsub: job 1090.ce01.grid.tuc.gr ready
...
cat $PBS_NODEFILE

Challenge: Launch a shell script on multiple nodes

Start an interactive job with more than one core, then use pbsdsh to launch the small task.sh script on all the allocated nodes. Notice the values of $PBS_TASKNUM, $PBS_NODENUM and $PBS_VNODENUM being printed:

[vkoukis@ui torque-hands-on]$ qsub -I -l nodes=4 -q tuc
qsub: waiting for job 1386.ce01.grid.tuc.gr to start
qsub: job 1386.ce01.grid.tuc.gr ready
[vkoukis@wn034 ~]$ cd torque-hands-on/
[vkoukis@wn034 torque-hands-on]$ pbsdsh `pwd`/task.sh
Thu May 7 22:54:28 EEST 2009 Task running on wn034.grid.tuc.gr: PBS_TASKNUM=6, PBS_NODENUM=0, PBS_VNODENUM=0.
Thu May 7 22:54:28 EEST 2009 Bye
Thu May 7 22:54:29 EEST 2009 Task running on wn032.grid.tuc.gr: PBS_TASKNUM=8, PBS_NODENUM=2, PBS_VNODENUM=2.
Thu May 7 22:54:29 EEST 2009 Task running on wn031.grid.tuc.gr: PBS_TASKNUM=9, PBS_NODENUM=3, PBS_VNODENUM=3.
Thu May 7 22:54:29 EEST 2009 Bye
Thu May 7 22:54:29 EEST 2009 Bye
Thu May 7 22:54:29 EEST 2009 Task running on wn033.grid.tuc.gr: PBS_TASKNUM=7, PBS_NODENUM=1, PBS_VNODENUM=1.
Thu May 7 22:54:29 EEST 2009 Bye
[vkoukis@wn034 torque-hands-on]$

Notice how you must provide pbsdsh with the absolute path to the command, using `pwd`.

Challenge 2: Launch an array of jobs

Modify simple.pbs so that it prints out the value of $PBS_ARRAYID. Then submit an array of jobs, for a variety of indexes, using the -t argument of qsub. Make sure you include the -k oe argument to keep the output and error files of every job array member. Notice the value printed from each job based on its array index.
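The contents of simple.pbs are not reproduced here, so the following is only an illustrative sketch of what the modified script and its submission might look like; the job name and index range are assumptions you should adapt:

    #!/bin/bash
    #PBS -N arraytest
    echo "Array member $PBS_ARRAYID of job $PBS_JOBID, running on `hostname`"

Submit it as an array of five jobs, keeping the output and error files:

    qsub -k oe -q tuc -t 1-5 ./simple.pbs

With -k oe, one standard output and one standard error file per array member should appear in your home directory; the exact file names depend on the TORQUE version, but each output file should contain a different $PBS_ARRAYID value.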