Onyx PBS Guide
1. Introduction
2. Anatomy of a Batch Script
2.1. Specify Your Shell
2.2. Required PBS Directives
2.2.1. Number of Nodes and Processes per Node
2.2.2. How Long to Run
2.2.3. Which Queue to Run In
2.2.4. Your Project ID
2.3. The Execution Block
3. Submitting Your Job
4. Simple Batch Script Example
5. Job Management Commands
6. Optional PBS Directives
6.1. Job Identification Directives
6.1.1. Application Name
6.1.2. Job Name
6.2. Job Environment Directives
6.2.1. Interactive Batch Shell
6.2.2. Export All Variables
6.2.3. Export Specific Variables
6.3. Reporting Directives
6.3.1. Redirecting Stdout and Stderr
6.3.2. Setting up E-mail Alerts
6.4. Job Dependency Directives
7. Environment Variables
7.1. PBS Environment Variables
7.2. Other Important Environment Variables
8. Example Scripts
8.1. MPI Script
8.2. MPI Script (accessing more memory per process)
8.3. OpenMP Script
8.4. SHMEM Script
8.5. Hybrid MPI/OpenMP Script
8.6. Hybrid MPI/OpenMP Script (Alternative Example)
1. Introduction

On large-scale computers, many users must share available resources. Because of this, you cannot just log on to one of these systems, upload your programs, and start running them. Essentially, your programs (called batch jobs) have to "get in line" and wait their turn. And there is more than one of these lines (called queues) from which to choose. Some queues have a higher priority than others (like the express checkout at the grocery store). The queues available to you are determined by the projects that you are involved with.

The jobs in the queues are managed and controlled by a batch queuing system, without which users could overload systems, resulting in tremendous performance degradation. The queuing system will run your job as soon as it can while still honoring the following:

- Meeting your resource requests
- Not overloading systems
- Running higher priority jobs first
- Maximizing overall throughput

We use the PBS Professional queuing system. The PBS module should be loaded automatically for you at login, allowing you access to the PBS commands.

2. Anatomy of a Batch Script

A batch script is simply a small text file that can be created with a text editor such as vi or notepad. You may create your own from scratch, or start with one of the sample batch scripts available in $SAMPLES_HOME. Although the specifics of a batch script differ slightly from system to system, a basic set of components is always required, and a few more are simply good ideas. The basic components of a batch script must appear in the following order:

- Specify Your Shell
- Required PBS Directives
- The Execution Block

IMPORTANT: Not all applications on Linux systems can read DOS-formatted text files. PBS does not handle ^M characters well, nor do some compilers. To avoid complications, please remember to convert all DOS-formatted ASCII text files with the dos2unix utility before use on any HPC system. Users are also cautioned against relying on ASCII transfer mode to strip these characters, as some file transfer tools do not perform this function.
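
For example, to convert a script in place before submitting it (run.pbs is a placeholder name):

dos2unix run.pbs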

2.1. Specify Your Shell

First of all, remember that your batch script is a script. It's a good idea to specify which shell your script is written in. Unless you specify otherwise, PBS will use your default login shell to run your script. To tell PBS which shell to use, start your script with a line similar to the following, where shell is either bash, sh, ksh, csh, or tcsh:

#!/bin/shell

2.2. Required PBS Directives

The next block of your script tells PBS about the resources your job needs by including PBS directives. These directives are actually a special form of comment, beginning with "#PBS". As you might suspect, the # character tells the shell to ignore the line, but PBS reads these directives and uses them to set various values. IMPORTANT!! All PBS directives MUST come before the first line of executable code in your script; otherwise, they will be ignored.

Every script must include directives for the following (a combined example appears below):

- The number of nodes and processes per node you are requesting
- The maximum amount of time your job should run
- Which queue you want your job to run in
- Your Project ID
- The specific name of your application

PBS also provides additional optional directives. These are discussed in Optional PBS Directives, below.
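
Taken together, a minimal set of required directives might look like the following sketch (Project_ID and Application_Name are placeholders):

#PBS -l select=2:ncpus=44:mpiprocs=44
#PBS -l walltime=12:00:00
#PBS -q standard
#PBS -A Project_ID
#PBS -l application=Application_Name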

2.2.1. Number of Nodes and Processes per Node

Before PBS can schedule your job, it needs to know how many nodes you want. Before your job can run, it also needs to know how many processes to run on each of those nodes. In general, you would specify one process per core, but you might want more or fewer processes depending on the programming model you are using. See Example Scripts (below) for alternate use cases.

Both the number of nodes and the number of processes per node are specified using the same directive, where N1 is the number of nodes you are requesting and N2 is the number of MPI processes per node:

#PBS -l select=N1:ncpus=44:mpiprocs=N2

The value of ncpus refers to the number of physical cores available on each node. Standard compute nodes on Onyx require ncpus=44. The value of ncpus must be evenly divisible by N2; therefore, N2 must be 1, 2, 4, 11, 22, or 44.

GPU nodes require ncpus=22. N2 must be 1, 2, 11, or 22, plus the extra argument ngpus=1:

#PBS -l select=N1:ncpus=22:mpiprocs=N2:ngpus=1

Large-memory nodes require ncpus=44. N2 must be 1, 2, 4, 11, 22, or 44, plus the extra argument bigmem=1:

#PBS -l select=N1:ncpus=44:mpiprocs=N2:bigmem=1

Knights Landing (KNL) nodes require ncpus=64. N2 must be 1, 2, 4, 8, 16, 32, or 64, plus the extra argument nmics=1:

#PBS -l select=N1:ncpus=64:mpiprocs=N2:nmics=1

An exception to this rule is the transfer queue, which uses the directive below:

#PBS -l select=1:ncpus=1

2.2.2. How Long to Run

Next, PBS needs to know how long your job will run. For this, you will have to make an estimate. There are three things to keep in mind:

- Your estimate is a limit. If your job hasn't completed within your estimate, it will be terminated.
- Your estimate affects how long your job waits in the queue. In general, shorter jobs run before longer jobs.
- Each queue has a maximum time limit. You cannot request more time than the queue allows.

To specify how long your job will run, include the following directive:

#PBS -l walltime=HHH:MM:SS

2.2.3. Which Queue to Run In

Now, PBS needs to know which queue you want your job to run in. Your options here are determined by your project. Most users only have access to the debug, standard, and background queues. Other queues exist, but access to them is restricted to projects that have been granted special privileges due to urgency or importance, and they are not discussed here. As their names suggest, the standard and debug queues should be used for normal day-to-day and debugging jobs. The background queue is a bit special: although it has the lowest priority, jobs that run in this queue are not charged against your project allocation. Users may choose to run in the background queue for several reasons:

- You don't care how long it takes for your job to begin running.
- You are trying to conserve your allocation.
- You have used up your allocation.

To see the list of queues available on the system, use the show_queues command. To specify the queue you want your job to run in, include the following directive:

#PBS -q queue_name

2.2.4. Your Project ID

PBS now needs to know which project ID to charge for your job. You can use the show_usage command to find the projects available to you and their associated project IDs. In the show_usage output, project IDs appear in the column labeled "Subproject." Note: Users with access to multiple projects should remember that the project they specify may limit their choice of queues.

To specify the Project ID for your job, include the following directive:

#PBS -A Project_ID

2.3. The Execution Block

Once the PBS directives have been supplied, the execution block may begin. This is the section of your script that contains the actual work to be done. A well-written execution block generally contains the following stages (a skeletal example follows this list):

- Environment Setup - This might include setting environment variables, loading modules, creating directories, copying files, and initializing data. As the last step in this stage, you will generally cd to the directory you want your script to execute in; otherwise, your script executes in your home directory by default. Most users use "cd $PBS_O_WORKDIR" to run the batch script from the directory where they typed "qsub" to submit the job.
- Compilation - You may need to compile your application if you don't already have a pre-compiled executable available.
- Launching - Your application is launched using the aprun command for CRAY MPICH2 codes and ccmrun for any serial, shared-memory, or non-native MPI codes.
- Clean up - This usually includes archiving your results and removing temporary files and directories.
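
A skeletal execution block illustrating these stages might look like the following sketch (my_prog.c and my_prog.x are hypothetical names; skip the compilation stage if you already have an executable):

# Environment Setup
cd $PBS_O_WORKDIR

# Compilation (using the Cray compiler wrapper)
cc -o my_prog.x my_prog.c

# Launching
aprun -n 44 ./my_prog.x > my_prog.out

# Clean up
rm -f *.o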

3. Submitting Your Job

Once your batch script is complete, you will need to submit it to PBS for execution using the qsub command. For example, if you have saved your script into a text file named run.pbs, you would type "qsub run.pbs".

Occasionally you may want to supply one or more directives directly on the qsub command line. Directives supplied in this way override the same directives if they are already included in your script. The syntax to supply directives on the command line is the same as within a script, except that #PBS is not used. For example:

qsub -l walltime=HHH:MM:SS run.pbs

4. Simple Batch Script Example

The batch script below contains all of the required directives and common script components discussed above. This example starts 88 processes. Each Onyx node has 44 cores, so 88 processes require 2 nodes. The job is submitted to the standard queue to run for at most 12 hours.

#!/bin/bash
## Required PBS Directives --------------------------------------
#PBS -A Project_ID
#PBS -q standard
#PBS -l select=2:ncpus=44:mpiprocs=44
#PBS -l walltime=12:00:00
#PBS -l application=Application_Name
#PBS -j oe

## Execution Block ----------------------------------------------
# Environment Setup
# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

## Launching -----------------------------------------------------
# copy the executable from $HOME
cp ${HOME}/mpicode.x .

# The following line provides an example of running a CRAY MPICH
# parallel code built with the default Cray compiler.
aprun -n 88 ./mpicode.x > out.dat

# The following two lines provide an example of setting up and running
# a CRAY MPICH parallel code built with the gcc compiler.
module swap PrgEnv-cray PrgEnv-gnu
aprun -n 88 ./mpicode.x > out.dat

# The following two lines provide an example of setting up and running
# a CRAY MPICH parallel code built with the INTEL compiler.
module swap PrgEnv-cray PrgEnv-intel
aprun -n 88 ./mpicode.x > out.dat

## Clean up -----------------------------------------------------
# Remove temporary files
rm *.o *.temp

5. Job Management Commands

The table below contains commands for managing your jobs in PBS.

Job Management Commands
Command       Description
qsub          Submit a job.
qstat         Check the status of a job.
qview         A more user-friendly version of qstat.
qstat -q      Display the status of all PBS queues.
show_queues   A more user-friendly version of "qstat -q".
qdel          Delete a job.
qhold         Place a job on hold.
qrls          Release a job from hold.
tracejob      Display job accounting data from a completed job.
pbsnodes      Display host status of all PBS batch nodes.
apstat        Display attributes of and resources allocated to running jobs.
qpeek         Lets you peek at the stdout and stderr of your running job.
qhist         Display a detailed history of a specific job.
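
For example, a typical management sequence might look like the following (the job ID shown is hypothetical):

qsub run.pbs          # prints the new job ID, e.g. 123456.pbs01
qstat 123456          # check the job's status
qdel 123456           # delete the job if something went wrong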

6. Optional PBS Directives

In addition to the required directives mentioned above, PBS has many other directives, but most users will only use a few of them. Some of the more useful optional directives are listed below.

6.1. Job Identification Directives

Job identification directives allow you to identify characteristics of your jobs. These directives are voluntary, but strongly encouraged. The following table contains some useful job identification directives.

Job Identification Directives
Directive        Value              Description
-l application   Application Name   Identify the application being used.
-N               Job Name           Name your job.

6.1.1. Application Name

The "-l application" directive allows you to identify the application being used by your job. This helps the program accurately assess application usage and ensure that adequate software licenses and appropriate software are purchased. To use this directive, add a line in the following form to your batch script:

#PBS -l application=Application_name

Or add it to your qsub command:

qsub -l application=Application_name run.pbs

A list of application names for use with this directive can be found in $SAMPLES_HOME/Application_Name/application_names on each HPC system.

6.1.2. Job Name

The "-N" directive allows you to designate a name for your job. A job name is easier to remember than a numeric job ID, and the PBS environment variable $PBS_JOBNAME inherits this value, so it can be used instead of the job ID to create job-specific output directories. To use this directive, add a line in the following form to your batch script:

#PBS -N job_20

Or add it to your qsub command:

qsub -N job_20 run.pbs
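
Within the batch script, $PBS_JOBNAME can then be used to keep output from different runs separate (a sketch; my_prog.exe is a placeholder):

mkdir -p ${WORKDIR}/${PBS_JOBNAME}
cd ${WORKDIR}/${PBS_JOBNAME}
aprun -n 44 ./my_prog.exe > ${PBS_JOBNAME}.out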

6.2. Job Environment Directives

Job environment directives allow you to control the environment in which your script operates. The following table contains a few useful job environment directives.

Job Environment Directives
Directive   Value           Description
-I                          Request an interactive batch shell.
-V                          Export all environment variables to the job.
-v          Variable List   Export specific environment variables to the job.

6.2.1. Interactive Batch Shell

The "-I" directive allows you to request an interactive batch shell. Within that shell, you can perform normal Unix commands, including launching parallel jobs. To use "-I", append it to the end of your qsub request. You may also use the "-X" option to allow X-Forwarding, which lets you run X-Windows-based graphical interfaces, such as the TotalView debugger, on the compute node. For example:

qsub -A Project_ID -q debug -l select=2:ncpus=44:mpiprocs=44 -l walltime=1:00:00 -X -I
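
Once the interactive job starts, you are placed in a shell and can launch parallel work directly, for example (my_prog.exe is a placeholder):

aprun -n 88 ./my_prog.exe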

6.2.2. Export All Variables

The "-V" directive tells PBS to export all of the environment variables from your login environment into your batch environment. To use this directive, add a line in the following form to your batch script:

#PBS -V

Or add it to your qsub command:

qsub -V run.pbs

6.2.3. Export Specific Variables

The "-v" directive tells PBS to export specific environment variables from your login environment into your batch environment. To use this directive, add a line in the following form to your batch script:

#PBS -v my_variable

Or add it to your qsub command:

qsub -v my_variable run.pbs

Using either of these methods, multiple comma-separated variables can be included:

qsub -v my_variable1,my_variable2 run.pbs

It is also possible to set values for variables exported in this way:

qsub -v my_variable1=my_value1,my_variable2=my_value2 run.pbs

6.3. Reporting Directives

Reporting directives allow you to control what happens to standard output and standard error messages generated by your script. They also allow you to specify e-mail options to be executed at the beginning and end of your job.

6.3.1. Redirecting Stdout and Stderr

By default, messages written to stdout and stderr are captured for you in files named x.ojob_id and x.ejob_id, respectively, where x is either the name of the script or the name specified with the "-N" directive, and job_id is the ID of the job. If you want to change this behavior, the "-o" and "-e" directives allow you to redirect stdout and stderr messages to different named files. The "-j" directive allows you to combine stdout and stderr into the same file.

Redirection Directives
Directive   Value       Description
-e          File Name   Redirect stderr to the named file.
-o          File Name   Redirect stdout to the named file.
-j          oe          Merge stderr and stdout into stdout.
-j          eo          Merge stderr and stdout into stderr.

6.3.2. Setting up E-mail Alerts

Many users want to be notified when their jobs begin and end. The "-m" directive makes this possible. If you use this directive, you will also need to supply the "-M" directive with one or more e-mail addresses to be used.

E-mail Directives
Directive   Value                Description
-m          b                    Send e-mail when the job begins.
-m          e                    Send e-mail when the job ends.
-M          E-mail Address(es)   Send e-mail to the listed address(es).

For example:

#PBS -m be
#PBS -M joesmith@gmail.com,joe.smith@us.army.mil

6.4. Job Dependency Directives

Job dependency directives allow you to specify dependencies your job may have on other jobs, giving you control over the order in which jobs run. These directives generally take the following form:

#PBS -W depend=dependency_expression

where dependency_expression is a comma-delimited list of one or more dependencies, and each dependency is of the form:

type:jobids

where type is one of the directives listed below, and jobids is a colon-delimited list of one or more job IDs that your job depends upon.

Job Dependency Directives
Directive     Description
after         Execute this job after listed jobs have begun.
afterok       Execute this job after listed jobs have terminated without error.
afternotok    Execute this job after listed jobs have terminated with an error.
afterany      Execute this job after listed jobs have terminated for any reason.
before        Listed jobs may be run after this job begins execution.
beforeok      Listed jobs may be run after this job terminates without error.
beforenotok   Listed jobs may be run after this job terminates with an error.
beforeany     Listed jobs may be run after this job terminates for any reason.

For example, to run a job after completion (success or failure) of job ID 1234:

#PBS -W depend=afterany:1234

Or, to run a job after successful completion of job ID 1234:

#PBS -W depend=afterok:1234
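
In practice, you usually will not know the job ID ahead of time. A common pattern is to capture it from qsub when submitting the first job (a sketch; pre.pbs and post.pbs are hypothetical script names):

FIRST_JOB=`qsub pre.pbs`                       # qsub prints the new job ID
qsub -W depend=afterok:${FIRST_JOB} post.pbs   # runs only if the first job succeeds
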
For more information about job dependencies, see the qsub man page.

7. Environment Variables

7.1. PBS Environment Variables

While there are many PBS environment variables, you only need to know a few important ones to get started using PBS. The table below lists the most important PBS environment variables and how you might generally use them.

Frequently Used PBS Environment Variables
PBS Variable      Description
$PBS_JOBID        Job identifier assigned to the job or job array by the batch system.
$PBS_O_WORKDIR    The absolute path of the directory where qsub was executed.
$PBS_JOBNAME      The job name supplied by the user.

The following additional PBS variables may be useful to some users.

Other PBS Environment Variables
PBS Variable        Description
$PBS_ARRAY_INDEX    Index number of a subjob in a job array.
$PBS_ENVIRONMENT    Indicates job type: PBS_BATCH or PBS_INTERACTIVE.
$PBS_NODEFILE       Name of the file containing a list of vnodes assigned to the job.
$PBS_O_HOST         Host name on which the qsub command was executed.
$PBS_O_PATH         Value of PATH from the submission environment.
$PBS_O_SHELL        Value of SHELL from the submission environment.
$PBS_QUEUE          The name of the queue from which the job is executed.
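
For example, a batch script might use these variables to label its output and examine its node allocation (a sketch):

echo "Job ${PBS_JOBID} (${PBS_JOBNAME}) was submitted from ${PBS_O_HOST}"
echo "Running in queue ${PBS_QUEUE}"
# count the unique nodes assigned to this job
sort -u ${PBS_NODEFILE} | wc -l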

7.2. Other Important Environment Variables

In addition to the PBS environment variables, the table below lists a few other variables that are not specifically associated with PBS. These variables are not generally required, but may be important depending on your job.

Other Important Environment Variables
Variable               Description
$OMP_NUM_THREADS       The number of OpenMP threads per node.
$MPI_DSM_DISTRIBUTE    Ensures that memory is assigned closest to the physical core where each MPI process is running.

8. Example Scripts

All of the script examples shown below contain a "Cleanup" section which demonstrates how to automatically archive your data using the transfer queue and clean up your $WORKDIR after your job completes. Using this method helps to avoid data loss and ensures that your allocation is not charged for idle cores while performing file transfer operations.

8.1. MPI Script

The following script is for a 176-core MPI job running for 20 hours in the standard queue. To run a 176-core job, we need 4 nodes with 44 cores each.

#!/bin/ksh
## Required Directives ------------------------------------
#PBS -l select=4:ncpus=44:mpiprocs=44
#PBS -l walltime=20:00:00
#PBS -q standard
#PBS -A Project_ID
#PBS -l application=Application_Name

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# Environment Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
export MPI_DSM_DISTRIBUTE=yes

# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# stage input data from archive
archive get -C ${ARCHIVE_HOME}/my_data_dir "*.dat"

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
aprun -n 176 ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -l application=transfer
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}/${JOBID}
archive mkdir -C ${ARCHIVE_HOME} ${JOBID}
archive put -C ${ARCHIVE_HOME}/${JOBID} *.out
archive ls ${ARCHIVE_HOME}/${JOBID}

# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
qsub archive_job

8.2. MPI Script (accessing more memory per process)

By default, an MPI job runs one process per core, with all processes sharing the available memory on the node. If you need more memory per process, your job needs to run fewer MPI processes per node.

The following script requests 4 nodes (176 cores), but uses only one core per node. This starts 4 MPI processes, each with access to about 121 GBytes of memory. The job runs for 20 hours in the standard queue.

Note the use of the "-B" option of aprun, which directs aprun to get the total number of processes ("-n") and the number of processes per node ("-N") from the PBS directives. In this example, "aprun -B" replaces "aprun -n 4 -N 1".

#!/bin/ksh
## Required Directives ------------------------------------
#PBS -l select=4:ncpus=44:mpiprocs=1
#PBS -l walltime=20:00:00
#PBS -q standard
#PBS -A Project_ID
#PBS -l application=Application_Name

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# Environment Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
export MPI_DSM_DISTRIBUTE=yes

# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# stage input data from archive
archive get -C ${ARCHIVE_HOME}/my_data_dir "*.dat"

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
aprun -B ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -l application=transfer
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}/${JOBID}
archive mkdir -C ${ARCHIVE_HOME} ${JOBID}
archive put -C ${ARCHIVE_HOME}/${JOBID} *.out
archive ls ${ARCHIVE_HOME}/${JOBID}
# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
qsub archive_job

8.3. OpenMP Script

The following script is for an OpenMP job using one thread per core on a single node and running for 20 hours in the standard queue. The number of OpenMP threads is set using the $OMP_NUM_THREADS environment variable and the aprun "-d" option. Note the use of the $BC_CORES_PER_NODE environment variable to set both values. To start fewer than 44 threads, replace $BC_CORES_PER_NODE with a lower value.

#!/bin/ksh
## Required Directives ------------------------------------
#PBS -l select=1:ncpus=44:mpiprocs=44
#PBS -l walltime=20:00:00
#PBS -q standard
#PBS -A Project_ID
#PBS -l application=Application_Name

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# Environment Setup
# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# stage input data from archive
archive get -C ${ARCHIVE_HOME}/my_data_dir "*.dat"

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
# If you want to start fewer threads, replace $BC_CORES_PER_NODE in
# this section with a lower value.
export OMP_NUM_THREADS=${BC_CORES_PER_NODE}
aprun -n 1 -d ${BC_CORES_PER_NODE} ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -l application=transfer
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}/${JOBID}
archive mkdir -C ${ARCHIVE_HOME} ${JOBID}
archive put -C ${ARCHIVE_HOME}/${JOBID} *.out
archive ls ${ARCHIVE_HOME}/${JOBID}
# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
qsub archive_job

8.4. SHMEM Script

The following script is for a 176-core SHMEM job running for 20 hours in the standard queue. The script requests 4 nodes, with 44 cores each, for a total of 176 cores. The aprun "-B" option directs aprun to get the total number of processes from the PBS directives. In this example, "aprun -B" replaces "aprun -n 176".

#!/bin/ksh
## Required Directives ------------------------------------
#PBS -l select=4:ncpus=44:mpiprocs=44
#PBS -l walltime=20:00:00
#PBS -q standard
#PBS -A Project_ID
#PBS -l application=Application_Name

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# Environment Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
export MPI_DSM_DISTRIBUTE=yes

# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# stage input data from archive
archive get -C ${ARCHIVE_HOME}/my_data_dir "*.dat"

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
aprun -B ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -l application=transfer
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}/${JOBID}
archive mkdir -C ${ARCHIVE_HOME} ${JOBID}
archive put -C ${ARCHIVE_HOME}/${JOBID} *.out
archive ls ${ARCHIVE_HOME}/${JOBID}
# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
qsub archive_job

8.5. Hybrid MPI/OpenMP Script

The following script uses 4 nodes (176 cores), placing one MPI process per node and 44 OpenMP threads per node (one per core). The aprun "-B" option directs aprun to get the total number of MPI processes from the PBS "select" directive. In this example, "aprun -B" replaces "aprun -n ${BC_NODE_ALLOC} -d ${BC_CORES_PER_NODE}" or "aprun -n 4 -d 44".

#!/bin/ksh
## Required Directives ------------------------------------
#PBS -l select=4:ncpus=44:mpiprocs=1
#PBS -l walltime=20:00:00
#PBS -q standard
#PBS -A Project_ID
#PBS -l application=Application_Name

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# Environment Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
export MPI_DSM_DISTRIBUTE=yes

# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# stage input data from archive
archive get -C ${ARCHIVE_HOME}/my_data_dir "*.dat"

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
export OMP_NUM_THREADS=${BC_CORES_PER_NODE}
aprun -B ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -l application=transfer
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}/${JOBID}
archive mkdir -C ${ARCHIVE_HOME} ${JOBID}
archive put -C ${ARCHIVE_HOME}/${JOBID} *.out
archive ls ${ARCHIVE_HOME}/${JOBID}
# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
qsub archive_job

8.6. Hybrid MPI/OpenMP Script (Alternative Example)

The following script uses 8 nodes (352 cores), placing two MPI processes per node and 44 OpenMP threads per node (22 for each MPI process). Note the use of the $BC_CORES_PER_NODE, $BC_NODE_ALLOC, and $BC_MPI_TASKS_ALLOC environment variables. The number of threads per MPI process is computed by dividing the number of cores per node by the number of MPI processes per node, where the number of MPI processes per node is the total number of MPI processes divided by the number of nodes.

#!/bin/ksh
## Required Directives ------------------------------------
#PBS -l select=8:ncpus=44:mpiprocs=2
#PBS -l walltime=20:00:00
#PBS -q standard
#PBS -A Project_ID
#PBS -l application=Application_Name

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# Environment Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
export MPI_DSM_DISTRIBUTE=yes

# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# stage input data from archive
archive get -C ${ARCHIVE_HOME}/my_data_dir "*.dat"

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
# threads per MPI process = cores per node / (total MPI processes / nodes)
export OMP_NUM_THREADS=`expr ${BC_CORES_PER_NODE} / '(' ${BC_MPI_TASKS_ALLOC} / ${BC_NODE_ALLOC} ')'`
aprun -B ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -l application=transfer
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}/${JOBID}
archive mkdir -C ${ARCHIVE_HOME} ${JOBID}
archive put -C ${ARCHIVE_HOME}/${JOBID} *.out
archive ls ${ARCHIVE_HOME}/${JOBID}
# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
qsub archive_job

Another alternative is to set the number of threads using the PBS directive "ompthreads=value". This directive causes PBS to set the $OMP_NUM_THREADS environment variable for you, so you need not set it in the script.

Instead of setting the environment variable yourself, like this:

#PBS -l select=8:ncpus=44:mpiprocs=2
...
export OMP_NUM_THREADS=22

you can set the threads per MPI process in the PBS directive and remove the export line from your script. The $OMP_NUM_THREADS environment variable will be set for you to the value you specify in the PBS directive:

#PBS -l select=8:ncpus=44:mpiprocs=2:ompthreads=22