Running jobs on a specific node in an HPC cluster

Sometimes you may want to run a job on a specific node, for example to check that all the nodes are working properly after the cluster has been restarted.

1. Use #PBS -l to name the node

The simplest way is to submit as many jobs as there are nodes at the same time (this can take some time to complete) and then use

qstat -n 

to see whether all the nodes are used for the calculation.

To see all the nodes in a cluster, use
pbsnodes -a

cluster
     Mom = headnodename.companyname
     ntype = PBS
     state = free
     pcpus = 24
     resources_available.arch = linux
     resources_available.host = cluster
     resources_available.mem = 264417884kb
     resources_available.ncpus = 24
     resources_available.vnode = cluster
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.netwins = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared

cn1
     Mom = cn2.aracluster
     ntype = PBS
     state = free
     pcpus = 24
     resources_available.arch = linux
     resources_available.host = cn2
     resources_available.mem = 264424324kb
     resources_available.ncpus = 24
     resources_available.vnode = cn2
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.netwins = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared

cn2
     Mom = cn1.aracluster
     ntype = PBS
     state = free
     pcpus = 24
     resources_available.arch = linux
     resources_available.host = cn1
     resources_available.mem = 264424336kb
     resources_available.ncpus = 24
     resources_available.vnode = cn1
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.netwins = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared

Here, cn1 and cn2 are the node names. The first entry, 'cluster', is the head node.
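When the cluster has many nodes, the full listing gets long. The following is a minimal sketch of a filter that reduces it to one line per node. The function name list_nodes is hypothetical, and it assumes the layout shown above: an unindented node name followed by indented "attribute = value" lines.

```shell
#!/bin/sh
# list_nodes: read `pbsnodes -a` style text on stdin and print "name state"
# for each node. Assumes node names are unindented and attributes are indented.
list_nodes() {
    awk '
        /^[^ \t]/ { name = $1 }                        # unindented line: node name
        $1 == "state" && $2 == "=" { print name, $3 }  # indented "state = ..." line
    '
}

# Typical use on a cluster where PBS is installed:
#   pbsnodes -a | list_nodes
```

Any node whose state is not "free" (or busy with a known job) is a candidate for a closer look.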

Specify one of these names in the #PBS -l option. For example,

#PBS -l nodes=cn1:ncpus=4

This requests 4 CPUs on the cn1 node of the cluster.
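To check every node in turn, it can help to generate one small job script per node. The sketch below is only an illustration: the function name make_node_job and the job name check_<node> are hypothetical, and the resource line uses a colon between the node name and ncpus. On recent PBS Professional versions the same request may need to be written in the select form instead (for instance select=1:ncpus=4:host=cn1), so check which syntax your site accepts.

```shell
#!/bin/sh
# make_node_job NODE NCPUS: print a PBS job script that requests NCPUS CPUs
# on NODE. The job only prints the host name, so the job's output file
# shows which node actually ran it.
make_node_job() {
    node=$1
    ncpus=$2
    cat <<EOF
#!/bin/sh
#PBS -N check_${node}
#PBS -l nodes=${node}:ncpus=${ncpus}
hostname
EOF
}

# Example on a cluster where PBS is installed:
#   make_node_job cn1 4 > check_cn1.sh && qsub check_cn1.sh
```

After the job finishes, its output file should contain the name of the requested node.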

=======================================================
You can use

cat /etc/hosts

to display the node names.
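The hosts file also contains comments and loopback entries, so a small filter can make the node names easier to read. This is a rough sketch with a hypothetical function name, host_names. It assumes the usual hosts layout (address first, then the host name) and may still print other special entries that some distributions add.

```shell
#!/bin/sh
# host_names FILE: print the first host name of each entry in a hosts-format
# file, skipping comments, blank lines, and loopback addresses.
host_names() {
    awk '
        { sub(/#.*/, "") }                                   # strip comments
        NF >= 2 && $1 !~ /^127\./ && $1 != "::1" { print $2 }
    ' "$1"
}

# Example:
#   host_names /etc/hosts
```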
