SPLDellHPCCluster-OLD Documentation


Installation Questions for Dell

An overview document prepared by Dell.

User Level

Jobs

  • Special Note
    • If your command line is over 350 characters long and you are using the mpirun_ssh command provided with Topspin to run the job over the InfiniBand interconnect, you must place your command line in a script and pass that script to mpirun_ssh. Please see the cluster overview document for more details.
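For example, a minimal sketch (the script name, the wrapped application command, and the process count are placeholders; the exact mpirun_ssh options for this cluster are described in the cluster overview document):
# Contents of my_long_job.sh (a placeholder name) wrapping the long command line:
#!/bin/sh
exec ./my_application --option-one value1 --option-two value2   # ...the >350-character command
# Make the script executable and launch it over the InfiniBand interconnect:
chmod +x my_long_job.sh
/usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 4 ./my_long_job.sh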
  • How do you schedule a job?
    • If using LAVA, see the example script located at /var/news/example_lava_batch.sh and edit it to suit your needs. Documentation is located in the Lava Users Guide, page 6 (only available on the SPL network). Then use the 'bsub' command to submit the job to the scheduler:
bsub < your_script_name.sh
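A minimal sketch of such a script, assuming LAVA accepts the LSF-style #BSUB directives described in the Lava Users Guide (the job name, output file names, and processor count are placeholders; compare with the commented example in /var/news/example_lava_batch.sh):
#!/bin/sh
#BSUB -J my_job                  # job name (placeholder)
#BSUB -n 4                       # number of processors requested
#BSUB -o my_job.%J.out           # file to receive stdout (%J expands to the job id)
#BSUB -e my_job.%J.err           # file to receive stderr
# Your command goes here; see /var/news/example_lava_batch.sh for how an MPI job
# passes the hosts allocated by LAVA to mpirun.
./my_application
Submit it with "bsub < my_job.sh" as shown above.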
  • How do you cancel a job?
    • If using LAVA, type bkill <job_id>, where job_id is the numeric job identifier assigned to your job by LAVA. To find your job_id, type bjobs -l and look for your job.
bkill 102
  • How many jobs can I run simultaneously?
    • The number of jobs you can run at one time depends on the number of CPUs each job requires. A good rule of thumb is not to over-subscribe the cluster, so run no more jobs than there are CPUs in the system. Additionally, take into consideration the amount of memory required for each CPU. Using more than 2 GB of RAM per CPU (the amount installed per CPU in the compute nodes) can cause the systems to start paging (using swap) and severely impact the performance of the cluster.
  • How do you set/change the priority of a scheduled job?
    • Priority of jobs on the system is handled by the batch system. To set up these priorities, modifications to the batch system (LAVA) must be made to create special priority queues.
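Once such a queue exists, a job can be directed to it at submission time. A hedged example (the queue name "priority" is a placeholder that your administrators would define):
bsub -q priority < your_script_name.sh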
  • How do you check your job status?
    • See Lava Users Guide, page 24 (only available on the SPL network). Additionally, you can view the status of your jobs using the LAVA GUI.
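For example, from the command line (the job id 102 is a placeholder):
bjobs            # list your pending and running jobs
bjobs -l 102     # detailed status for a specific job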
  • How do you check the status/load on the cluster?
    • You can see a graphical representation of the cluster load by viewing the Ganglia Home Page (only available on the SPL network) or the Clumon Home Page (only available on the SPL network). From the command line, issue "cluster-fork uptime" to display each node's uptime and load. Note that because each node is dual-processor, a node is fully subscribed when its load reaches 2.0.
  • Will the input and output data reside on the SPL filesystem or somewhere else?
    • The head node of the cluster is attached to a 1.6 TB disk array that provides shared storage for the users' home directories. This is typically where your data will reside. If your application has specific requirements, you can also copy temporary data to the local filesystems of the compute nodes for processing. However, this may require some configuration changes on the part of your admins.
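A hedged sketch of staging temporary data to a compute node's local disk from within a batch script (the directory and file names are placeholders, and writing to /state/partition1 may require the configuration changes mentioned above):
SCRATCH=/state/partition1/$USER/$$        # per-user scratch directory on the local disk
mkdir -p $SCRATCH
cp ~/data/input.dat $SCRATCH/             # copy input from shared home storage
cd $SCRATCH && ./my_application input.dat
cp $SCRATCH/output.dat ~/results/         # copy results back to shared storage
rm -rf $SCRATCH                           # clean up the local disk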
  • Where will stdout/stderr go?
    • The location of the files that receive stdout/stderr is determined by the batch script used to submit the job to the scheduler. Please see the example script /var/news/example_lava_batch.sh for more information.
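For instance, assuming LSF-style #BSUB directives are accepted (the file names are placeholders), a batch script can redirect the two streams as follows:
#BSUB -o my_job.%J.out     # stdout
#BSUB -e my_job.%J.err     # stderr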
  • Are there working examples for people to start with?
    • Commented example batch scripts for LAVA and PBS/Torque are located under /var/news/ (see, for example, /var/news/example_lava_batch.sh).

Specs

  • What are the specifications of the current cluster?
    • The head node is a dual Xeon 3.6 GHz system with 4 GB of RAM. The system disks are two 146 GB SCSI disks configured as a mirrored RAID. The head node also has a 1.6 TB RAID 5 array with a hot spare attached. There are also 50 dual Xeon 3.2 GHz compute nodes, each with 4 GB of RAM and a 36 GB SCSI hard drive, 16 GB of which is available as /state/partition1 on the compute nodes. The cluster has an InfiniBand interconnect.
  • What are the specifications of the replacement/upgrade cluster, and when will it become available?
  • Are there any interactive capabilities, or do all jobs need to run as batch jobs without a display/controlling terminal?
    • The best practice for operating a cluster in a multi-user environment is to always utilize the schedulers to ensure that user jobs are evenly distributed among the CPUs on the system. However, one can run jobs interactively using the mpirun command.
  • How much disk space is there on the front ends and on the nodes?
    • See the answer to the first question above.

Software

  • What packages are installed?
    • To determine what packages are installed on the cluster, you can log in to the head node and type "rpm -qa". This will list all RPM packages installed on the system.
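For example, to check whether a particular package is present (the package name is a placeholder):
rpm -qa | grep -i mpich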
  • What development tools are installed?
    • The GNU and Intel compilers are installed on the system.
  • Compiling and Executing MPI Code
    • If you are compiling MPI code, there are two sets of MPI libraries on the cluster; which one to use depends on the interconnect. To use the InfiniBand interconnect, use the following paths:
Libraries: /usr/local/topspin/mpi/mpich/lib64
Headers: /usr/local/topspin/mpi/mpich/include
MPIRUN: /usr/local/topspin/mpi/mpich/bin/mpirun_ssh
    • If you want to use the Ethernet interconnect, use the following paths:
Libraries: /opt/mpich/gnu/lib64
Headers: /opt/mpich/gnu/include
MPIRUN: /opt/mpich/gnu/bin/mpirun
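A hedged sketch of compiling and running a simple MPI program over the Ethernet interconnect (the source file, process count, and machine file are placeholders; the standard MPICH library name, libmpich, is assumed, and the InfiniBand build may require additional link flags):
gcc -I/opt/mpich/gnu/include -o hello hello.c -L/opt/mpich/gnu/lib64 -lmpich
/opt/mpich/gnu/bin/mpirun -np 4 -machinefile my_hosts ./hello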

Support

  • What are the on-site user support options? Documentation, mailing lists, website links, etc.?

Administrative level

Normal Operation

  • How do we gracefully shutdown the SPL DELL Cluster?
    • To gracefully shut down the cluster, first shut down all of the compute nodes by typing "cluster-fork halt -p" as root on the head node. Once all compute nodes have shut down, type "halt -p" on the head node. Once the head node has completely powered off, you can turn off the SCSI-attached RAID device and the Ethernet/InfiniBand switches.
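In summary, as root on the head node:
cluster-fork halt -p      # power off all compute nodes
halt -p                   # then power off the head node itself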
  • How do we properly power up the SPL DELL Cluster?
    • First, ensure that the SCSI-attached RAID device and all switches are powered on. Then power up the head node. Once the head node is completely operational, you can power up the compute nodes.
  • What Routine Maintenance is needed?
    • The same routine maintenance is required on this cluster as on any other Linux system.

Trouble Shooting

  • If the SPL DELL Cluster loses power suddenly, how do we know that it has restarted properly?
    • Power the cluster up using the method described above. Once the cluster has been powered up, issue the following command and look for errors: "cluster-fork uptime". If a compute node loses power, it will re-install its operating system from the head node when it boots; this process can take up to 2 hours to complete.
  • Is there any ongoing status monitoring to alert us of possible hw/sw failures?
    • As far as I am aware, Platform Rocks does not include any proactive monitoring software capable of sending alerts, other than an RSS feed which you can monitor. See the Cluster Home Page for more details.

Getting Support from Dell

  • What is the procedure if we are experiencing/seeing issues with the SPL DELL Cluster?
    • Contact Dell support. Once the installation is complete, you will receive a call from your Technical Account Manager, who will provide more details about support.
  • Who do we go to on-site to do admin tasks/kick daemons, etc.?
  • Who do we go to/who do we call if we are seeing software/hardware issues?

SPL Policy Questions

  • Does the cluster have a name?
    • cluster.bwh.harvard.edu is the fully qualified domain name of the cluster.
  • Who is the administrative contact for the cluster?
    • This answer will have to come from Simon Warfield.
  • What are the goals for the cluster?
  • Do all SPL users have access to the cluster?
    • The cluster is currently a stand-alone system. Access will have to be arranged through the local cluster administrators.
  • Are there per-user quotas?
    • Access restrictions and limits may be established via the job batching system.
  • Who is allowed to schedule jobs on the cluster?
    • Answer above applies here also.
  • Is there a way to reserve the cluster for large batch jobs?
    • Higher priority queues can be established via the batching system.
  • What is the process if custom software or upgrades are needed?

Software should be added to compute nodes via the kickstart framework in order to maintain a consistent user environment across the cluster. Please refer to the "Rocks User Guide" on the local cluster web interface for details on customizing your compute nodes and the cluster distribution.
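As a heavily hedged sketch of the typical Rocks customization workflow (the paths and version string below are assumptions that depend on the installed Rocks release; the Rocks User Guide on the cluster web interface is authoritative):
cd /home/install/site-profiles/4.1/nodes
cp skeleton.xml extend-compute.xml       # add <package> entries for the extra RPMs here
cd /home/install
rocks-dist dist                          # rebuild the cluster distribution
# Compute nodes pick up the new packages the next time they re-install.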

General information

  • About cluster software e.g.:

http://www.dell.com/downloads/global/power/ps4q05-20050227-Ali.pdf

  • About 'Dell High Performance Computing Clusters'

http://www.dell.com/content/topics/global.aspx/solutions/en/clustering_hpcc?c=us&cs=555&l=en&s=biz&~lt=print

  • About Rocks cluster software

http://www.rocksclusters.org

  • About Platform Rocks

http://www.platform.com/Products/Rocks