HP XC System Software User's Guide
Version 3.1
HP Part Number: 5991-7400
Published: November 2006
Printed in the US
Table 10-2 LSF-HPC Equivalents of SLURM srun Options (continued)
srun Option    Description    LSF-HPC Equivalent
               Specifies      -ext "SLURM[constraint=list]"
               You cannot use this option. LSF-HPC uses this
               Meaningless under LSF-HPC integrated with SLURM
11 Advanced Topics
This chapter covers topics intended for the advanced user. It addresses the following topics:
• "Enabling Remote Execution
$ hostname
mymachine
Then, use the host name of your local machine to retrieve its IP address:
$ host mymachine
mymachine has address 14.26.206.134
Step 2.
First, examine the available nodes on the HP XC system. For example:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite
the rule). Typically, the rule for an object-file target is a single compilation line, so it is common to talk about concurrent compilations, though GNU
testall:
        @ \
        for i in ${HYPRE_DIRS}; \
        do \
          if [ -d $$i ]; \
          then \
            echo "Making $$i ..."
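The fragment above is cut off. The following is a minimal sketch of how the complete loop might read, assuming each subdirectory build is wrapped with the PREFIX and MAKE_J variables used in the invocations below; the recursive make line is an assumption, not the guide's exact text (recipe lines must begin with a tab):

testall:
        @ \
        for i in ${HYPRE_DIRS}; \
        do \
          if [ -d $$i ]; \
          then \
            echo "Making $$i ..."; \
            (cd $$i && ${PREFIX} ${MAKE} ${MAKE_J}); \
          fi; \
        done

With PREFIX='srun -n1 -N1' and MAKE_J='-j4', each subdirectory build expands to srun -n1 -N1 make -j4, so every directory is built on its own compute node with four-way parallelism.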
$ make PREFIX='srun -n1 -N1' MAKE_J='-j4'
11.3.2 Example Procedure 2
Go through the directories in parallel and have the make procedure within
The modified Makefile is invoked as follows:
$ make PREFIX='srun -n1 -N1' MAKE_J='-j4'
11.4 Local Disks on Compute Nodes
The use of a
List of Examples
5-1 Submitting a Job from the Standard Input...
Verify with your system administrator that MPICH has been installed on your system. The HP XC System Software Administration Guide provides procedures
IMPORTANT: Be sure that the number of nodes and processors in the bsub command corresponds to the number specified by the appropriate options in the wr
A Examples
This appendix provides examples that illustrate how to build and run applications on the HP XC system. The examples in this section show you
Examine the partition information:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite      6  idle n[5-10]
Examine the loca
View the job:
$ bjobs -l 8
Job <8>, User <smith>, Project <default>, Status <DONE>, Queue <normal>, Interactive mode, Extsc
A.4 Launching a Parallel Interactive Shell Through LSF-HPC
This section provides an example that shows how to launch a parallel interactive shell throu
date and time stamp: Submitted from host <n2>, to Queue <normal>, CWD <$HOME>, 4 Processors Requested, Requested Resources <type=a
$ lshosts
HOST_NAME    type     model    cpuf  ncpus  maxmem  maxswp  server  RESOURCES
lsfhost.loc  SLINUX6  DEFAULT  1.0   8      1M      -       Yes     (slurm)
$
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite      4  idle n[13-16]
Submit the job:
$ bsub -n8 -Ip /bin/sh
Job <1008
loadSched   -   -   -   -   -   -   -   -   -   -   -
loadStop    -   -   -   -   -   -   -   -   -   -   -
View the finished jobs:
$ bhist -
Greetings from process 2! from ( n14 pid 14011)
Greetings from process 3! from ( n14 pid 14012)
Greetings from process 4! from ( n15 pid 18227)
Greetings
If myjob runs on an HP XC host, the SLURM[nodes=4-4] allocation option is applied. If it runs on an Alpha/AXP host, the SLURM option is ignored.
• Run m
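A hedged illustration of such a submission, assuming myjob is the job described in this discussion:

$ bsub -n4 -ext "SLURM[nodes=4-4]" myjob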
Glossary
A
administration branch
The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes
external network node
A node that is connected to a network external to the HP XC system.
F
fairshare
An LSF job-scheduling policy that specifies how reso
Integrated Lights Out
See iLO.
interconnect
A hardware component that provides high-speed connectivity between the nodes in the HP XC system. It is used f
MCS
An optional integrated system that uses chilled water technology to triple the standard cooling capacity of a single rack. This system helps take t
PXE
Preboot Execution Environment. A standard client/server interface that enables networked computers that are not yet installed with an operating sys
Index
A
ACML library, 42
application development, 37
  building parallel applications, 42
  building serial applications, 39
  communication between nodes, 109
com
About This Document
This document provides information about using the features and functions of the HP XC System Software. It describes how the HP XC u
compute node, 37
  configuring local disk, 109
  core availability, 38
CP3000, 20
  MKL library, 42
  system interconnect, 22
CP3000BL, 20
CP4000, 20
  ACML library, 42
  submission, 47
  submission from non-HP XC host, 55
job accounting, 81
job allocation information
  obtaining, 92
job manager, 84
job scheduler, 84
JOBID transla
P
parallel application
  build environment, 40
  building, 42
  compiling and linking, 42
  debugging, 57
  debugging with TotalView, 57
  developing, 37
  environment for
T
TotalView, 57
  debugging an application, 59
  exiting, 61
  setting preferences, 59
  setting up, 58
tuning applications, 73
U
user environment, 31
utilization metr
Ctrl+x    A key sequence. A sequence such as Ctrl+x indicates that you must hold down the key labeled Ctrl while you press another key or mouse button.
ENV
See the following sources for information about related HP products.
HP XC Program Development Environment
The Program Development Environment home page
— Administering Platform LSF
— Administration Primer
— Platform LSF Reference
— Quick Reference Card
— Running Jobs with Platform LSF
LSF procedures and in
• http://sourceforge.net/projects/modules/
Web site for Modules, which provide for easy dynamic modification of a user's environment through module
Software RAID Web Sites
• http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html and http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/pdf/Software-
1 Overview of the User Environment
The HP XC system is a collection of computer nodes, networks, storage, and software, built into a cluster, that work
© Copyright 2003, 2005, 2006 Hewlett-Packard Development Company, L.P.
Confidential computer software. Valid license from HP required for possession, u
Table 1-1 Determining the Node Platform
Platform    Partial Output of /proc/cpuinfo
            processor  : 0
            vendor_id  : GenuineIntel
            cpu family : 15 mo
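A quick way to inspect these fields on a node is a sketch like the following; the output line shown mirrors the Intel example above:

$ grep vendor_id /proc/cpuinfo
vendor_id : GenuineIntel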
nodes must be launched from nodes with the login role. Nodes with the compute role are referred to as compute nodes in this manual.
1.1.5 Storage and I/
and keeps software from conflicting with user-installed software. Files are segregated into the following types and locations:
• Software specific to HP
free -m                 Use the following command to display the amount of free and used memory in megabytes.
cat /proc/partitions    Use the following command to display th
SLURM commands    HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling. Standard SLURM co
1.5.2 Load Sharing Facility (LSF-HPC)
The Load Sharing Facility for High Performance Computing (LSF-HPC) from Platform Computing Corporation is a batch
HP-MPI    Determines HOW the job runs. It is part of the application, so it performs communication. HP-MPI can also pinpoint the processor on which each r
2 Using the System
This chapter describes the tasks and commands that the general user must know to use the system. It addresses the following topics:
•
2.2.1 Introduction
As described in "Run-Time Environment" (page 24), SLURM and LSF-HPC cooperate to run and manage jobs on the HP XC system, combining L
For more information about using this command and a sample of its output, see "Getting Information About the LSF Execution Host Node" (page 91).
• The
Table of Contents
About This Document ... 13
1 Intende
My cluster name is hptclsf
My master name is lsfhost.localdomain
In this example, hptclsf is the LSF cluster name, and lsfhost.localdomain is the name o
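Output such as this is typically produced by the LSF lsid command; that the command line is lsid is an assumption here, since the command itself is not shown above:

$ lsid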
3 Configuring Your Environment with Modulefiles
The HP XC system supports the use of Modules software to make it easier to configure and modify your
(perhaps with incompatible shared objects) installed, it is probably wise to set MPI_CC (and others) explicitly to the commands made available by the c
Table 3-1 Supplied Modulefiles (continued)
Modulefile            Sets the HP XC User Environment to Use:
imkl/8.0 (default)    Intel Math Kernel Library.
                      Intel Version 7
3.5 Viewing Loaded Modulefiles
A loaded modulefile is a modulefile that has been explicitly loaded in your environment by the module load command. To view the modulefiles currently loaded in your environment, use the module list command.
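A minimal sketch of that command with plausible output; the loaded modulefile name shown is illustrative:

$ module list
Currently Loaded Modulefiles:
  1) mpi/hp/default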
3.8 Viewing Modulefile-Specific Help
You can view help information for any of the modulefiles on the HP XC system. For example, to access modulefile-spe
To install a new product or package, look at the manpages for modulefiles, examine the existing modulefiles, and create a new modulefile for that product.
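Before writing a new modulefile, it can help to inspect an existing one; a hedged example using the mpi modulefile supplied with the system:

$ module display mpi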
4 Developing Applications
This chapter discusses topics associated with developing applications in the HP XC environment. Before reading this chapter, y
Table 4-1 Compiler Commands
Type      C     C++    Fortran   Notes
Standard  gcc   g++    g77       All HP XC platforms. The HP XC System Software supplies these compilers by default.
The Ctrl/Z key sequence is ignored.
4.5 Setting Debugging Options
In general, the debugging information for your application that is needed by most debu
3 Configuring Your Environment with Modulefiles ... 31
3.1 Overview of Modules...
For further information about developing parallel applications in the HP XC environment, see the following:
• "Launching Jobs with the srun Command" (p
Intel    -pthread
PGI      -lpgthread
For example:
$ mpicc object1.o ... -pthread -o myapp.exe
4.7.1.5 Quadrics SHMEM
The Quadrics implementation of SHMEM runs on HP
Information about using the GNU parallel Make is provided in "Using the GNU Parallel Make Capability". For further information about using GNU parallel
If you have not already loaded the mpi compiler utilities module, load it now as follows:
$ module load mpi
To compile and link a C application using the mpicc command:
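A minimal sketch of such a compile-and-link step; the source and program names are illustrative:

$ mpicc -o hello_world hello_world.c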
For released libraries, dynamic and archive, the usual custom is to have a ../lib directory that contains the libraries. This, by itself, will work if
NOTE: There is no shortcut as there is for the dynamic loader.
5 Submitting Jobs
This chapter describes how to submit jobs on the HP XC system; it addresses the following topics:
• "Overview of Job Submission" (page
launched on the LSF-HPC node allocation (compute nodes). The LSF-HPC node allocation is created by the -n num-procs parameter, which specifies the number of cores
return 0;
}
The following is the command line used to compile this program:
$ cc hw_hostname.c -o hw_hostname
When run on the login node, it shows the
5.4 Submitting a Batch Job or Job Script ... 53
5.5 Sub
The SLURM srun command is required to run jobs on an LSF-HPC node allocation. The srun command is the user job launched by the LSF bsub command. SLURM
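A hedged illustration of this relationship, with srun as the user job submitted through bsub (the command run on the allocation is illustrative):

$ bsub -n4 -I srun hostname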
Example 5-7 Submitting an MPI Job
$ bsub -n4 -I mpirun -srun ./hello_world
Job <24> is submitted to default queue <normal>. <<Waiting
bsub -n num-procs -ext "SLURM[slurm-arguments]" [bsub-options] [-srun [srun-options]] [jobname] [job-options]
The slurm-arguments parameter ca
Example 5-11 Using the External Scheduler to Submit a Job That Excludes One or More Nodes
$ bsub -n4 -ext "SLURM[nodes=4; exclude=n3]" -I sru
Example 5-14 Submitting a Job Script
$ cat myscript.sh
#!/bin/sh
srun hostname
mpirun -srun hellompi
$ bsub -I -n4 myscript.sh
Job <29> is submitted
Example 5-17 Submitting a Batch Job Script That Uses the srun --overcommit Option
$ bsub -n4 -I ./myscript.sh
Job <81> is submitted to default que
5.6 Running Preexecution Programs
A preexecution program performs setup tasks that an application needs. It may create directories
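Standard LSF provides the bsub -E option for running a pre-execution command before the job itself; a hedged sketch, in which the setup script path and application name are illustrative:

$ bsub -n4 -E "./setup_dirs.sh" -I srun ./myapp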
6 Debugging Applications
This chapter describes how to debug serial and parallel applications in the HP XC development environment. In general, effectiv
This section provides only the minimum instructions to get you started using TotalView. Instructions for installing TotalView are included in the HP XC Sys
6.2.1.4 Using TotalView with LSF-HPC
HP recommends the use of xterm when debugging an application with LSF-HPC. You also need to allocate the nodes you
10.5 Using LSF-HPC Integrated with SLURM in the HP XC Environment ... 87
10.5.1 Useful Commands...
Use the -g option to enable debugging information.
2. Run the application in TotalView:
$ mpirun -tv -srun -n2 ./Psimple
3. The TotalView main control wi
$ mpicc -g -o Psimple simple.c -lm
2. Run the application:
$ mpirun -srun -n2 Psimple
3. Start TotalView:
$ totalview
4. Select Unattached in the TotalView
7 Monitoring Node Activity
This chapter describes the optional utilities that provide performance information about the set of nodes associated with you
Figure 7-1 The xcxclus Utility Display
The icons show most node utilization statistics as a percentage of the total resource utilization. For example, F
1. The node designator is on the upper left of the icon.
2. The left portion of the icon represents the Ethernet connection or connections.
In this illustra
Figure 7-3 The clusplot Utility Display
The clusplot utility uses the GNUplot open source plotting program.
7.4 Using the xcxperf Utility to Display Nod
$ xcxperf -o test
Figure 7-4 The xcxperf Utility Display
Specifying the data file prefix when you invoke the xcxperf utility from the command line plays
Figure 7-5 The perfplot Utility Display
7.6 Running Performance Health Tests
You can run the ovp command to generate reports on the performance health o
NOTE: The --nodelist=nodelist option is particularly useful for determining problematic nodes.
If you use this option and the --nnodes=n option, the --n
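A hedged illustration of the option; the test name and node list mirror the examples that follow, and combining them this way is an assumption:

$ ovp --verify=perf_health/cpu_usage --nodelist=n[11-15]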
List of Figures
4-1 Library Directory Structure...
$ ovp --verify=perf_health/cpu_usage
XC CLUSTER VERIFICATION PROCEDURE
date time
Verify perf_health: Testing cpu_usage ... +++ PASSED +++
This v
Verify perf_health: Testing memory ...
    Specified nodelist is n[11-15]
    Number of nodes allocated for this test is 5
    Job <103
8 Tuning Applications
This chapter discusses how to tune applications in the HP XC environment.
8.1 Using the Intel Trace Collector and Intel Trace Anal
Example 8-1 The vtjacobic Example Program
For the purposes of this example, the examples directory under /opt/IntelTrace/ITC is copied to the user's
8.2 The Intel Trace Collector and Analyzer with HP-MPI on HP XC
NOTE: The Intel Trace Collector (ITC) was formerly known as VampirTrace. The Intel Trac
Running a Program
Ensure that the -static-libcxa flag is used when you use mpirun.mpich to launch a C or Fortran program.
The following is a C example ca
86 Difference is 2.809467246160129E-005
88 Difference is 2.381154327036583E-005
90 Difference is 2.01814296456522
9 Using SLURM
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling.
This chapter address
Example 9-1 Simple Launch of a Serial Program
$ srun hostname
n1
9.3.1 The srun Roles and Modes
The srun command submits jobs to run under SLURM manageme
Example 9-3 Reporting on Failed Jobs in the Queue
$ squeue --state=FAILED
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
# chmod a+r /hptc_cluster/slurm/job/jobacct.log
You can find detailed information on the sacct command and job accounting data in the sacct(1) manpage.
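A hedged example of displaying accounting data for a single job; the job ID is illustrative:

$ sacct -j 123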
10 Using LSF-HPC
The Load Sharing Facility (LSF-HPC) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. On
• The bsub command is used to submit jobs to LSF.
• The bjobs command provides information on batch jobs.
10.2 Overview of LSF-HPC Integrated with SLURM
10.3 Differences Between LSF-HPC and LSF-HPC Integrated with SLURM
LSF-HPC integrated with SLURM for the HP XC environment supports all the standard fe
$ lshosts
HOST_NAME    type     model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
lsfhost.loc  SLINUX6  Opteron8  60.0  8      2007M   -       Yes     (slurm)
$
Pseudo-parallel job
A job that requests only one slot but specifies any of these constraints:
• mem
• tmp
• nodes=1
• mincpus > 1
Pseudo-parallel jobs a
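A hedged illustration of a submission that would fall under the mincpus constraint above; the job name is illustrative, and passing mincpus through the external scheduler this way is an assumption:

$ bsub -n1 -ext "SLURM[mincpus=2]" ./myjob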
10.6 Submitting Jobs
The bsub command submits jobs to LSF-HPC; it is used to request a set of resources on which to launch a job. This section focuses o
Figure 10-1 How LSF-HPC and SLURM Launch and Manage a Job
[Figure: a user submits a job from the login node with bsub -n4 -ext "SLURM[node...]"; LSF-HPC dispatches it through the job_starter.sh script, which launches the user script with srun -n1 myscript on the allocated nodes.]
List of Tables
1-1 Determining the Node Platform...
4. LSF-HPC prepares the user environment for the job on the LSF execution host node and dispatches the job with the job_starter.sh script. This user envi
LSF-HPC daemons run on only one node in the HP XC system, so the bhosts command will list one host, which represents all the resources of the HP XC sys
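A hedged sketch of typical bhosts output on such a system; the column values are illustrative:

$ bhosts
HOST_NAME           STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
lsfhost.localdomain ok      -     8    0      0    0      0      0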
In the previous example output, the LSF execution host (lsfhost.localdomain) is listed under the HOST_NAME column. The status is listed as ok, indicati
After LSF-HPC integrated with SLURM allocates nodes for a job, it attaches allocation information to the job. The bjobs -l command provides job allocati
Example 10-2 Job Allocation Information for a Finished Job
$ bhist -l 24
Job <24>, User <lsfadmin>, Project <default>,
Example 10-4 Using the bjobs Command (Long Output)
$ bjobs -l 24
Job <24>, User <msmith>, Project <default>, Status <RUN>,
For detailed information about a finished job, add the -l option to the bhist command, as shown in Example 10-6. The -l option specifies that the long for
123    hptclsf@99  lsf  8  RUNNING  0
123.0  hptclsf@99  lsf  0  RUNNING  0
In these examples, the job
You can simplify this by first setting the SLURM_JOBID environment variable to the SLURM job ID, as follows:
$ export SLURM_JOBID=150
$
$ export SLURM_JOBID=150
$ export SLURM_NPROCS=4
$ mpirun -tv -srun additional parameters as needed
After you finish with this interactive allocation, exi