Vanderbilt ACCRE User Guide for CMS Collaborators
Introduction
The Vanderbilt ACCRE computing center provides interactive and Grid-based services to a variety of research groups at Vanderbilt as well as to collaborations such as CMS. The center provides shared use of over 1000 Opteron cores in a single cluster. The Grid interfaces are integrated into the
Open Science Grid (OSG). The cluster also contains
PowerPC cores, but these are not of interest to CMS users.
This user guide describes the available services and explains how to use them properly. If you have questions about the use of
ACCRE that are not addressed in this user guide, please refer to the
user support pages at ACCRE, and if your problem is not addressed there, submit a
help request ticket.
To obtain an account at ACCRE ... email ??? ... then go to the
account request page at ACCRE.
Interactive use
Interactive access to the cluster is by ssh to
vmplogin.accre.vanderbilt.edu. Authentication uses
a username/password combination. This will log you on to one of the approximately 20 'gateway nodes'. Each user has a home directory that is mounted on all machines, but see below
for the different types of storage and their intended uses.
Users can build and debug software,
run interactive ROOT sessions, and run short (15 min) jobs directly on the gateway nodes. For larger
collections of jobs or longer jobs, please use the
Moab Scheduling System to run jobs on the compute nodes.
bash is the default shell for users, but csh is also available.
64bit vs. 32bit
All gateway and compute nodes have 64-bit processors, which run both 64-bit and 32-bit software.
Currently all CMS software still runs in 32-bit mode.
Data Storage
ACCRE provides three main types of storage that are of interest to CMS users. Please keep a careful eye
on your use of the different storage areas, as each has its own benefits and limitations.
Home Directories
The home directories are based on a high-quality GPFS storage system. The data is backed up nightly.
Limit your use to ten gigabytes unless you have received explicit notification
of a higher allocation.
GPFS Scratch space
GPFS-based scratch space provides up to one hundred gigabytes of temporary storage under the directory
/scratch. To use this space, create a subdirectory within this directory and put your files into it. You may keep up to ten gigabytes indefinitely, and 'burst' up to one hundred gigabytes for up to two weeks.
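The steps above can be sketched as a pair of small shell functions. This is only an illustration: the function names make_scratch_dir and scratch_over_limit, and the choice of the username as the subdirectory name, are assumptions and not part of any ACCRE tooling. The ten-gigabyte threshold is the long-term allowance quoted above.

```shell
#!/bin/bash
# Sketch: create a personal subdirectory in the scratch area and check
# whether it exceeds the 10 GB long-term allowance.
# make_scratch_dir and scratch_over_limit are hypothetical helper names.

make_scratch_dir() {
    local root="${1:-/scratch}"          # scratch area, /scratch by default
    local name="${2:-${USER:-user}}"     # subdirectory name, username by default
    mkdir -p "$root/$name" && echo "$root/$name"
}

scratch_over_limit() {
    local dir="$1"
    local limit_kb=$((10 * 1024 * 1024))  # 10 GB expressed in kilobytes
    local used_kb
    used_kb=$(du -sk "$dir" | cut -f1)    # total size of the directory in KB
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "over"
    else
        echo "ok"
    fi
}
```

Calling make_scratch_dir with no arguments would create and print a directory named after the current user under /scratch; passing that path to scratch_over_limit then prints "over" or "ok".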
L-Store
L-Store is to be our large data storage system. Integration with CMSSW is in progress, and once complete it will be a large-scale repository for all CMS heavy ion data.
Setting up CMSSW at ACCRE
Setting up the environment for CMSSW at ACCRE is similar to the procedure at Fermilab or CERN. After logging in, run:
export SCRAM_ARCH=slc4_ia32_gcc345
source /gpfs1/grid/grid-app/cmssoft/cms/cmsset_default.sh
You may also append these commands to your
.bashrc to save time.
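If the commands go into .bashrc, it may be worth guarding them so that logging in still works cleanly when the software area is not reachable. A minimal sketch follows; the variable name CMS_SETUP is an illustrative choice, while the path and SCRAM_ARCH value are the ones given above.

```shell
# Sketch for ~/.bashrc: set up CMSSW only if the setup script is readable.
# CMS_SETUP is a hypothetical variable name; the path is the one given above.
CMS_SETUP=/gpfs1/grid/grid-app/cmssoft/cms/cmsset_default.sh
if [ -r "$CMS_SETUP" ]; then
    export SCRAM_ARCH=slc4_ia32_gcc345
    source "$CMS_SETUP"
fi
```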
Everything else is exactly as described in the
CMS Offline workbook.
You may run into authentication problems with CVS. If you do, simply execute the following command:
export CVSROOT=:pserver:anonymous@cmscvs.cern.ch:/cvs_server/repositories/CMSSW
Moab Scheduling System
General Framework and Basic Commands
The scheduler at ACCRE is best described by the
official introduction to the compute cluster at ACCRE. Below is simply an abbreviated introduction for CMS users.
Jobs are submitted to the cluster by submitting a shell script to the queue with the
qsub command. This shell script contains the commands to be run for the job as well as special commands designating the resources needed, such as the number of processors, the memory required, and the time required. Jobs that go over their specified memory or wall-clock time allowance are killed, so it is good to overestimate the time and memory needed. On the other hand, jobs requiring more resources may sit idly in the queue longer.
When the requested resources become available, the job is run. One can see the status of the queue, showing all of the jobs being run and waiting to be run, with the
showq command. Each submitted job is given a
JOBID number. You can get a detailed report on the status of a job by using
checkjob JOBID.
PBS Scripts
The shell script that is submitted to the cluster is called a PBS script. Below is an example of a PBS script that may serve as a good template for a CMSSW job.
#!/bin/bash
#PBS -l nodes=1:x86:nomyrinet
#PBS -l mem=4000mb
#PBS -l walltime=8:00:00
#PBS -j oe
cd /home/username/my_working_directory
export SCRAM_ARCH=slc4_ia32_gcc345
source /gpfs1/grid/grid-app/cmssoft/cms/cmsset_default.sh
eval `scramv1 runtime -sh`
cmsRun my_cmssw_script_cfg.py
exit 0
One may save this script as
myscript.pbs and submit it to the queue by typing
qsub myscript.pbs. The meaning of each part of the script is given below:
#PBS -l nodes=1:x86:nomyrinet
#PBS -l mem=4000mb
#PBS -l walltime=8:00:00
These three lines indicate the resources that we wish to be allocated to us. As CMSSW uses no interprocess communication, only one node is requested. The
x86 property means that one wishes to run on an Opteron core rather than a PowerPC core. The
nomyrinet property means that one is requesting nodes that are connected by slower Ethernet networking. This is preferable because CMSSW uses no node-to-node communication, and requesting the faster networking would leave one's job idling in the queue longer.
In general, one only needs to change the
mem and
walltime lines for a CMSSW job. Walltime is given in hours:minutes:seconds. Some cluster nodes have 2 GB of memory per core or less, so requesting a smaller amount of memory makes more nodes eligible and your job is likely to start sooner.
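To illustrate the hours:minutes:seconds format, here is a small sketch that converts a run-time estimate in seconds into a value suitable for the walltime line. The helper name to_walltime is hypothetical.

```shell
#!/bin/bash
# Sketch: format a run-time estimate in seconds as hours:minutes:seconds
# for the walltime line. to_walltime is a hypothetical helper name.
to_walltime() {
    local seconds="$1"
    printf '%d:%02d:%02d\n' \
        $((seconds / 3600)) $(((seconds % 3600) / 60)) $((seconds % 60))
}

to_walltime 28800   # prints 8:00:00
```

Since jobs that exceed their walltime are killed, one would normally round the estimate up before converting it.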
#PBS -j oe
This line merges the
stderr of the job into its
stdout, so that both are made available in a single file after the job runs. After running
qsub myscript.pbs, one will see a response such as
834576.vmpsched. This means that your job has a JOBID of 834576, and the output will be returned in a file called
myscript.pbs.o834576. This file is useful for debugging.
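The relation between the qsub response, the JOBID, and the output file name can be sketched as follows. Both function names are hypothetical, and the sketch assumes the response always has the form shown above, that is, a number followed by a dot and the scheduler name.

```shell
#!/bin/bash
# Sketch: derive the JOBID and the expected output file name from the
# response printed by qsub. Both function names are hypothetical.

jobid_from_response() {
    local response="$1"        # e.g. 834576.vmpsched
    echo "${response%%.*}"     # keep everything before the first dot
}

output_file_for() {
    local script="$1"          # e.g. myscript.pbs
    local jobid="$2"
    echo "${script}.o${jobid}"
}

jobid=$(jobid_from_response "834576.vmpsched")
output_file_for myscript.pbs "$jobid"   # prints myscript.pbs.o834576
```

On the cluster, the response would be captured from the submission itself, and the resulting JOBID could then be passed to checkjob.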
cd /home/username/my_working_directory
This forces execution of the script to occur in a specified directory.
export SCRAM_ARCH=slc4_ia32_gcc345
source /gpfs1/grid/grid-app/cmssoft/cms/cmsset_default.sh
eval `scramv1 runtime -sh`
cmsRun my_cmssw_script_cfg.py
Here we need to set up the environment in the script, as the job may not read one's
.bashrc. The cmsRun command is used in the normal manner.
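For a collection of similar jobs, the template can be written out once per configuration file instead of being edited by hand. The following is a sketch under stated assumptions: write_pbs_script and its arguments are hypothetical names, and the resource requests are simply copied from the template above.

```shell
#!/bin/bash
# Sketch: print a PBS script for one CMSSW configuration file, following
# the template above. write_pbs_script and its arguments are hypothetical.

write_pbs_script() {
    local workdir="$1"              # directory the job should run in
    local cfg="$2"                  # CMSSW configuration file
    local walltime="${3:-8:00:00}"  # hours:minutes:seconds
    # Unquoted here-document: variables are expanded, and the escaped
    # backticks are written literally into the generated script.
    cat <<EOF
#!/bin/bash
#PBS -l nodes=1:x86:nomyrinet
#PBS -l mem=4000mb
#PBS -l walltime=${walltime}
#PBS -j oe
cd ${workdir}
export SCRAM_ARCH=slc4_ia32_gcc345
source /gpfs1/grid/grid-app/cmssoft/cms/cmsset_default.sh
eval \`scramv1 runtime -sh\`
cmsRun ${cfg}
exit 0
EOF
}
```

A loop over configuration files could redirect the output of write_pbs_script into one file per job and pass each file to qsub.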
--
EricAppelt - 02 Jun 2009