CMSAF User Guide

Introduction

The MIT CMS Analysis Facility (CMSAF) provides interactive and Grid-based services to CMS and its physicists. It consists of a number of clusters integrated into a common management infrastructure, including shared accounts. The three main parts are the CMS MIT Tier-2 cluster ("T2BAT"), the CMS-HI Tier-3 cluster ("HIBAT"), and the MIT CDF cluster ("CDBAT"). The Grid interfaces are integrated into the Open Science Grid (OSG).

This user guide describes the available services and provides information to users on how to use them properly. If you have questions about the use of the CMSAF not addressed in this user guide, please send email to support@cmsaf.mit.edu. However, if you have never used support before, your email address is probably not on the whitelist of the CMS Request Tracker. In this case, please contact Mike Tiernan (mtiernan@MIT.EDU) so that he can add you to the allowed list.

General inquiries about the overall status of the center can be directed to cms-t2@MIT.EDU.

Connection

Connect to MIT by logging in to portal.cmsaf.mit.edu. The NX client (from nomachine.com) is preferred, but plain ssh is also fine.
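
For example, a plain ssh login looks like this (username is a placeholder for your CMSAF account name):

$ ssh username@portal.cmsaf.mit.edu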

Authentication is implemented using Kerberos. Each user has a home directory that is mounted on all machines, but see below for the different types of storage and their intended uses.

Running Interactive Jobs (Including ROOT) and Submitting to Condor

Once you connect, you MUST jump to some other machine to do interactive work. Never run anything on hissh0002 or cgate.

For CMS Heavy Ions:

Jump to hidsk0001 (a newer machine) or hibat0002-hibat0004 (older machines). Do not run interactive jobs on ANY other computers. For larger collections of jobs, please use the Condor Batch System, submitting from the hibat machines.
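
For example, from the portal you can jump to one of the allowed machines with:

$ ssh hidsk0001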

Data Storage

Three main types of storage are provided in the CMSAF. Please keep a careful eye on your use of the different storage areas as each has its own benefits and limitations.

Do not use /tmp as private scratch space. The system relies on having space available there, and data will be removed as needed without notice.

Home Directories

The home directories are hosted on a high-quality storage system and the data is backed up nightly. Limit your use to two gigabytes unless a Tier-2 administrator has explicitly granted you a larger allocation. Do not store event data files in the home areas. Use home extremely sparingly, only for things that absolutely need to be backed up; it is small and slow. In particular, never have Condor jobs read directly from your home directory.

NFS Scratch space

NFS-based scratch space is provided for up to a few hundred gigabytes of storage. This space is not to be used for large event samples, as the performance of the NFS server is insufficient to support large numbers of jobs.

CMS Heavy Ions:

Create your scratch area at /net/hidsk0001/d00/scratch/your_name and move your files there.

Alternatively, use /net/hibat000x/d00/scratch/your_name where x is 2-4 (but hidsk0001 is much larger and more powerful).
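
A minimal sketch for creating a scratch area on hidsk0001, assuming you name the directory after your login name:

$ mkdir -p /net/hidsk0001/d00/scratch/$USER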

Hadoop

Hadoop is our large data storage system. It should be used to store data samples larger than a few gigabytes.

Store your files in

/mnt/hadoop/cms/store/user/YourName
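
For example, to inspect your area and see how much space it uses (assuming the directory is named after your login name and already exists):

$ ls /mnt/hadoop/cms/store/user/$USER
$ du -sh /mnt/hadoop/cms/store/user/$USER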

Grid essentials

Although some pages recommend Firefox, many people also use Safari.

Basics

General information on the grid can be found at http://www.uscms.org/uscms_at_work/physics/computing/grid/index.shtml as well as in other places. The Getting Started guide has a link to the Certificate page as well as a recent (July 2015) set of screenshots using Safari.

Getting a certificate

http://www.uscms.org/SoftwareComputing/Grid/GettingStarted/certificate_procedure.html

Just follow the instructions on that page. The certificate will be downloaded, and you can use Keychain to import it.

You will have to contact a local person at MIT who will want to talk to you. Currently this is Max Goncharov (maxi@MIT.EDU).

Registering in a VO (Virtual Organization; for us this is CMS)

https://lcg-voms.cern.ch:8443/vo/cms/vomrs?path=/RootNode

Changing/Adding a group role for VO

E-mail Andrea Sciaba (andrea.sciaba@cern.ch), describing your situation.

Condor Batch System

You can find more than you would ever want to know about Condor using man or info for condor_submit or at the main Condor web page. The Users' Guide on the Condor site is probably the most useful.

Important General remarks

(2010: uncertain if this is still accurate for condor machine names.)

A single Condor submit server at MIT cannot handle too many jobs at one time. Because of this, it is crucial to spread submissions across the different servers to share the load. The possible servers are:

  • hisrv0001

  • hissh0001

  • hissh0002

You can simply log into these servers via ssh once you have logged into cgate:

ssh hissh0001

It is recommended that you do

condor_q -g

to observe the load on each server, and submit your Condor jobs to the one with the fewest jobs queued.

  • If you submit so many jobs that the server starts to fail, your jobs can be killed without notice.

It is also important that you use local Condor submission only when you want to analyze unofficial data on the hibat nodes, or when you need to write on the hibat nodes, and for no other purpose! Use CRAB to analyze official data.

Local Submission

  • This submission is allowed only if you are working on your own private data or just hacking around. For ALL jobs involving official CMS data or simulations, please use grid submission (see below).
  • An example script (with some possibly helpful comments) which creates a file and then submits it to Condor is here; a minimal submit-file sketch is also shown after this list.
  • Note: Please make sure that you run your jobs in the local directory ${_CONDOR_SCRATCH_DIR} assigned to your job by Condor. If you would like the same script to work with or without Condor, it is good practice to set, create, and delete it manually. For bash, this could look like:
if test -z "${_CONDOR_SCRATCH_DIR}"; then
    # Not running under Condor: set up a private scratch directory by hand
    export _CONDOR_SCRATCH_DIR=/tmp/$USER/mytmpdir/
fi
mkdir -p ${_CONDOR_SCRATCH_DIR}
... #do something here
# Clean up; under Condor the job scratch directory is removed automatically anyway
rm -rf ${_CONDOR_SCRATCH_DIR}
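
In case the example script above is not handy, here is a minimal sketch of a Vanilla-universe submit description file for local submission; the names run.sh and myjob.sub are placeholders for your own executable and submit file, not files that exist on the cluster:

Universe       = Vanilla
Executable     = run.sh
Output         = job_$(Process).out
Error          = job_$(Process).err
Log            = job_$(Process).log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
Queue 1

Save this as myjob.sub and submit it with condor_submit myjob.sub; condor_q -g (as described above) then shows its status.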

Grid Submission

Please use CRAB for official data.

The information below is for those who want to test and debug further problems with Condor.

For users:

DO NOT USE THE INFORMATION BELOW. DO NOT USE THIS KIND OF SUBMISSION.

If you really want to use the local Condor system through the grid in order to read unofficial data, then you can try the following, which does not work well and results in some problems:

  • Example Script
  • Initialize grid environment:
    • first do source /osg/grid/setup.sh (or .csh in c-shells)
    • then grid-proxy-init to initialize the environment
    • Try globusrun -a -r ce01.cmsaf.mit.edu. If everything works fine, you should get "GRAM Authentication test successful".

  • Key points (mentioned already in the script):
    • Make sure you specify the correct file for your grid proxy (modify x509userproxy=/tmp/x509up_u1624)
    • If you want any output of your job to be transferred back to the initial directory you submitted your job from, you need to specify those files in the transfer_output_files parameter.
    • Do not use "GetEnv=True". It works for local submission but not for grid jobs. Bring all the necessary files along with your grid jobs.
    • Normally, your grid jobs will run under the user account "cmsuxxxx". If you want your grid jobs to be able to write directly to dCache (e.g. via dccp) into directories that you own, please send a request to Wei Li (davidlw@MIT.EDU) so that your jobs will be mapped to run under your own user account. However, this is not recommended, since not everyone using the MIT Tier-2 has a local account.

  • Quick info for those who already have scripts for local submission

    • The lines to remove from your "condor configuration" file:

Universe       = Vanilla

    • The lines to add in your "condor configuration" file:

Universe       = grid
Grid_resource  = gt2 ce01.cmsaf.mit.edu/jobmanager-condor
x509userproxy = /tmp/x509up_u1624 (correct for your proxy; you can check with grid-proxy-info -all)
transfer_output_files = output.tgz (the output files you want to be transferred back)

Installing UBUNTU

All you need in order to install Linux on your computer and run CMSSW on it!

Software Installations

The following set of instructions will set up various software packages typically needed on our farm. However, it does not explain how to run them.

  • For c-shell, change .sh to .csh and change "export name=" to "setenv name [blank space]"

ROOT

source /opt/bin/sh/setroot.sh %version%
where %version% currently can be 5-18-00, 5-16-00, or head (which only exists on i386). To see which versions are installed, do ls /app/root/i386 or ls /app/root/x8664. We strongly recommend always using the latest installed ROOT version (at the moment, 5-18-00).
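
For example, to set up the currently recommended version and start an interactive session:

$ source /opt/bin/sh/setroot.sh 5-18-00
$ root -l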

HIROOT

source /opt/bin/sh/sethirootpath.sh %path%
where %path% should point to the path of your HIROOT copy (e.g. ~/hiroot).

CMSSW

SL5 (standard)

We have CMSSW installations in two separate places: the official grid installation, which is updated by the grid people, and our local installation, which is useful when we want a version that the grid people have not yet installed at MIT.

To set up to use the CMSSW grid installation, do:

$ export SCRAM_ARCH=slc5_amd64_gcc434
$ source /osg/app/cmssoft/cms/cmsset_default.sh

SL4 (outdated)

To set up to use the CMSSW grid installation, do:

$ export SCRAM_ARCH=slc4_ia32_gcc345
$ source /osg/app/cmssoft/cms/cmsset_default.sh

To set up to use the local CMSSW installations, do:

$ export SCRAM_ARCH=slc4_ia32_gcc345
$ source /app/cms-soft/cmsset_default.sh

Please note that for SL4 you can compile on any machine and use the code on either 32-bit or 64-bit machines.

SL3 (outdated)

To set up to use the CMSSW software, do:
$ export SCRAM_ARCH=slc3_ia32_gcc323
$ source /app/cms/cmsset_default.sh

or

% setenv SCRAM_ARCH slc3_ia32_gcc323
% source /app/cms/cmsset_default.csh
if you are using c-shell.

Please see the 64bit vs. 32bit note (now in the outdated content at the end of this page) when you need to compile/build software.

Project area

You can then use scramv1 in the standard way to manage a project area. In particular, doing scramv1 list | grep CMSSW will allow you to see the list of installed CMSSW versions.
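
A minimal sketch of creating and entering a project area with scramv1 (CMSSW_X_Y_Z is a placeholder for whichever installed version you pick from that list):

$ scramv1 list | grep CMSSW
$ scramv1 project CMSSW CMSSW_X_Y_Z
$ cd CMSSW_X_Y_Z/src
$ eval `scramv1 runtime -sh`

The last command sets up the runtime environment for that release; use scramv1 runtime -csh in c-shell.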

CVS

CMS uses CVS for its code repository. In order to both check out from and check in to CVS, you need to use authenticated access. Set the environment variable:

export CVSROOT=:gserver:cmscvs.cern.ch:/cvs_server/repositories/CMSSW
In order to authenticate to the CERN CVS server, get a Kerberos 5 ticket as described below.

A web interface to see what is where in CMSSW is at http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi. To check out a particular set of code, you can take the title line on the CVS web page (says "Index of ...") and use:

cvs co directory_name
where directory_name is exactly what appears after "Index of", omitting the initial slash.
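
For example, to check out the prerelease CRAB scripts referred to later in this guide:

$ cvs co UserCode/MitHig/HIProd/crab_prerelease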

When you check out from CVS, the "CVS" subdirectory that gets created automatically contains a pointer to where the code in each directory comes from. So you can move whole directories around anywhere you want (just keep the CVS subdirectory along with the code), and checking back in will always go to the right place.

There is a search function here: http://cmslxr.fnal.gov/lxr/

Kerberos ticket

To create a kerberos ticket for checking out from the CERN CMS CVS repository use:
$ kinit -4 -5 -A username@CERN.CH
Note that for most tasks, the -4 and -A options are not required. Also, note that the CERN.CH is case-sensitive, so you need the capital letters.

You can do klist to see what tickets you have.

CRAB

For CRAB on cern machines, see CmsHep tutorial.

For CRAB on cgate:

  • For c-shell, change .sh to .csh and change "export name=" to "setenv name [blank space]"

  • First set the grid environment and get your proxy:
    source /osg/grid/setup.sh
    grid-proxy-init
    

  • Then set CRAB and gLite-UI environment:
    source /osg/app/crab/crab.sh # CRAB
    source /osg/app/glite/etc/profile.d/grid_env.sh # gLite-UI
    

  • Then set your CMSSW environment as described above:
    export SCRAM_ARCH=slc4_ia32_gcc345 # CMSSW
    source /osg/app/cmssoft/cms/cmsset_default.sh #CMSSW
    

  • Now you are ready to submit CRAB jobs. It works exactly the same way as it does on lxplus machines; a sketch of a minimal crab.cfg is shown below.
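
As a rough sketch only (the dataset path, config file name, and job-splitting numbers below are placeholders, and the parameter set is not exhaustive; see the CRAB documentation or the CmsHep tutorial for the authoritative list), a minimal crab.cfg might look like:

[CRAB]
jobtype   = cmssw
scheduler = glite
[CMSSW]
datasetpath            = /YourDataset/YourProcessing/RECO
pset                   = your_config_cfg.py
total_number_of_events = -1
events_per_job         = 10000
[USER]
return_data = 1

You would then create, submit, and monitor the jobs with crab -create, crab -submit, and crab -status, and retrieve the output with crab -getoutput.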

Running CRAB on Prerelease

Because CRAB submits jobs through the grid, it only recognizes the OSG installations of CMSSW, which exclude prereleases. In order to run on a prerelease, you need to submit your job with a legitimate OSG version and have the CRAB jobs run a modified executable script that sets up the environment based on the /app installations on the worker node.

The scripts that can do that are in CVS:

UserCode/MitHig/HIProd/crab_prerelease

The instructions are:

  • Edit cfg.py for your job
  • Edit crab.cfg for the target dataset, publication name, and other crab parameters (type crab -help for details)
  • Edit run.sh for -only- the LocalVersion parameter, which will be the main CMSSW area that the job will finally run on.
  • Dump the cfg.py by doing:
./createDump.sh /your/cmssw/version
  • Set up the CMSSW environment for another version that is installed in /osg/app. It does not matter which one; this will be a dummy area.
  • You can set up grid, crab, and cmssw at the same time by doing:
./setcrab.sh /your/dummy/osg/area

CERNLIB

Information about the cluster

Basic performance information is available from the cluster front page. From there, links are also provided to the Ganglia monitoring, dCache monitoring, and Condor status pages, which provide much more detail.

Frequently Asked Questions

Frequently asked, or generally useful bits of information

Extract a globus key and certificate pair from a DOE certificate loaded into Mozilla, Firefox etc.

The following steps are distilled from the iVDGL website.

  • Use the Export or Backup function of the certificate manager in the browser (Firefox: Edit->Preferences->Advanced->Security->View Certificates->User Certificates), select the certificate, and save ("backup") it as newcert.p12

  • If needed, create the .globus directory
$ mkdir $HOME/.globus

  • Extract the globus certificate
$ openssl pkcs12 -in newcert.p12 -clcerts -nokeys -out $HOME/.globus/usercert.pem

  • Extract the globus private key
$ openssl pkcs12 -in newcert.p12 -nocerts -out $HOME/.globus/userkey.pem

  • Set permissions of the key and certificate to user-read-only
$ chmod 400 $HOME/.globus/userkey.pem
$ chmod 400 $HOME/.globus/usercert.pem

  • Test the new key pair
$ grid-proxy-init

SSH to other servers (64bit) fails

There is a defect with Kerberos and SSH to 64-bit servers. A workaround is to remove ("destroy") the Kerberos ticket. Do:
$ kdestroy

The version of CMSSW I was using is no longer available, how do I update?

Use scramv1 list to see the available versions, then run scramv1 p -update CMSSW_XXXXX inside your existing project area to update it to the selected version.

Old content of this page: outdated

64bit vs. 32bit (See the SL4 note above. You no longer need to worry about this; everything should be SL5 now.)

The clusters consist of servers with 32-bit processors and servers with 64-bit processors. All the newer servers are 64-bit and will run both 64-bit and 32-bit software. Currently all CMS software still runs in 32-bit mode. For the moment it is necessary to build 32-bit software on a 32-bit node; users can ssh into hissh0002 from cgate to do this. In general, the 32-bit machines are designated as i386 architecture and the 64-bit machines as x86_64 (see the Condor section for an exception to this).

Topic attachments

exampleCondorGrid.rtf (1.8 K, 2008-11-10, YetkinYilmaz)