Changes between Initial Version and Version 1 of ComputeResources/UMCGCluster


Timestamp: Aug 19, 2011 5:18:26 PM
Author: laurent

== Description ==
The UMCG cluster is composed of:

 * 1 head node
 * 10 compute nodes each with:
   * 48 cores
   * 256GB RAM
   * 2.3TB of local storage
   * 10Gb network connection to storage
 * 2PB storage (only 1.1PB mounted at time of writing)

The 10 nodes are dedicated to the GCC group at UMCG. Since GoNL is the most compute-intensive project at GCC, most of the cluster can be used for it. The storage is shared by different groups in Groningen, but there is currently no "hard limit" on how much space GoNL can use on the storage; this, of course, will only work as long as there is sufficient space for everyone.

== Access ==
Access to the UMCG cluster is done via SFTP (data access only, see the [wiki:DataManagement/SftpServer SFTP page] about this) or SSH. There is no public access to the UMCG cluster. Additional personal SSH or SFTP accounts can be requested via Morris, who keeps the list of all users that have full data access.

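For illustration, connecting over SSH or SFTP from your own machine typically looks like the lines below; the hostname is only a placeholder, use the actual cluster address provided with your account.

`ssh yourUsername@umcg-cluster-hostname`
`sftp yourUsername@umcg-cluster-hostname`
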
== Usage ==
First of all, here are a few '''important''' things to know about the cluster and using it efficiently:

 * '''Storage''': The block size on the storage is 6MB, which means that each file -regardless of its real size- will occupy at least 6MB on the file system. Data should therefore be kept in big files rather than a multitude of small files whenever possible. Typically, things like logs, old submit scripts, etc. should be compressed into one file for archiving (see the sketch after this list).
 * '''I/O''': While the 10Gb network connection per node is fast, typical GoNL jobs use large files and consume a lot of I/O. Therefore, I/O should be kept minimal, and if a job can be parallelized on multiple cores (i.e. load the data once in memory, process it on multiple cores, push it back), this is typically preferable to having separate processes each loading the same data into memory.
 * '''Local Storage''': In order to reduce I/O, temporary files (and possibly other heavily used resources) should be stored directly on the local node; the local storage on each node is mounted in /local (also illustrated in the sketch after this list). Note that:
   * Any data on the local storage that you want to keep after the job terminates should be copied to the general storage, as the local storage is periodically cleaned and any data that is not in use by a currently running job will be deleted.
   * Even though the local storage is periodically cleaned, if you store large files on a node while running a job, you should clean them up afterwards. Small temp files are fine.
 * '''Data Management''': Please read the [wiki:DataManagement Data Management] section of this wiki thoroughly and respect the structure and conventions described there when using data outside your home directory.

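Below is a minimal sketch of the '''Storage''' and '''Local Storage''' advice above; the directory and file names are purely illustrative placeholders, not actual project paths.

`# Pack many small files (logs, old submit scripts, ...) into a single compressed archive`
`tar -czf oldLogsAndScripts.tar.gz logs/ submitScripts/ && rm -rf logs/ submitScripts/`

`# Inside a job: stage data on the node's local disk to limit I/O on the shared storage`
`mkdir -p /local/$USER/myJob`
`cp /target/gpfs2/gcc/home/yourUsername/input.dat /local/$USER/myJob/`
`# ... run your analysis against /local/$USER/myJob ...`
`# Copy results you want to keep back to the general storage, then clean up`
`cp /local/$USER/myJob/results.dat /target/gpfs2/gcc/home/yourUsername/`
`rm -rf /local/$USER/myJob`
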
== Scheduler ==
The cluster uses the [http://doesciencegrid.org/public/pbs/ Portable Batch System (PBS)] scheduling system. You can find the full documentation in this [http://doesciencegrid.org/public/pbs/ PBS guide]. However, here are a few basic commands and tips:

 * qstat -u username
   * Shows a list of your jobs along with information and status
 * showq [-u username]
   * Shows the list of all jobs running on the cluster (or only the given user's jobs if the -u flag is used), along with information and status.
 * checkjob jobid
   * Shows in-depth information about a specific job.
 * qsub jobScript
   * Submits a new job to the cluster. Note that it is important to submit your jobs with the appropriate options; see the qsub options section below for a quick overview of the common options.
 * qdel jobid
   * Removes a job from the queue, killing the process if it was already started
   * "qdel all" can be used to purge all of your jobs

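For example, a typical submit-and-monitor sequence could look as follows (the script name and job id are placeholders):

`qsub myJob.sh          # submit the job; the scheduler prints its job id`
`qstat -u yourUsername  # list your jobs and their status`
`checkjob 12345         # inspect job 12345 in detail`
`qdel 12345             # remove it if something went wrong`
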
=== qsub options ===
Jobs submitted via PBS qsub can specify a number of options to claim resources, report status, etc. These options can be given either on the qsub command line or in your job script. The latter is usually preferred, as all information about the job, including memory requirements, etc., stays with the script. Below is an example header with some commonly used options, followed by a list of commonly used flags and their meaning, and a sketch of the equivalent command-line call.

'''`Example script header:`'''

`-----------------------`

`#!/bin/bash`
`#PBS -N JobName`
`#PBS -q gcc`
`#PBS -l nodes=1:ppn=1`
`#PBS -l mem=4gb`
`#PBS -l walltime=12:00:00`
`#PBS -o /target/gpfs2/gcc/home/lfrancioli/output.log`
`#PBS -e /target/gpfs2/gcc/home/lfrancioli/error.log`

`#Your bash script commands go here`

`echo "Hello World!"`

`----------------------------`

'''Commonly used flags:'''

 * -q queueName
   * Selects which queue the job should be put in. The only queue available at the moment is 'gcc'.
 * -N jobName
   * Sets the job name
 * -l `nodes=X:ppn=Y`
   * `Requests X nodes and Y cores per node`
 * `-l mem=Xgb`
   * `Requests X GB of RAM`
 * `-l walltime=12:00:00`
   * `Sets the walltime to the specified value (here 12 hours). This flag should always be set.`
 * `-j oe`
   * `Redirects all error output to standard output`
 * `-o outputLog`
   * `Redirects the standard output to the desired file. Note that using '~' in the path for your home directory does not work.`
 * `-e errorLog`
   * `Redirects the error output to the desired file. Note that using '~' in the path for your home directory does not work.`
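
As a sketch, the same options can also be passed directly on the qsub command line rather than in the script header (the paths and script name below are placeholders):

`qsub -N JobName -q gcc -l nodes=1:ppn=1 -l mem=4gb -l walltime=12:00:00 -o /target/gpfs2/gcc/home/yourUsername/output.log -e /target/gpfs2/gcc/home/yourUsername/error.log myJob.sh`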