Reaching Others University at Buffalo - The State University of New York

Batch Computing

Batch processing involves users submitting jobs to a scheduler and resource manager, which decides the most efficient way to run the jobs while keeping the utilization of all resources as high as possible.

 

Testing on the Front-end (login machine)

  • The front-end machine (rush) can be used for tests that run for a few minutes and do not use an extensive amount of memory.
  • The maximum amount of time for running tests on the front end is 30 minutes.
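One way to keep a quick front-end test within the 30-minute limit is to wrap it in the coreutils `timeout` command, so the test is killed automatically if it runs long. This is a sketch; `sleep 5` stands in for a real test program:

```shell
# Kill the command automatically if it exceeds 30 minutes of wall time
# (here `sleep 5` is a stand-in for a real test program)
timeout 30m sleep 5

# When the limit is hit, timeout kills the command and exits with status 124:
timeout 2s sleep 5
echo "timeout exit status: $?"
```

Anything that needs more than 30 minutes, or a large amount of memory, should be submitted as a batch job instead.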

 

Batch System

The batch scheduler used at CCR is the SLURM Workload Manager.

  • SLURM is a workload manager that provides a framework for job queues, allocation of compute nodes, and the start and execution of jobs.
  • SLURM provides scalability and performance.  It can manage and allocate the compute nodes of large clusters and can accept up to 1,000 jobs per second.
  • More information about SLURM can be found on our website:
Learn more about job priority

What are the benefits of batch processing?

  • It allows sharing of computer resources among many users and programs.
  • It shifts the time of job processing to when the computing resources are less busy.
  • It avoids idling the computing resources with minute-by-minute manual intervention and supervision.
  • By keeping high overall rate of utilization, it better amortizes the cost of a computer, especially an expensive one.

Source - Wikipedia 

Examples of Using the Batch Scheduler

Interactive Jobs:

It is possible to run an interactive job.  This job is still a batch job and must wait for the scheduler to assign nodes.  Once the nodes are available, the job logs you into the first node.  Note that when you type 'exit' your job will end and you'll be put back on the front end.

Submit an interactive job where 1 node with 8 cores is requested for 1 hour (default partition general-compute): 

[rush:~]$ fisbatch --nodes=1 --ntasks-per-node=8 --time=01:00:00 

 

To submit an interactive job with X-Display, make sure you're forwarding X11 in your SSH client.  Windows users will need to use XWin32 to display X11 sessions.  More details can be found here

 

 

Job Scripts

Sample SLURM scripts
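As a starting point, a minimal non-interactive job script might look like the following sketch. The partition matches the default named above; the job name, output file, and program are placeholders for your own:

```shell
#!/bin/bash
# Request 1 node with 8 cores for 1 hour in the default partition
#SBATCH --partition=general-compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=01:00:00
#SBATCH --job-name=my_job
#SBATCH --output=my_job.%j.out

# Run the program on the allocated resources
# (./my_program is a placeholder for your executable)
srun ./my_program
```

The script would be submitted with `sbatch my_job.slurm`; SLURM substitutes the job ID for `%j` in the output file name.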

 

 

Using Scratch Space in a Job

  • The local scratch space (the scratch disk on a compute node) is available while a job is running. A directory is created in /scratch on each compute node in the job. The variable $SLURMTMPDIR is set to /scratch/$SLURM_JOBID.
    • Users can copy data to and from this scratch space in the SLURM script.
    • All files are removed from the $SLURMTMPDIR directory at the end of the job.
  • The global scratch space - /panasas/scratch - may also be used during a job.
  • NOTE: /panasas/scratch is temporary, potentially volatile storage.  It is intended to be used during job run time, and users are expected to remove their files when their jobs are complete.  The storage is not backed up, and files older than 3 weeks are automatically removed once a day.
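The copy-in, compute, copy-out pattern described above can be sketched in a job script like this (the data paths and program name are placeholders):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=01:00:00

# $SLURMTMPDIR points to /scratch/$SLURM_JOBID on the compute node.
# Copy input data to the fast local scratch disk (placeholder path):
cp ~/mydata/input.dat $SLURMTMPDIR

# Run the job from local scratch (./my_program is a placeholder)
cd $SLURMTMPDIR
srun ./my_program input.dat > output.dat

# Copy results back before the job ends; everything left in
# $SLURMTMPDIR is removed when the job finishes.
cp output.dat ~/mydata/
```

Copying results back at the end of the script is essential, since the local scratch directory is wiped as soon as the job completes.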