Industry Compute Cluster

CCR General Compute Cluster.

New Cluster Now Available!

The Dell cluster was funded by a New York State Empire State Development grant to give the WNY and NY State industrial community access to state-of-the-art high-performance computing resources (hardware, software, and consulting services) and to help foster economic development. The cluster consists of 99 Dell PowerEdge servers with a total of 5544 processor cores and an HDR InfiniBand interconnect. Every node has two Intel Ice Lake Xeon Gold 6330 28-core processors (56 cores per node) and 960GB of local scratch. 67 of the nodes have 512GB of memory, 16 have 1024GB of memory, and the final 16 have 512GB of memory plus dual NVIDIA A100 GPUs.

Server (Node) Types:

| Type of Node | # of Nodes | # Cores/Node | Processor | GPU | RAM | Network | SLURM Tags | Local /scratch |
|---|---|---|---|---|---|---|---|---|
| Compute | 67 | 56 | Intel Xeon Gold 6330 | - | 512GB | HDR InfiniBand | CPU-Gold-6330 | 960GB |
| Large Memory Compute | 16 | 56 | Intel Xeon Gold 6330 | - | 1024GB | HDR InfiniBand | CPU-Gold-6330 | 960GB |
| GPU Compute | 16 | 56 | Intel Xeon Gold 6330 | Dual NVIDIA A100 | 512GB | HDR InfiniBand | CPU-Gold-6330-A100 | 960GB |

Requesting Specific Node Resources:

Sample SLURM Directives
To request GPUs:
--nodes=1 --gres=gpu:1 (or gpu:2)
--nodes=2 --gres=gpu:1 (or gpu:2)

To use all cores on a node with more than one GPU, you must disable CPU binding:
--gres-flags=disable-binding
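
As a rough illustration, the directives above could be combined into a batch script like the one below. This is a minimal sketch, not an official template: using the SLURM tag with `--constraint`, the one-hour time request, and the `job_summary` helper are all assumptions made for the example, and any additional site-specific options (cluster, QOS, account) are not shown.

```shell
#!/bin/bash
#SBATCH --partition=industry              # partition named on this page
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=56              # all 56 cores of one node
#SBATCH --gres=gpu:2                      # both A100 GPUs on the node
#SBATCH --gres-flags=disable-binding      # needed to use all cores alongside more than one GPU
#SBATCH --constraint=CPU-Gold-6330-A100   # assumption: the SLURM tag is usable as a constraint
#SBATCH --time=01:00:00                   # illustrative; must stay within the partition limit

# Hypothetical helper: print what SLURM allocated.
# Falls back to placeholder values when run outside a SLURM job.
job_summary() {
    echo "job=${SLURM_JOB_ID:-none} nodes=${SLURM_JOB_NUM_NODES:-0} gpus=${CUDA_VISIBLE_DEVICES:-unset}"
}

job_summary
```

The script would be submitted with `sbatch`, and the summary line appears in the job's output file. Requesting `gpu:1` instead leaves the second GPU on the node free for another job.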

Partitions Available:

The industry cluster is divided into several partitions that users can request for their jobs. The "industry" partition is available only to industry partners. UB faculty and students have the option of running their jobs in the "scavenger" partition, which lets jobs run when no jobs are pending in the industry partition. Once an industry user submits a job requesting resources, jobs in the scavenger partition are stopped and requeued. Please contact us if you'd like to test your jobs using this preemption feature of the job scheduler. Note that your jobs MUST be able to checkpoint. Please see our documentation about checkpointing.

| Partition Name | Time Limit | Default Number of CPUs | Notes |
|---|---|---|---|
| industry | 72 hours | 1 | available only to industry partners |
| scavenger | 72 hours | 1 | --requeue flag required for jobs to be restarted |
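
To make the checkpoint requirement concrete, here is a minimal sketch of a scavenger job that survives preemption by recording its progress and resuming from it after a requeue. The checkpoint file name, the step count, and the `run_steps` function are hypothetical placeholders; a real application would save and restore its own state using whatever checkpoint mechanism it supports.

```shell
#!/bin/bash
#SBATCH --partition=scavenger   # partition name from the table above
#SBATCH --requeue               # required so a preempted job is restarted
#SBATCH --nodes=1
#SBATCH --time=72:00:00         # the partition's listed time limit

# Hypothetical names: adjust the checkpoint path and step count for real work.
CKPT="${CKPT:-checkpoint.txt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

run_steps() {
    local step=0
    # After a requeue, pick up from the last completed step if one was saved.
    if [ -f "$CKPT" ]; then
        step=$(cat "$CKPT")
    fi
    echo "starting at step $step"
    while [ "$step" -lt "$TOTAL_STEPS" ]; do
        # ... one unit of real work would go here ...
        step=$((step + 1))
        echo "$step" > "$CKPT"   # record progress after every completed step
    done
    echo "finished at step $step"
}

run_steps
```

Because progress is written after every step, a job that is stopped partway through loses at most the step that was in flight when it was preempted.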