Industry Compute Partition

New Hardware Now Available!

The compute hardware available to industry partners was updated in summer 2021.  This purchase was funded by a New York State Empire State Development grant to give the Western New York (WNY) and New York State industrial community access to state-of-the-art high-performance computing resources (hardware, software and consulting services) and help foster economic development.

This cluster partition consists of 99 Dell PowerEdge servers with a total of 5544 processor cores and an HDR InfiniBand interconnect.  Every node has two Intel Ice Lake Xeon Gold 6330 28-core processors (56 cores per node) and 960GB of local scratch.  67 of the nodes have 512GB of memory.  Another 16 nodes have 1024GB of memory.  The final 16 nodes have 512GB of memory and two Nvidia A100 GPUs each.
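As a quick check of the totals quoted above, the arithmetic can be sketched in a few lines of shell. The variable names are purely illustrative; the node counts and core counts come from the description above.

```shell
#!/bin/bash
# Sanity check of the hardware totals described above.
# Variable names are illustrative only.

cores_per_node=$(( 2 * 28 ))          # two 28-core processors per node
total_nodes=$(( 67 + 16 + 16 ))       # compute + large memory + GPU nodes
total_cores=$(( total_nodes * cores_per_node ))
total_gpus=$(( 16 * 2 ))              # 16 GPU nodes with two A100s each

echo "nodes=${total_nodes} cores_per_node=${cores_per_node} total_cores=${total_cores} gpus=${total_gpus}"
# prints: nodes=99 cores_per_node=56 total_cores=5544 gpus=32
```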

The industry compute nodes are now part of the UB-HPC cluster in the "industry" partition.

Server (Node) Types:

| Type of Node | # of Nodes | # Cores/Node | Processor | GPU | RAM | Network | SLURM Tags | Local /scratch |
|---|---|---|---|---|---|---|---|---|
| Compute | 67 | 56 | Intel Xeon Gold 6330 | - | 512GB | HDR InfiniBand | CPU-Gold-6330 | 960GB |
| Large Memory Compute | 16 | 56 | Intel Xeon Gold 6330 | - | 1024GB | HDR InfiniBand | CPU-Gold-6330 | 960GB |
| GPU Compute | 16 | 56 | Intel Xeon Gold 6330 | Dual Nvidia A100 | 512GB | HDR InfiniBand | CPU-Gold-6330-A100 | 960GB |
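The compute and large memory nodes share the same SLURM tag, so a job that needs a 1024GB node has to ask for memory as well. The batch script below is a minimal sketch, not an official template. It assumes the SLURM tags listed for each node type are configured as node features selectable with `--constraint`, and that requesting more memory than a 512GB node can offer (600G here, an arbitrary illustrative value) steers the job to a large memory node. The job name and time limit are hypothetical, and your account may need additional options.

```shell
#!/bin/bash
# Sketch: request one large memory node in the "industry" partition.
# ASSUMPTION: the SLURM tags are node features usable with --constraint.
#SBATCH --partition=industry
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=56          # all 56 cores on the node
#SBATCH --constraint=CPU-Gold-6330    # tag shared by compute and large memory nodes
#SBATCH --mem=600G                    # illustrative: more than a 512GB node can provide
#SBATCH --time=01:00:00               # hypothetical; must not exceed the partition limit
#SBATCH --job-name=largemem-example   # hypothetical

# SLURM sets these variables inside a job; the defaults keep the
# script harmless if it is run outside the scheduler.
node_list="${SLURM_JOB_NODELIST:-not-in-slurm}"
mem_per_node="${SLURM_MEM_PER_NODE:-unknown}"
echo "nodes=${node_list} mem_per_node_mb=${mem_per_node}"

# Replace with your application (hypothetical command):
# srun ./my_app
```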

Requesting Specific Node Resources:

Sample SLURM Directives
To request GPUs (the --gres count is per node):
--nodes=1 --gres=gpu:1 (or gpu:2)
--nodes=2 --gres=gpu:1 (or gpu:2)

To use all cores on a node with more than one GPU, you must disable CPU binding:
--gres-flags=disable-binding
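Put together, the directives above might appear in a batch script like the following. This is a sketch under stated assumptions rather than a site-approved template: the job name, time limit and application command are hypothetical, and your account may require further options.

```shell
#!/bin/bash
# Sketch: one GPU node, both A100s, all 56 cores.
#SBATCH --partition=industry
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=56          # all cores on the node
#SBATCH --gres=gpu:2                  # both GPUs on the node
#SBATCH --gres-flags=disable-binding  # needed to use all cores with more than one GPU
#SBATCH --time=01:00:00               # hypothetical; must not exceed the partition limit
#SBATCH --job-name=gpu-example        # hypothetical

# SLURM sets these variables inside a job; the defaults keep the
# script harmless if it is run outside the scheduler.
job_id="${SLURM_JOB_ID:-not-in-slurm}"
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "job=${job_id} gpus=${gpus}"

# Replace with your application (hypothetical command):
# srun ./my_gpu_app
```

Submit the script with `sbatch`. Inside a running job, the echoed line shows which GPU indices SLURM made visible to the job.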

Partitions Available:

The industry nodes are in the "industry" partition of the UB-HPC cluster.  This partition is available only to industry partners.  UB faculty and students can run jobs on these nodes through the "scavenger" partition, which lets jobs run when no industry jobs are pending.  Once an industry user submits a job requesting resources, jobs in the scavenger partition are stopped and requeued.

Note that jobs run in the scavenger partition MUST be able to checkpoint, because they can be stopped and requeued at any time.  Please see our documentation about checkpointing.

| Partition Name | Time Limit | Default Number of CPUs | Notes |
|---|---|---|---|
| industry | 72 hours | 1 | Available only to industry partners |
| scavenger | 72 hours | 1 | --requeue flag required for jobs to be restarted |
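A scavenger job that can be stopped and requeued needs some way to pick up where it left off. The script below is a minimal sketch of that pattern, not a complete checkpointing solution. The checkpoint file name and the application command are hypothetical, and it assumes your application writes its own checkpoint file. `--requeue` and `--open-mode=append` are standard sbatch options, and `SLURM_RESTART_COUNT` is set by SLURM once a job has been requeued.

```shell
#!/bin/bash
# Sketch: requeue-friendly job for the "scavenger" partition.
#SBATCH --partition=scavenger
#SBATCH --requeue                     # required for the job to be restarted
#SBATCH --open-mode=append            # keep earlier output when the job restarts
#SBATCH --nodes=1
#SBATCH --time=72:00:00               # partition time limit
#SBATCH --job-name=scavenger-example  # hypothetical

# HYPOTHETICAL checkpoint file written by your application.
checkpoint_file="${CHECKPOINT_FILE:-checkpoint.dat}"

# Only set by SLURM after a requeue; default to 0 otherwise.
restart_count="${SLURM_RESTART_COUNT:-0}"

# Resume if a checkpoint from an earlier run exists, otherwise start fresh.
if [ -f "$checkpoint_file" ]; then
    mode="resume"
else
    mode="fresh"
fi
echo "restart_count=${restart_count} mode=${mode}"

# Replace with your application (hypothetical commands and flags):
# if [ "$mode" = "resume" ]; then
#     srun ./my_app --resume-from "$checkpoint_file"
# else
#     srun ./my_app --checkpoint-to "$checkpoint_file"
# fi
```

Checking for the checkpoint file, rather than relying only on the restart count, also covers the case where a job is resubmitted by hand after an earlier run.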