The Dell cluster was funded by a New York State Empire State Development grant to give the WNY and NY State industrial community access to state-of-the-art high-performance computing resources (hardware, software and consulting services) and so help foster economic development. The cluster consists of 99 Dell PowerEdge servers with a total of 5544 processor cores (56 per node) and an HDR InfiniBand interconnect. Every node has two 28-core Intel Ice Lake Xeon Gold 6330 processors and 960GB of local scratch. 67 of the nodes have 512GB of memory, another 16 have 1024GB of memory, and the final 16 have 512GB of memory plus two Nvidia A100 GPUs.
Type of Node | # of Nodes | # of Cores/Node | Processor | GPU | RAM | Network | SLURM Tags | Local /scratch |
Compute | 67 | 56 | Intel Xeon Gold 6330 | - | 512GB | HDR InfiniBand | CPU-Gold-6330 | 960GB |
Large Memory | 16 | 56 | Intel Xeon Gold 6330 | - | 1024GB | HDR InfiniBand | CPU-Gold-6330 | 960GB |
GPU Compute | 16 | 56 | Intel Xeon Gold 6330 | Dual Nvidia A100 | 512GB | HDR InfiniBand | CPU-Gold-6330-A100 | 960GB |
Sample SLURM Directives
To request GPUs:
--nodes=1 --gres=gpu:1 (or gpu:2)
--nodes=2 --gres=gpu:1 (or gpu:2)
To use all cores on a node with more than one GPU, you must disable CPU binding:
--gres-flags=disable-binding
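The directives above can be combined into a batch script. The following is a minimal sketch rather than an official template: the job name, the one-hour time limit, and the `describe_job` helper are hypothetical, and your account may need additional options (such as an account or QOS) that are not covered here. It requests one GPU node with both A100 GPUs and all 56 cores, which is why CPU binding is disabled.

```shell
#!/bin/bash
# Sketch of a GPU job on the industry partition (values marked "placeholder" are assumptions).
#SBATCH --job-name=gpu-example          # placeholder job name
#SBATCH --partition=industry            # available only to industry partners
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=56            # all 56 cores on the node
#SBATCH --gres=gpu:2                    # both A100 GPUs (use gpu:1 for a single GPU)
#SBATCH --gres-flags=disable-binding    # needed to use every core alongside the GPUs
#SBATCH --time=01:00:00                 # placeholder; the partition limit is 72 hours

# Hypothetical helper: report the job ID and the GPUs visible to the job.
# SLURM normally sets these variables inside a job; the defaults keep the
# script runnable outside the scheduler.
describe_job() {
    echo "job=${SLURM_JOB_ID:-none} gpus=${CUDA_VISIBLE_DEVICES:-none}"
}

describe_job

# Replace this comment with the real workload, for example an srun command.
```

The script would be submitted with `sbatch`, and the line printed by `describe_job` appears in the job's output file.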
The industry cluster is divided into several partitions in which users can request to run their jobs. The "industry" partition is available only to industry partners. UB faculty and students have the option of running their jobs in the "scavenger" partition, which allows jobs to run when there are no other pending jobs in the compute partition. Once an industry user submits a job requesting resources, jobs in the scavenger partition are stopped and requeued. Because a scavenger job can be stopped at any time, your jobs MUST be able to checkpoint. Please see our documentation about checkpointing here. Please contact us if you'd like to test your jobs using this preemption feature of the job scheduler.
Partition Name | Time Limit | Default Number CPUs | Notes |
industry | 72 hours | 1 | available only to industry partners |
scavenger | 72 hours | 1 | --requeue flag required for jobs to be restarted |
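A scavenger job that survives preemption might look like the sketch below. It is illustrative only: the checkpoint file name, the `CHECKPOINT_FILE` and `TOTAL_STEPS` variables, and the `run_from_checkpoint` function are hypothetical stand-ins for whatever checkpoint mechanism your application provides. The idea is that each requeued run reads the last recorded step and continues from there instead of starting over.

```shell
#!/bin/bash
# Sketch of a requeue-friendly job for the scavenger partition.
#SBATCH --partition=scavenger
#SBATCH --requeue                  # required for the job to be restarted after preemption
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=72:00:00            # partition time limit

# Hypothetical checkpoint loop: the file holds the number of completed steps.
run_from_checkpoint() {
    checkpoint="${CHECKPOINT_FILE:-checkpoint.txt}"
    total_steps="${TOTAL_STEPS:-5}"

    # Resume from the last completed step if a checkpoint exists.
    start=0
    if [ -f "$checkpoint" ]; then
        start="$(cat "$checkpoint")"
    fi

    step="$start"
    while [ "$step" -lt "$total_steps" ]; do
        # One unit of real work would go here.
        step=$((step + 1))
        echo "$step" > "$checkpoint"   # record progress after each step
    done
    echo "finished at step ${step} (resumed from ${start})"
}

run_from_checkpoint
```

Writing the checkpoint after every step keeps the amount of repeated work small when the job is stopped; a real application would choose an interval that balances checkpoint cost against lost progress.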