Procurement project HPC A2

The next national HPC and AI/ML Platform

The content on this web page is subject to change. Last updated 21 March 2023.

This project's objective is to acquire and put into operation the next-generation national HPC resource (supercomputer) in the Sigma2 e-infrastructure portfolio. The working name for this resource is A2. A2 will be placed in Lefdal Mine Datacenter, the same location as the new national storage infrastructure (NIRD).

Figure 1: Current Norwegian HPC and Data Storage infrastructure, including sensitive data (TSD) in Oslo. The infrastructure in Tromsø (HPC and storage) will be phased out, and service will be provided from Trondheim (Saga and Betzy), together with the new A2 system in Måløy (Lefdal Mine Datacenter).

Goals

The project aims to:

  • replace the current HPC machines Fram and Saga and cover the forecast growth in usage.
  • provide computing capability for AI/ML and scientific applications through both GPUs and CPUs.
  • procure a system with expandable computing and storage capacity.

Request For Information (RFI)

After exploring the market and defining Sigma2's needs for the next-generation HPC machine, the project has now published a request for information (RFI). The RFI aims to gather input and information from potential suppliers regarding crucial parts of the procurement documents. The documents are considered drafts, and the final versions are likely to be similar unless feedback from the RFI process indicates that changes should be made.
Sigma2 encourages potential suppliers to use this opportunity to provide feedback, as the possibility of making changes after the invitation to tender has been announced is limited under the public procurement regulations.

If you wish to contribute to the RFI, please submit your response using the Tendsign tendering tool. Follow the link below to the published HPC A2 RFI:

Tenders Electronic Daily (TED) HPC A2 RFI

Preliminary timeline

  • Q1 2023: Publication of the RFI. The RFI has been published.
  • H1 2023: Publication of the tender. We expect to publish the tender in H1 2023.
  • H2 2024: New HPC system in production. Our goal is to have the system in production in H2 2024.

Background

HPC systems generally have a life span of approximately 5 years due to declining energy efficiency compared to newer machines, obsolete parts, lack of support, and the arrival of new technology.

Our procurement strategy has traditionally been two-legged, with an A-leg and a B-leg, where we acquire systems with an offset of 2-3 years. Combined with a life span of roughly 5 years, this means a new system enters production every 2-3 years, alternating between the two legs.

Current Sigma2 hardware and capability

Fram
What                            Specifications
System                          Lenovo NeXtScale nx360
Nodes                           1006
Cores                           32 256
CPU types                       Intel E5-2683v4 2.1 GHz; Intel E7-4850v4 2.1 GHz (hugemem)
Local disk                      SSD
Performance                     1.1 PF
Total memory                    78 TiB
Disk size and type              2.5 PB, Lustre
Interconnect type and topology  InfiniBand EDR (100 Gbit), Fat Tree
Queueing system                 Slurm
Cooling                         Water cooled
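
Fram, like Saga and Betzy below, schedules jobs through Slurm and runs them across many nodes connected by InfiniBand. Purely as an illustration of how such a CPU cluster is typically used, and not taken from Sigma2's documentation, the sketch below shows a minimal MPI program (assuming the mpi4py library is available in the user's environment) of the kind that the queueing system would launch across several nodes, typically with srun inside a batch job; the exact modules and launch options are site-specific and not shown here.

    # hello_mpi.py - minimal sketch of an MPI program for a multi-node CPU cluster.
    # Assumes mpi4py is installed; launch command and module setup are
    # site-specific assumptions, not taken from Sigma2 documentation.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD   # communicator spanning all ranks in the job
    rank = comm.Get_rank()  # this process's rank (0 .. size-1)
    size = comm.Get_size()  # total number of MPI ranks in the job

    # Each rank contributes its rank number; the sum is gathered on rank 0.
    total = comm.reduce(rank, op=MPI.SUM, root=0)

    if rank == 0:
        print(f"{size} ranks, sum of ranks = {total}")
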
Saga
What                            Specifications
System                          HPE XL170r Gen10; ProLiant XL270d Gen10 (GPU nodes)
Nodes                           356 (+8 GPU nodes)
Cores                           16 064
CPU types                       200 Intel 6138 2.00 GHz; 120 Intel 6230R 2.10 GHz; 8 Intel 6126 2.60 GHz (GPU nodes)
Local disk                      NVMe
Performance                     645 TF
Total memory                    75 TiB
Disk size and type              6.6 PB, BeeGFS
Interconnect type and topology  InfiniBand FDR (56 Gbit), Fat Tree
Queueing system                 Slurm
Cooling                         Air cooled

Betzy
What                            Specifications
System                          BullSequana XH2000
Nodes                           1344 CPU nodes, 4 GPU nodes
Cores                           172 032
CPU types                       CPU nodes: AMD EPYC "Rome" 2.25 GHz; GPU nodes: AMD EPYC "Milan"
GPU types                       4 NVIDIA A100 (NVLink) per GPU node
Local disk                      No local disk
Performance                     6.5 PF
Total memory                    336 TiB
Disk size and type              7.7 PB, Lustre
Interconnect type and topology  InfiniBand HDR (100 Gbit per node, 200 Gbit switches), Dragonfly
Queueing system                 Slurm
Cooling                         Water cooled
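
Betzy's GPU nodes, each with four NVLink-connected NVIDIA A100s, target the kind of AI/ML workloads named in the goals above. The sketch below is illustrative only, assuming PyTorch with CUDA support is available in the job's environment (software setup and job options are site-specific and not shown); it simply lists the visible GPUs and runs a small computation on one of them.

    # gpu_check.py - minimal sketch of using the A100 GPUs on a 4-GPU node.
    # Assumes PyTorch with CUDA support is available; illustrative only,
    # not taken from Sigma2 documentation.
    import torch

    if torch.cuda.is_available():
        n = torch.cuda.device_count()  # expected to report 4 on a 4 x A100 node
        for i in range(n):
            print(i, torch.cuda.get_device_name(i))

        # Small matrix multiplication on the first GPU as a sanity check.
        a = torch.randn(4096, 4096, device="cuda:0")
        b = torch.randn(4096, 4096, device="cuda:0")
        c = a @ b
        torch.cuda.synchronize()
        print("result norm:", c.norm().item())
    else:
        print("No CUDA devices visible (check the GPU allocation in the batch job).")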

The NIRD storage is based on IBM Elastic Storage System (ESS). The current capacity of 35 PB is shared between file and object storage and is designed for future growth.
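
Object storage of this kind is commonly accessed through an S3-compatible API. The sketch below is hypothetical: the endpoint URL, bucket name, and credentials are placeholders, it assumes the boto3 library is installed, and the actual way to access NIRD's object storage should be taken from Sigma2's NIRD documentation rather than from this example.

    # nird_s3_sketch.py - hypothetical sketch of accessing S3-compatible object storage.
    # Endpoint, bucket, and credentials are placeholders; consult the NIRD
    # documentation for the actual service details. Assumes boto3 is installed.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.example.org",  # placeholder, not a real NIRD endpoint
        aws_access_key_id="ACCESS_KEY",         # placeholder credentials
        aws_secret_access_key="SECRET_KEY",
    )

    # Upload a local file to a bucket and list the bucket contents.
    s3.upload_file("results.nc", "my-project-bucket", "results.nc")
    for obj in s3.list_objects_v2(Bucket="my-project-bucket").get("Contents", []):
        print(obj["Key"], obj["Size"])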

LUMI
What                            Specifications
System                          HPE Cray EX (Shasta)
Nodes                           1536 (LUMI-C), 2560 (LUMI-G)
Cores                           196 608 (LUMI-C)
CPU types                       AMD EPYC
GPU types                       AMD Instinct MI250X
Peak performance (Pflop/s)      552 (LUMI-G), 8 (LUMI-C)
Total memory                    440 TiB
Disk size and type              80 PB spinning disk (LUMI-P), 7 PB flash (LUMI-F), 30 PB object storage (LUMI-O)
Interconnect type and topology  Slingshot
Queueing system                 Slurm
Cooling                         Water cooled