HPC and storage hardware

The current Norwegian e-infrastructure consists of four HPC systems and two storage systems. In addition, Norway has access to the EuroHPC machine LUMI.

HPC systems

Each of the HPC facilities consists of a compute resource (several compute nodes, each with several processors and internal shared memory, plus an interconnect that connects the nodes), a central storage resource that is accessible by all the nodes, and a secondary storage resource for back-up (and in a few cases also for archiving). All facilities use variants of the UNIX operating system (Linux, AIX, etc.).

Storage systems

NIRD provides data storage capacity for research projects with a data-centric architecture. It is therefore also used as storage capacity for the HPC systems, for the national data archive and for other services requiring a storage backend. It consists of two geographically separated storage systems. The storage technology, combined with the powerful network backbone underneath, allows the two systems to be geo-replicated asynchronously, thus ensuring high availability and security of the data.

The NIRD storage facility, unlike its predecessor NorStore, is strongly integrated with the HPC systems, thus facilitating computation on large datasets. Furthermore, the NIRD storage offers high-performance storage for post-processing, visualization, GPU computing and other services on the NIRD Service Platform.

Technical documentation

Systems in production

Betzy is named after Mary Ann Elizabeth (Betzy) Stephansen, the first Norwegian woman with a PhD in mathematics.

The most powerful supercomputer in Norway

Betzy is a BullSequana XH2000, provided by Atos, and gives Norwegian researchers more than five times the capacity previously available, with a theoretical peak performance of 6.2 petaflops. The supercomputer is located at NTNU in Trondheim and has been in production since 24 November 2020.

Technical specifications:

  • The system comprises 1344 compute nodes, each equipped with 2 x 64-core AMD EPYC™ processors, code name ‘Rome’, for a total of 172032 cores installed on a total footprint of only 14.78 m². The total compute power is close to 6 PFLOPS.
  • The system consumes 952 kW of power and is 95% liquid-cooled.
  • The compute nodes are interconnected with Mellanox HDR technology.
  • The data management solution relies on DDN storage with a Lustre parallel file system of 2.6 PB.
     
System: BullSequana XH2000
Max floating-point performance, double: 6.2 Petaflops
Number of nodes: 1344
CPU type: AMD EPYC™ "Rome" 2.25 GHz
CPU cores in total: 172032
CPU cores per node: 128
Memory in total: 336 TiB
Memory per node: 256 GiB
Total disc capacity: 2.6 PB
Interconnect: InfiniBand HDR 100, Dragonfly+ topology
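
The totals in the specification follow from the per-node numbers. As a rough cross-check, here is a minimal Python sketch. The figure of 16 double-precision floating-point operations per core per cycle is an assumption about the "Rome" architecture and is not stated in the specification; the variable names are illustrative only.

```python
# Cross-check of the Betzy totals from the per-node figures listed above.
NODES = 1344
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64
MEM_PER_NODE_GIB = 256
CLOCK_HZ = 2.25e9
# Assumption (not from the specification): 16 double-precision FLOP
# per core per clock cycle.
FLOP_PER_CYCLE = 16

total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
total_mem_tib = NODES * MEM_PER_NODE_GIB / 1024
peak_pflops = total_cores * CLOCK_HZ * FLOP_PER_CYCLE / 1e15

print(total_cores)            # 172032
print(total_mem_tib)          # 336.0
print(round(peak_pflops, 2))  # 6.19
```

Under that assumption the estimate lands close to the quoted theoretical peak of 6.2 petaflops.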

 

Figure Betzy

Named after the Norwegian arctic expedition ship Fram, the new Linux cluster hosted at UiT is a shared resource for research computing capable of 1.1 PFLOP/s theoretical peak performance. It started production on 1 November 2017 (2017.2 computing period).

Fram is a distributed-memory system that consists of 1004 dual-socket and 2 quad-socket nodes, interconnected with a high-bandwidth, low-latency InfiniBand network. The interconnect network is organized in an island topology, with 9216 cores in each island. Each standard compute node has two 16-core Intel Broadwell chips (2.1 GHz) and 64 GiB memory. In addition, 8 larger-memory nodes with 512 GiB RAM and 2 huge-memory quad-socket nodes with 6 TiB of memory are provided. The total number of compute cores is 32256. The system consumes 300 kW of power.

Technical details

System: Lenovo NeXtScale nx360
Number of cores: 32256
Number of nodes: 1006
CPU type: Intel E5-2683v4 2.1 GHz; Intel E7-4850v4 2.1 GHz (hugemem)
Max floating-point performance, double: 1.1 Petaflop/s
Total memory: 78 TiB
Total disc capacity: 2.5 PB
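
The core count and peak figure for Fram can be reproduced from the node description. The sketch below is a hedged cross-check, not an official calculation. It assumes 16 cores per chip on the quad-socket nodes as well (the text states this only for standard nodes) and 16 double-precision FLOP per core per cycle, which is not given in the text.

```python
# Cross-check of the Fram figures from the node description above.
DUAL_SOCKET_NODES = 1004
QUAD_SOCKET_NODES = 2
# Assumption: 16 cores per chip on every node type (stated in the text
# only for the standard dual-socket nodes).
CORES_PER_CHIP = 16
CLOCK_HZ = 2.1e9
# Assumption (not from the text): 16 double-precision FLOP per core per cycle.
FLOP_PER_CYCLE = 16
CORES_PER_ISLAND = 9216

total_chips = DUAL_SOCKET_NODES * 2 + QUAD_SOCKET_NODES * 4
total_cores = total_chips * CORES_PER_CHIP
standard_nodes_per_island = CORES_PER_ISLAND // (2 * CORES_PER_CHIP)
peak_pflops = total_cores * CLOCK_HZ * FLOP_PER_CYCLE / 1e15

print(total_cores)                # 32256
print(standard_nodes_per_island)  # 288
print(round(peak_pflops, 1))      # 1.1
```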

 

Figure Fram

Saga is named after the goddess in Norse mythology associated with wisdom. Saga is also a term for Icelandic epic prose literature. The supercomputer, located at NTNU in Trondheim, is designed to run workloads from Abel and Stallo. It was made available to users right before the start of the 2019.2 computing period.

Saga is provided by Hewlett Packard Enterprise and has a computational capacity of approximately 85 million CPU hours a year and a life expectancy of four years, until 2023. The system consumes 100 kW of power.

The central BeeGFS storage system was expanded from 1 PB to 6.6 PB in February 2021.


Technical details

Main components

  • 200 standard compute nodes, with 40 cores and 192 GiB memory each
  • 120 standard compute nodes with 52 cores and 192 GiB memory each
  • 28 medium memory compute nodes, with 40 cores and 384 GiB of memory each
  • 8 big memory nodes, with 3 TiB and 64 cores each
  • 8 GPU nodes, each with 4 NVIDIA GPUs, 2 CPUs (24 cores in total) and 384 GiB memory
  • 8 login and service nodes with 256 cores in total
  • 6.6 PB high metadata performance BeeGFS scratch file system

Key figures

Processor cores: 16064
GPU units: 32
Internal memory: 75 TiB
Internal disk: 97.5 TB NVMe
Central disk: 6.6 PB
Theoretical performance (Rpeak): 645 TFLOPS
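
The processor core and GPU counts can be tallied from the list of main components. The sketch below is one reading that reproduces the quoted figures: it assumes each GPU node contributes 24 cores in total and that the login and service nodes are not counted. Both points are interpretations, not statements from the text, and the dictionary keys are illustrative names.

```python
# Tally of Saga compute cores from the main components listed above.
# Each entry maps a node group to (node count, cores per node).
# Assumptions: GPU nodes have 24 cores per node in total, and the
# login/service nodes are excluded from the processor core figure.
NODE_GROUPS = {
    "standard_40": (200, 40),
    "standard_52": (120, 52),
    "medium_memory": (28, 40),
    "big_memory": (8, 64),
    "gpu": (8, 24),
}
GPU_NODES = 8
GPUS_PER_NODE = 4

total_cores = sum(count * cores for count, cores in NODE_GROUPS.values())
total_gpus = GPU_NODES * GPUS_PER_NODE

print(total_cores)  # 16064
print(total_gpus)   # 32
```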

 

Figure Saga

The NIRD Service Platform is a Kubernetes-based infrastructure for running several types of services and software, such as web services, domain- and community-specific portals, and tools for data visualization, data discovery and data sharing. The clusters are located in Tromsø and Trondheim.

Technical details

Workers: 16 over two sites
vCores: 1152 (8 workers with 64 and 8 workers with 80)
RAM: 5 TiB (8 workers with 256 GiB and 8 workers with 768 GiB)
Network: 40 Gbps interconnect to storage and among workers
Storage: Total NIRD storage capacity accessible from the platform
GPUs: 32 NVIDIA V100 GPUs (8 workers with 2 and 4 workers with 4)
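
The vCore and GPU totals are sums over the worker groups given in the same rows. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Each tuple is (number of workers, units per worker), taken from the
# vCores and GPUs rows above.
VCORE_GROUPS = [(8, 64), (8, 80)]
GPU_GROUPS = [(8, 2), (4, 4)]

total_vcores = sum(workers * per_worker for workers, per_worker in VCORE_GROUPS)
total_gpus = sum(workers * per_worker for workers, per_worker in GPU_GROUPS)

print(total_vcores)  # 1152
print(total_gpus)    # 32
```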

 

Figure NIRD SP

 

NIRD Service Platform description

LUMI (Large Unified Modern Infrastructure) is the first of three pre-exascale supercomputers built to ensure that Europe is among the world leaders in computing capacity. Norway, through Sigma2, owns part of the LUMI supercomputer, which is funded by the EU and the consortium countries.

Key figures - Sigma2 share

CPU-core-hours: 34 003 333
GPU-hours: 1 771 000
TB-hours: 16 862 500
Central disk: A share of the total 117 PB
Theoretical performance (Rpeak): ~11 PFLOPS

The figures above roughly correspond to the physical machine illustrated by the figure below.

Sigma2's LUMI share

Using one core of LUMI-C for one hour costs one CPU-core-hour, and using one GPU of the LUMI-G partition for one hour costs one GPU-hour. Storing 1 terabyte on LUMI-F consumes 10 terabyte-hours in one hour, on LUMI-P 1 terabyte-hour per hour, and on LUMI-O 0.5 terabyte-hours per hour.
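
The billing rules above can be expressed as a small calculator. This is a minimal sketch under the rates quoted in the paragraph; the function and constant names are hypothetical and do not correspond to any LUMI tool or API.

```python
# Storage billing rates from the paragraph above, in terabyte-hours
# consumed per terabyte stored per hour.
STORAGE_RATE = {"LUMI-F": 10.0, "LUMI-P": 1.0, "LUMI-O": 0.5}


def compute_cost(units: int, hours: float) -> float:
    """Core-hours (LUMI-C cores) or GPU-hours (LUMI-G GPUs) consumed."""
    return units * hours


def storage_cost(terabytes: float, hours: float, tier: str) -> float:
    """Terabyte-hours consumed by storing data on the given tier."""
    return terabytes * hours * STORAGE_RATE[tier]


# 128 LUMI-C cores for 12 hours
print(compute_cost(128, 12))              # 1536
# 5 TB kept on LUMI-F for 24 hours
print(storage_cost(5, 24, "LUMI-F"))      # 1200.0
# 50 TB kept on LUMI-P for 30 days
print(storage_cost(50, 30 * 24, "LUMI-P"))  # 36000.0
```

Note how the same data volume is ten times more expensive on LUMI-F than on LUMI-P, and half as expensive on LUMI-O.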

More details about LUMI
LUMI documentation
 

Decommissioned systems

Gardar is an HP BladeCenter cluster consisting of one frontend (management, head) node, 2 login nodes and 288 compute nodes running CentOS Linux managed by Rocks. Each node contains two Intel Xeon processors and 24 GB of memory. The compute nodes are located in HP racks. Each HP rack contains three c7000 Blade enclosures and each enclosure contains 16 compute nodes.

Gardar has a separate storage system. The X9320 network storage system is available to the entire cluster and uses the IBRIX Fusion software. The total usable storage of the X9320 is 71.6 TB. The storage system is connected to the cluster with an InfiniBand QDR network.

Technical details

System: HP BL280c G6 servers
Number of cores: 3456
Number of nodes: 288
CPU type: Intel Xeon E5649 (2.53 GHz), Westmere-EP
Peak performance: 35 TFLOPS
Total storage capacity: 71.6 TB
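
The physical layout and the core count follow from the description above. The sketch below is a rough cross-check. It assumes six cores per Xeon E5649 processor and 4 double-precision FLOP per core per cycle, neither of which is stated in the text.

```python
# Cross-check of the Gardar layout and key figures.
COMPUTE_NODES = 288
CPUS_PER_NODE = 2
NODES_PER_ENCLOSURE = 16
ENCLOSURES_PER_RACK = 3
CLOCK_HZ = 2.53e9
# Assumptions (not from the text): six cores per processor and
# 4 double-precision FLOP per core per clock cycle.
CORES_PER_CPU = 6
FLOP_PER_CYCLE = 4

enclosures = COMPUTE_NODES // NODES_PER_ENCLOSURE
racks = enclosures // ENCLOSURES_PER_RACK
total_cores = COMPUTE_NODES * CPUS_PER_NODE * CORES_PER_CPU
peak_tflops = total_cores * CLOCK_HZ * FLOP_PER_CYCLE / 1e12

print(enclosures)             # 18
print(racks)                  # 6
print(total_cores)            # 3456
print(round(peak_tflops, 1))  # 35.0
```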

UiB operates supercomputer facilities that serve the high-end computational needs of scientists at Norwegian universities and other national research and industrial organizations.

Significant use of Hexagon has traditionally come from computational chemistry, computational physics, computational biology, geosciences and mathematics. Hexagon is installed at the High Technological Center in Bergen (HiB) and is managed and operated by UiB. Hexagon was upgraded from a Cray XT4 in March 2012.

Technical details

System: Cray XE6-200
Number of cores: 22272
Number of nodes: 696
Cores per node: 32
Interconnect: Cray Gemini
Peak performance: 204.9 TFLOP/s
Operating system: Cray Linux Environment

Named after the famous Norwegian mathematician Niels Henrik Abel, the Linux cluster at UiO is a shared resource for research computing capable of 258 TFLOP/s theoretical peak performance. At the time of installation on 1 October 2012, Abel reached position 96 on the Top500 list of the most powerful systems in the world.

Abel is an all-around, all-purpose cluster designed to handle multiple concurrent jobs and users with varying requirements. Rather than massively parallel applications, the primary application profile is small to moderately parallel applications with high I/O and/or memory demand.

Technical details

System: MEGWARE MiriQuid 2600
Number of cores: 10000+
Number of nodes: 650+
CPU type: Intel E5 2670
Max floating-point performance, double: 258 TFLOP/s
Total memory: 40 TiB
Total disc capacity: 400 TiB

The Linux Cluster Stallo is a compute cluster at UiT - The Arctic University which was installed on 1 December 2007, and included in NOTUR on 1 January 2008. The supercomputer was upgraded in 2013.

Stallo is intended for distributed-memory MPI applications with low communication requirements between the processors, shared-memory OpenMP applications using up to eight processor cores, parallel applications with moderate memory requirements (2-4 GB per core), and embarrassingly parallel applications.

Technical details

System: HP BL 460c Gen 8
Number of cores: 14116
Number of nodes: 518
CPU type: Intel E5 2670
Peak performance: 104 TFLOP/s
Total memory: 12.8 TB
Total disc capacity: 2.1 PB

 

Vilje is a cluster system procured in 2012 by NTNU, in cooperation with the Norwegian Meteorological Institute and UNINETT Sigma. Vilje is used for numerical weather prediction in operational forecasting by met.no, as well as for research in a broad range of topics at NTNU and other Norwegian universities, colleges and research institutes. The name Vilje is taken from Norse mythology.

Vilje is a distributed-memory system that consists of 1440 nodes interconnected with a high-bandwidth, low-latency switch network (FDR InfiniBand). Each node has two 8-core Intel Sandy Bridge processors (2.6 GHz) and 32 GB memory. The total number of cores is 23040.

The system is well-suited (and intended) for large scale parallel MPI applications. Access to Vilje is in principle only allowed for projects that have parallel applications that use a relatively large number of processors (≥ 128).

Technical details

System: SGI Altix 8600
Number of cores: 22464
Number of nodes: 1404
CPU type: Intel Sandy Bridge
Total memory: 44 TB
Total disc capacity  

 

Betzy and Saga are located at NTNU, together with one of the two NIRD storage clusters. The other is placed at UiT with Fram. LUMI is located in a data centre facility in Kajaani, Finland.

Sigma2's future systems will be placed in the Lefdal Mine Datacenter. As the first of the national systems, the next-generation NIRD was installed there during spring 2022 and opens to researchers later in 2022.