HPC and storage hardware

The current Norwegian academic e-infrastructure consists of four HPC systems and two storage systems.

HPC systems

Each of the HPC facilities consists of a compute resource (a number of compute nodes, each with a number of processors and internal shared memory, plus an interconnect that connects the nodes), a central storage resource that is accessible by all the nodes, and a secondary storage resource for backup (and in a few cases also for archiving). All facilities use variants of the UNIX operating system (Linux, AIX, etc.).

Storage systems

The NIRD storage provides data storage capacity for research projects with a data-centric architecture. Hence it is also used as storage capacity for the HPC systems, for the national data archive and for other services requiring a storage backend. It consists of two geographically separated storage systems. The leading storage technology, combined with the powerful underlying network backbone, allows the two systems to be geo-replicated asynchronously, thus ensuring high availability and security of the data.

Unlike its predecessor NorStore, the NIRD storage facility is strongly integrated with the HPC systems, which facilitates computation on large datasets. Furthermore, the NIRD storage offers high-performance storage for post-processing, visualization, GPU computing and other services on the NIRD Service Platform.

Technical documentation

Systems in production

Betzy

The supercomputer is named after Mary Ann Elizabeth (Betzy) Stephansen, the first Norwegian woman with a PhD in mathematics.

The most powerful supercomputer in Norway

Betzy is a BullSequana XH2000, provided by Atos, and gives Norwegian researchers more than 5 times more capacity than previously, with a theoretical peak performance of 6.2 PetaFlops. The supercomputer is placed at NTNU in Trondheim and has been in production since 24 November 2020.

Technical specifications:

  • The system comprises 1344 compute nodes, each equipped with 2 x 64-core AMD EPYC™ processors, code name ‘Rome’, for a total of 172032 cores installed on a total footprint of only 14.78 m². The total compute power is close to 6 Pflops.
  • The system consumes 952kW of power and is 95% liquid cooled.
  • The compute nodes are interconnected with Mellanox HDR technology.
  • The data management solution relies on a DDN storage with a Lustre parallel file system of 2.6PB.
     
System: BullSequana XH2000
Max floating point performance, double: 6.2 Petaflops
Number of nodes: 1344
CPU type: AMD EPYC™ "Rome", 2.25 GHz
CPU cores in total: 172032
CPU cores per node: 128
Memory in total: 336 TiB
Memory per node: 256 GiB
Total disc capacity: 2.6 PB
Interconnect: InfiniBand HDR 100, Dragonfly+ topology
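
The totals quoted for Betzy can be cross-checked from the per-node figures. The following is a minimal sketch, not an official sizing tool: the node count, sockets, cores, clock rate and memory per node come from the specification above, while the value of 16 double-precision FLOPs per core per cycle is an assumption that the specification does not state.

```python
# Figures quoted in the Betzy specification.
NODES = 1344
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64
CLOCK_GHZ = 2.25
MEM_PER_NODE_GIB = 256

# Assumption, not from the specification: each core can retire
# 16 double-precision FLOPs per cycle (two 256-bit FMA units).
FLOPS_PER_CYCLE = 16

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = NODES * cores_per_node
total_mem_tib = NODES * MEM_PER_NODE_GIB / 1024

# cores * GHz * FLOPs/cycle gives GFLOP/s; divide by 1e6 for PFLOP/s.
peak_pflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1e6

print(f"cores per node : {cores_per_node}")
print(f"total cores    : {total_cores}")
print(f"total memory   : {total_mem_tib:.0f} TiB")
print(f"peak (estimate): {peak_pflops:.2f} PFLOP/s")
```

Under that assumption the estimate lands close to the quoted 6.2 Petaflops, and the core and memory totals match the listed values.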

 

Figure Betzy

Fram

Named after the Norwegian arctic expedition ship Fram, the Linux cluster hosted at UiT is a shared resource for research computing capable of 1.1 PFLOP/s theoretical peak performance. It entered production on 1 November 2017 (2017.2 computing period).

Fram is a distributed memory system which consists of 1004 dual socket and 2 quad socket nodes, interconnected with a high-bandwidth, low-latency InfiniBand network. The interconnect network is organized in an island topology, with 9216 cores in each island. Each standard compute node has two 16-core Intel Broadwell chips (2.1 GHz) and 64 GiB memory. In addition, 8 larger memory nodes with 512 GiB RAM and 2 huge memory quad socket nodes with 6 TiB of memory are provided. The total number of compute cores is 32256. The system consumes 300 kW of power.
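
The node, core and memory totals for Fram follow from the node mix described above. This is a hedged sketch with three assumptions that the text does not state outright: the quad socket nodes also use 16-core chips, the 8 larger memory nodes are counted among the 1004 dual socket nodes, and each core performs 16 double-precision FLOPs per cycle.

```python
# Figures quoted in the description of Fram.
DUAL_SOCKET_NODES = 1004
QUAD_SOCKET_NODES = 2
LARGE_MEM_NODES = 8          # 512 GiB each
CLOCK_GHZ = 2.1
STANDARD_MEM_GIB = 64
LARGE_MEM_GIB = 512
HUGE_MEM_GIB = 6 * 1024      # 6 TiB per quad socket node

# Assumptions (not stated in the text):
CORES_PER_CHIP = 16          # stated for standard nodes, assumed for quad socket nodes
FLOPS_PER_CYCLE = 16         # double precision, per core

total_nodes = DUAL_SOCKET_NODES + QUAD_SOCKET_NODES
total_chips = DUAL_SOCKET_NODES * 2 + QUAD_SOCKET_NODES * 4
total_cores = total_chips * CORES_PER_CHIP

# Large memory nodes are assumed to be part of the dual socket count.
standard_nodes = DUAL_SOCKET_NODES - LARGE_MEM_NODES
total_mem_gib = (standard_nodes * STANDARD_MEM_GIB
                 + LARGE_MEM_NODES * LARGE_MEM_GIB
                 + QUAD_SOCKET_NODES * HUGE_MEM_GIB)
total_mem_tib = total_mem_gib / 1024

peak_pflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1e6

print(f"nodes: {total_nodes}, cores: {total_cores}")
print(f"memory: {total_mem_tib:.2f} TiB")
print(f"peak (estimate): {peak_pflops:.2f} PFLOP/s")
```

With these assumptions the sketch reproduces the 32256 cores and 1006 nodes quoted for Fram, and gives memory and peak performance figures that round to the listed 78 TiB and 1.1 Petaflop/s.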

Technical details

System: Lenovo NeXtScale nx360
Number of cores: 32256
Number of nodes: 1006
CPU type: Intel E5-2683v4 2.1 GHz; Intel E7-4850v4 2.1 GHz (hugemem)
Max floating point performance, double: 1.1 Petaflop/s
Total memory: 78 TiB
Total disc capacity: 2.5 PB

 

Figure Fram

Saga

The supercomputer is named after the goddess in Norse mythology associated with wisdom. Saga is also a term for the Icelandic epic prose literature. The supercomputer, placed at NTNU in Trondheim, is designed to run workloads from Abel and Stallo. It was made available to users right before the start of the 2019.2 computing period.

Saga is provided by Hewlett Packard Enterprise and has a computational capacity of approximately 85 million CPU hours a year and a life expectancy of four years, until 2023. The system consumes 100 kW of power.

The central BeeGFS storage system was expanded from 1 PB to 5.9 PB in February 2021.


Technical details

Main components

  • 200 standard compute nodes, with 40 cores and 192 GiB memory each
  • 28 medium memory compute nodes, with 40 cores and 384 GiB of memory each
  • 8 big memory nodes, with 3 TiB and 64 cores each
  • 8 GPU nodes, each with 4 NVIDIA GPUs, 2 CPUs with 24 cores in total, and 384 GiB memory
  • 8 login and service nodes with 256 cores in total
  • 5.9 PB high metadata performance BeeGFS scratch file system

Key figures

Processor cores: 10080
GPU units: 32
Internal memory: 75 TiB
Internal disk: 91 TB NVMe
Central disk: 5.9 PB
Theoretical performance (Rpeak): 645 TFLOPS
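
The key figures for Saga can be reproduced from the list of main components. The sketch below is illustrative only. It assumes that the GPU nodes have 24 cores per node, that the memory total covers compute nodes only (no memory is listed for the login and service nodes), and that the yearly CPU hours are counted on compute cores running around the clock.

```python
# (name, node count, cores per node, memory per node in GiB, GPUs per node)
COMPUTE_GROUPS = [
    ("standard",      200, 40, 192,      0),
    ("medium memory",  28, 40, 384,      0),
    ("big memory",      8, 64, 3 * 1024, 0),
    ("GPU",             8, 24, 384,      4),  # assumes 24 cores per node
]
LOGIN_SERVICE_CORES = 256
HOURS_PER_YEAR = 365 * 24

compute_cores = sum(n * cores for _, n, cores, _, _ in COMPUTE_GROUPS)
total_cores = compute_cores + LOGIN_SERVICE_CORES
total_gpus = sum(n * gpus for _, n, _, _, gpus in COMPUTE_GROUPS)
total_mem_tib = sum(n * mem for _, n, _, mem, _ in COMPUTE_GROUPS) / 1024

# Upper bound on yearly capacity: every compute core busy all year.
cpu_hours_millions = compute_cores * HOURS_PER_YEAR / 1e6

print(f"compute cores: {compute_cores}, total cores: {total_cores}")
print(f"GPUs: {total_gpus}, memory: {total_mem_tib:.0f} TiB")
print(f"CPU hours per year: about {cpu_hours_millions:.0f} million")
```

Under these assumptions the core, GPU and memory totals match the key figures, and the yearly capacity comes out slightly above the approximately 85 million CPU hours quoted for Saga.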

 

Figure Saga

NIRD Service Platform

The NIRD Service Platform is a Kubernetes-based infrastructure for running several types of services and software, such as web services, domain- and community-specific portals, and tools for data visualization, data discovery and data sharing. The clusters are located in Tromsø and Trondheim.

Technical details

Workers: 16 over two sites
vCores: 1152 (8 workers with 64 and 8 workers with 80)
RAM: 5 TiB (8 workers with 256 GiB and 8 workers with 768 GiB)
Network: 40 Gbps interconnect to storage and among workers
Storage: total NIRD storage capacity accessible from the platform
GPUs: 32 NVIDIA V100 GPUs (8 workers with 2 and 4 workers with 4)
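
The per-worker breakdown can be summed to check the platform totals. This is a small sketch using only the numbers listed above; the helper name `total` is made up for illustration. The vCore and GPU sums match the listed totals, whereas the per-worker RAM sizes add up to 8 TiB rather than the listed 5 TiB, a difference the source does not explain.

```python
# Each entry is (number of workers, amount per worker).
VCORE_GROUPS = [(8, 64), (8, 80)]
RAM_GIB_GROUPS = [(8, 256), (8, 768)]
GPU_GROUPS = [(8, 2), (4, 4)]


def total(groups):
    """Sum workers * per-worker amount over all groups."""
    return sum(workers * amount for workers, amount in groups)


workers = sum(n for n, _ in VCORE_GROUPS)
gpu_workers = sum(n for n, _ in GPU_GROUPS)
vcores = total(VCORE_GROUPS)
ram_tib = total(RAM_GIB_GROUPS) / 1024
gpus = total(GPU_GROUPS)

print(f"workers: {workers} ({gpu_workers} with GPUs)")
print(f"vCores: {vcores}")
print(f"RAM from per-worker sizes: {ram_tib:.0f} TiB")
print(f"GPUs: {gpus}")
```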

 

Figure NIRD SP

 

NIRD Service Platform description

Decommissioned systems

Gardar (2012 - 2015)

Gardar is an HP BladeCenter cluster consisting of one frontend (management, head) node, 2 login nodes and 288 compute nodes running CentOS Linux managed by Rocks. Each node contains two Intel Xeon processors and 24 GB of memory. The compute nodes are located in HP racks. Each HP rack contains three c7000 Blade enclosures, and each enclosure contains 16 compute nodes.

Gardar has a separate storage system. The X9320 Network Storage System is available to the entire cluster and uses the IBRIX Fusion software. The total usable storage of the X9320 is 71.6 TByte. The storage system is connected to the cluster with an InfiniBand QDR network.
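
The rack layout and the core count for Gardar follow from the description of the blade enclosures. The sketch below uses the node counts and the 2.53 GHz clock rate listed for Gardar. The six cores per Xeon E5649 processor and the 4 double-precision FLOPs per core per cycle are assumptions that the text does not state.

```python
# Figures quoted for Gardar.
COMPUTE_NODES = 288
NODES_PER_ENCLOSURE = 16
ENCLOSURES_PER_RACK = 3
CPUS_PER_NODE = 2
MEM_PER_NODE_GB = 24
CLOCK_GHZ = 2.53

# Assumptions (not stated in the text):
CORES_PER_CPU = 6        # six-core Xeon E5649
FLOPS_PER_CYCLE = 4      # double precision, per core

enclosures = COMPUTE_NODES // NODES_PER_ENCLOSURE
racks = enclosures // ENCLOSURES_PER_RACK
total_cores = COMPUTE_NODES * CPUS_PER_NODE * CORES_PER_CPU
total_mem_gb = COMPUTE_NODES * MEM_PER_NODE_GB

# cores * GHz * FLOPs/cycle gives GFLOP/s; divide by 1000 for TFLOP/s.
peak_tflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

print(f"enclosures: {enclosures}, racks: {racks}")
print(f"cores: {total_cores}, memory: {total_mem_gb} GB")
print(f"peak (estimate): {peak_tflops:.1f} TFLOP/s")
```

With these assumptions the sketch gives the 3456 cores listed for Gardar and a peak estimate that rounds to the listed 35 TFlops.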

Technical details

System: HP BL280c G6 servers
Number of cores: 3456
Number of nodes: 288
CPU type: Intel Xeon E5649 (2.53 GHz), Westmere-EP
Peak performance: 35 TFlops
Total storage capacity: 71.6 TByte

Hexagon (2008 - 2017)

UiB operates supercomputer facilities that serve the high-end computational needs of scientists at Norwegian universities and other national research and industrial organizations.

Significant use of Hexagon has traditionally come from computational chemistry, computational physics, computational biology, the geosciences and mathematics. The supercomputer Hexagon is installed at the High Technological Center in Bergen (HiB) and is managed and operated by UiB. Hexagon was upgraded from Cray XT4 in March 2012.

Technical details

System: Cray XE6-200
Number of cores: 22272
Number of nodes: 696
Cores per node: 32
Interconnect: Cray Gemini
Peak performance: 204.9 Teraflop/s
Operating system: Cray Linux Environment

Abel (2012 - 2020)

Named after the famous Norwegian mathematician Niels Henrik Abel, the Linux cluster at UiO is a shared resource for research computing capable of 258 TFLOP/s theoretical peak performance. At the time of installation, 1 October 2012, Abel reached position 96 on the Top500 list of the most powerful systems in the world.

Abel is an all-round, all-purpose cluster designed to handle multiple concurrent jobs and users with varying requirements. Rather than massively parallel applications, the primary application profile is small to moderately parallel applications with high I/O and/or memory demand.

Technical details

System: MEGWARE MiriQuid 2600
Number of cores: 10000+
Number of nodes: 650+
CPU type: Intel E5 2670
Max floating point performance, double: 258 Teraflop/s
Total memory: 40 TiB
Total disc capacity: 400 TiB

Stallo (2007 - 2021)

The Linux Cluster Stallo is a compute cluster at UiT - The Arctic University which was installed 1 December 2007, and included in NOTUR 1 January 2008. The supercomputer was upgraded in 2013.

Stallo is intended for distributed-memory MPI applications with low communication requirements between the processors, shared-memory OpenMP applications using up to eight processor cores, parallel applications with moderate memory requirements (2-4 GB per core) and embarrassingly parallel applications.

Technical details

System: HP BL 460c Gen 8
Number of cores: 14116
Number of nodes: 518
CPU type: Intel E5 2670
Peak performance: 104 Teraflop/s
Total memory: 12.8 TB
Total disc capacity: 2.1 PB

 

Vilje (2012 - 2021)

Vilje is a cluster system procured by NTNU, in cooperation with the Norwegian Meteorological Institute and UNINETT Sigma in 2012. Vilje is used for numerical weather prediction in operational forecasting by met.no, as well as for research in a broad range of topics at NTNU and other Norwegian universities, colleges and research institutes. The name Vilje is taken from Norse Mythology.

Vilje is a distributed memory system that consists of 1440 nodes interconnected with a high-bandwidth, low-latency switch network (FDR InfiniBand). Each node has two 8-core Intel Sandy Bridge processors (2.6 GHz) and 32 GB memory. The total number of cores is 23040.
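
The description quotes 1440 nodes and 23040 cores, while the technical details for Vilje list 1404 nodes and 22464 cores. The short sketch below shows that both pairs follow from 16 cores per node; the source does not say why the node counts differ. The function name `totals` is made up for illustration.

```python
# Per-node figures quoted for Vilje.
CORES_PER_NODE = 2 * 8       # two 8-core processors
MEM_PER_NODE_GB = 32


def totals(nodes):
    """Return (total cores, total memory in GB) for a given node count."""
    return nodes * CORES_PER_NODE, nodes * MEM_PER_NODE_GB


text_cores, text_mem_gb = totals(1440)     # node count from the description
table_cores, table_mem_gb = totals(1404)   # node count from the technical details

print(f"1440 nodes: {text_cores} cores, {text_mem_gb} GB memory")
print(f"1404 nodes: {table_cores} cores, {table_mem_gb} GB memory")
```

The memory total for 1404 nodes, roughly 44.9 TB, is close to the 44 TB listed in the technical details.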

The system is well-suited (and intended) for large scale parallel MPI applications. Access to Vilje is in principle only allowed for projects that have parallel applications that use a relatively large number of processors (≥ 128).

Technical details

System: SGI Altix 8600
Number of cores: 22464
Number of nodes: 1404
CPU type: Intel Sandy Bridge
Total memory: 44 TB
Total disc capacity  

 

Pictures of national e-infrastructure systems

  • Hexagon
    A supercomputer for massive parallel jobs
  • Stallo
    A supercomputer for smaller low parallel jobs
  • Abel
    A supercomputer for smaller sequential jobs
  • Vilje
    A supercomputer for large and medium parallel jobs
  • NIRD
    A storage facility for large research data sets
  • Saga
    A supercomputer for smaller sequential jobs
  • Betzy
    A supercomputer for massive parallel jobs