Status of procurement project NIRD2020 as of December 2020
After an intensive exploration of the users’ needs, we are now exploring the market to identify the solutions that could best address those needs.
The planning phase of the NIRD2020 project has produced a detailed mapping of the users’ present and future needs for storage and storage services. Starting from this analysis, we have contacted several storage providers to explore the market. The storage companies were selected from the 2019 Gartner Magic Quadrant for Primary Storage, according to the users’ requirements.
The first round of market exploration was done in October-December. If necessary, follow-up meetings will be conducted with some of the vendors to deep-dive into some of the specific solutions.
One objective of this status report is to inform all potentially interested vendors about the process and to share the same information with everyone, including vendors who have not been contacted directly.
We would like our users, stakeholders and potential vendors to be aware that we plan to execute the procurement in the first half of 2021.
Current NIRD architecture
Although the next-generation storage is not necessarily meant to follow the same architectural design, it is worth getting acquainted with the current NIRD architecture and usage trend:
Users are primarily scientists from public research and education institutions in Norway. Data are not homogeneous: they come from different users and user groups and are owned by different users and institutions. NIRD is used for project storage and related snapshots, backup of selected areas from the HPC systems’ high-performance file systems, administrative data, and data in the research data archive. Data growth has followed an exponential trend, from 2.5 PB in 2014 to 24 PB in 2020 (including primary storage and the second replica for disaster recovery).
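To put the growth figures in perspective, the short Python sketch below works out the average yearly growth they imply. It is a rough illustration only: it uses just the two data points quoted above and assumes constant compound growth between 2014 and 2020, which is a simplification. The function name is hypothetical and not part of any NIRD tooling.

```python
import math


def implied_annual_growth(start_pb, end_pb, years):
    """Return (annual growth factor, doubling time in years).

    Assumes constant compound growth between the two data points,
    which is a simplification of the real usage history.
    """
    factor = (end_pb / start_pb) ** (1 / years)
    doubling_time = math.log(2) / math.log(factor)
    return factor, doubling_time


# Figures quoted in the text: 2.5 PB in 2014, 24 PB in 2020.
factor, doubling = implied_annual_growth(2.5, 24.0, 2020 - 2014)
print(f"annual growth: {(factor - 1) * 100:.0f}%, doubling time: {doubling:.1f} years")
```

Under these assumptions the volume grew by about 46% per year, doubling roughly every 1.8 years. Since the 2020 figure includes the second replica, the result should be read as an order-of-magnitude indication rather than a precise rate.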
The majority of the data on NIRD are seldom accessed (approximately 70% have not been accessed during the last 6 months), while 6-10% of the data are accessed and consumed regularly, either directly on NIRD (through the so-called NIRD Service Platform, a Kubernetes cluster for data analytics) or by staging them on the HPC file systems.
Architectural Principles for the future NIRD
- User in Focus - Users’ requirements are central in the design of the NIRD2020 architecture
- Performance and Efficiency - Storage is fit for purpose
- Easy to Scale, Adapt and Change - The architecture of NIRD2020 shall be scalable, manageable and modular by design in order to allow easy adaptation to the evolving needs, and the evolving scientific computing paradigms and technologies.
- Interoperability - NIRD2020 will be placed in a complex landscape, where data-centric design must be correlated with the increasing need for connecting remote heterogeneous data and services across institutions and multiple locations (in Norway or abroad).
The next-generation NIRD will be placed in an evolving environment: old HPC machines will be decommissioned, and new, larger ones will be installed at locations different from the current ones. Although it would be ideal to have a single location for the NIRD2020 storage infrastructure, the dispersed location of the HPC systems will also require some of the NIRD2020 components to be geographically distributed to some degree, in order to facilitate interoperability with the HPC high-performance storage systems.
Investigating Possible Solutions
The NIRD2020 project is currently exploring the market to understand the technology available to support:
- Storage for different purposes: active, inactive, integrity and disaster recovery
- Cost-effective storage solutions
- Support for data management
- Interoperability framework towards HPC storage and other storage solutions
- Unified namespace across different domains (possibly distributed over several geographical locations).
We are working hard to procure the next-generation national storage infrastructure that fulfils existing, emerging and future user needs, while ensuring the scalability, reliability, flexibility and cost-effectiveness of the solution.
Our ambition is to complete the market exploration during January 2021 and proceed to the tendering process in early spring.