Research data varies in nature and calls for different technologies and services depending on factors such as sharing and usage requirements.
This section presents a data categorisation to help you select the Sigma2 services best suited to your dataset.
Note that these categories apply only to Sigma2 services and are not intended for use outside our service framework.
Categories according to sharing requirements
The types of data services and storage technologies needed vary depending on the sharing requirements. The table below describes different data types based on sharing requirements. While this categorisation isn't exhaustive, it offers guidance to users seeking the most suitable Sigma2 data services. If your data does not fit any of these categories, feel free to contact us for assistance.
Many institutions have adopted a modified version of colour-coded data categorisation. For clarity, we've included a comparison between that colour-coded categorisation and Sigma2's approach.
You can find the categorisation according to usage requirements further down on this page.
| Category | Description | Sigma2 storage service | Category by colour |
|---|---|---|---|
| Sensitive personal data | Data containing sensitive information about individuals, such as health condition, political orientation, sexual orientation, etc. Handling of this data is regulated by laws such as GDPR, the Personal Data Act (Personopplysningsloven) or the Personal Health Data Filing System Act (Helseregisterloven). | | Black and Red |
| Restricted/confidential data | Data, including personal data that is not sensitive, whose unauthorized disclosure, alteration, or destruction could cause significant damage to individuals, private businesses or public bodies. This data requires security and controlled access in accordance with the type of confidentiality and the data owner’s risk assessment. | Restricted/confidential data can be stored in NIRD and processed on the HPC system or NIRD Service Platform, provided that the data owner / data controller has made the necessary risk assessment. To facilitate the assessment, the risk analysis of Sigma2’s services may be provided. Solutions for restricted-access data, such as data encryption and access control strategies, can be offered on demand. | |
| Internal data | Data whose unauthorized disclosure, alteration, or destruction could cause low or moderate damage to the specific research project. | All of Sigma2’s services are suitable for internal data. Access to the resources on Sigma2’s services is administered by the Principal Investigator. | |
| Public data | Data that doesn’t need protection against unauthorized access, but that does need protection against unauthorized modification or destruction. | All of Sigma2’s services are suitable for public data. Data can be made publicly available either by exposing it through a web service or by publishing it on the NIRD Research Data Archive. | |
Categories according to usage
Different types of data services and storage technologies are required depending on the usage pattern. The following table describes various data types according to usage patterns. This categorisation is not meant to be exhaustive, but it offers guidance to users in selecting the Sigma2 data services that best fit their purpose. If your data does not fit any of the described types, you can contact us for guidance.
| Category | Description | Sigma2 storage services |
|---|---|---|
| High I/O Data | Data actively used for high I/O, low latency operations during computation on high-performance facilities. Data read or written during computations is normally not stored permanently in its pristine form, but rather pre/post-processed (normally on other storage media). | HPC Scratch Storage; (NIRD Data Storage) |
| Active data | Data in active use and accessed frequently during an active research process is dubbed “active”. Active data is typically accessed and processed, and can serve as input to new calculations, for instance on an HPC system or analytics cloud service. It is therefore necessary to keep this data on a storage technology with high performance (high-performance storage and high-speed network) that can also cope with multiple users and processes. We assume data to be active if it is accessed (read/write) more frequently than once every six months. | NIRD Data Storage; (HPC Project storage) |
| Cold data | Data that is still relevant for an active research process but accessed less frequently than once every six months. Cold data should be accessible via various protocols (such as POSIX, S3, HTTP/REST), but can be stored on storage media with low to moderate performance. | NIRD Data Storage - “NIRD Data Lake” |
| Curated Data (published) | Data that is archived and published by issuing a DOI (Digital Object Identifier). Such data is typically archived for several years, with some curation requirements. It is expected that the data is no longer useful after a decade and can, in principle, be deleted after that time. The data is attached to a license and can be discovered and accessed according to that license. | NIRD Research Data Archive |
| Backup data | Data that serves only as a backup copy of a primary dataset. This is a subclass of cold data: it is immutable and read-only, and it is accessed only if the primary data becomes corrupted. Data of this type must be stored with a reference checksum value. | Incremental Backup of the data stored on NIRD Data Storage (on-demand service) |
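
Two of the rules above lend themselves to a small illustration: the six-month rule of thumb that separates active from cold data, and the reference checksum required for backup copies. The Python sketch below is only an illustration under stated assumptions, not a Sigma2 tool. The function names are hypothetical, "six months" is approximated as 183 days, time since the last access stands in for access frequency, and SHA-256 is an assumed algorithm since the text does not prescribe one.

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path

# Assumption: "six months" is approximated as 183 days.
SIX_MONTHS = timedelta(days=183)


def usage_category(last_access: datetime, now: datetime) -> str:
    """Classify data as "active" or "cold" from its last access time.

    Simplification: the time since the last access is used as a proxy
    for "accessed more frequently than once every six months".
    """
    return "active" if now - last_access < SIX_MONTHS else "cold"


def reference_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks.

    SHA-256 is an assumed choice; the source only requires that a
    reference checksum value is stored alongside the backup copy.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A call such as `usage_category(last_access, datetime.now())` could be applied to a dataset's most recent access timestamp, and `reference_checksum(Path("dataset.tar"))` (a hypothetical file name) recomputed later to check that a backup copy still matches its stored reference value.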