Data policy | Sigma2

Research data varies in nature and necessitates diverse technologies and services contingent on various factors, such as sharing and usage requirements.

Within this section, you'll discover a data categorisation aiding your selection of the most fitting Sigma2 services for your specific dataset.

It's important to note that these categories pertain solely to Sigma2 services and lack universal applicability beyond our service framework.

Categories according to sharing requirements

The types of data services and storage technologies needed vary depending on the sharing requirements. Below, you'll find a table that describes different data types based on sharing requirements. While this categorization isn't exhaustive, it offers guidance to users seeking the most suitable Sigma2 data services. Should your data not align with any of these categories, feel free to reach out to us for assistance.

Many institutions have embraced a modified version of colour-coded data categorisation. To enhance clarity, we have included a comparison between this categorization and Sigma2's approach.

You will find the categorisation according to usage requirements further down on this page.

Type	Description	Sigma2 storage service	Category by colour
Sensitive personal data	Data containing sensitive information about individuals, such as health condition, political orientation, sexual orientation, etc. Handling of these data is regulated by laws such as GDPR, the Personal Data Act (Personopplysningsloven) or the Personal Health Data Filing System Act (Helseregisterloven).	TSD services	Black and Red
Restricted/confidential data	Data, including personal data that are not sensitive, whose unauthorized disclosure, alteration, or destruction could cause significant damage to individuals, private businesses or public bodies. This data requires security and controlled access in accordance with the type of confidentiality and the data owner’s risk assessment.	Restricted/confidential data can be stored in NIRD and computed on the system or NIRD Service Platform, provided that the data owner / data controller has made the necessary risk assessment. To facilitate the assessment, the risk analysis of Sigma2’s services may be provided. Solutions for restricted-access data, such as data encryption and access control strategies, can be offered on demand.	Red
Internal data	Data whose unauthorized disclosure, alteration, or destruction could cause low or moderate damage to the specific research project.	All of Sigma2’s services are suitable for internal data. Access to the resources on Sigma2’s services will be administered by the Principal Investigator.	Yellow
Public data	Data that doesn’t need protection against unauthorized access, but that does need protection against unauthorized modification or destruction.	All of Sigma2’s services are suitable for public data. Data can be made publicly available either by exposing the data through a web service or by publishing the data on the NIRD Research Data Archive.	Green

Categorisation of data.

Categories according to usage

Different types of data services and storage technologies are required depending on the usage pattern. The following table provides a description of various data types according to usage patterns. This categorization is not meant to be exhaustive but provides guidance to the users in selecting Sigma2’s data services that best fit the purpose. If your data does not fit in any of the described types, you can contact us for guidance.

Type	Description	Sigma2 storage services
High I/O Data	Data actively used for high I/O, low latency operations during computation on high-performance facilities. Data read or written during computations is normally not stored permanently in its pristine form, but rather pre- or post-processed (normally on other storage media).	HPC Scratch Storage; (NIRD Data Peak)
Active Data	Data in active use and accessed frequently during an active research process is dubbed "active". Active data is typically accessed and processed, and can serve as input to new calculations on, for instance, an HPC system or an analytics cloud service. It is therefore necessary to maintain this data on a storage technology with high performance (high-performance storage and high-speed network) while at the same time coping with multiple users and processes. We assume data to be active if accessed (read/write) more frequently than once every six months.	NIRD Data Peak (HPC Project storage)
Cold data	Data that is still relevant for an active research process but accessed less frequently than once every six months. Cold data should be accessible via various protocols (e.g., POSIX, S3, HTTP/REST), but can be stored on storage media with low to moderate performance.	NIRD Data Peak; NIRD Data Lake
Curated Data (published)	Data that is archived and published by issuing a DOI (Digital Object Identifier). Such data is typically archived for several years, with some curation requirements. It is expected that the data is no longer useful after a decade and can, in principle, be deleted after such time. Each dataset is attached to a license and can be discovered and accessed according to that license.	NIRD Research Data Archive
Backup data	Data that serves only as a backup copy of a primary dataset. This is a subclass of cold data: it is immutable and read-only, and it is accessed only if the primary data becomes corrupted. Data of this type must be stored with a reference checksum value.	Incremental Backup of the data stored on NIRD Data Peak (on-demand service)

Data types according to usage.
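
The six-month rule that separates active from cold data, and the reference checksum required for backup data, can be illustrated with a short Python sketch. This is not a Sigma2 tool: the function names, the 183-day approximation of "six months", the handling of the exact boundary, and the choice of SHA-256 are all assumptions made for the example, since the policy does not specify them.

```python
import hashlib
from datetime import datetime, timedelta

# Assumption: "six months" is approximated as 183 days; the policy gives no exact day count.
SIX_MONTHS = timedelta(days=183)


def usage_category(last_access: datetime, now: datetime) -> str:
    """Classify data as 'active' or 'cold' from the time of its last read/write.

    Assumption: data last accessed exactly six months ago is treated as cold;
    the policy does not define this boundary case.
    """
    return "active" if now - last_access < SIX_MONTHS else "cold"


def reference_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return a hex digest to store alongside a backup copy.

    Assumption: SHA-256 is used here; the policy only requires "a reference
    checksum value" and does not name an algorithm.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A dataset last touched a year ago would be classified as cold by `usage_category`, which points to NIRD Data Peak or NIRD Data Lake in the table above. Recomputing `reference_checksum` on a backup copy and comparing it with the stored value is one way to detect corruption before restoring.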