Central Data Library Policy

Version 1.2

Scope

The Central Data Library (CDL) is a complimentary service offered by Sigma2 that serves as a centralized repository for research projects. The CDL stores data intended to be shared across multiple projects, such as input datasets, libraries, AI models, and AI training data.

Note that data deposited in the CDL is not permanent or persistent and shall not be considered a replacement for the NIRD Research Data Archive.


Application Acceptance Procedure

Applications may be submitted at any time throughout the year and will be continuously evaluated.

Eligibility Criteria

  • Relevance of the shared library for multiple research projects, i.e., proven use by multiple projects, multiple users, and, where applicable, multiple HPC and/or cloud systems.
  • Data format compliance (e.g., metadata standards, file formats, ownership and contact information).
  • FAIRness: the data follow the FAIR principles (findable, accessible, interoperable, reusable).
  • A simple data management plan is provided.
  • An RO-Crate is maintained by the PL/XO for each version of the dataset in the CDL.
  • If applicable, ethical and legal approvals (e.g., ethical evaluation, data management approval, data-sharing agreements).
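
As an illustration of the RO-Crate criterion above, the following is a minimal sketch of how a PL/XO might generate the metadata file for one dataset version. It is based on a general understanding of the RO-Crate 1.1 layout (a metadata descriptor plus a root Dataset entity) and should be checked against the RO-Crate specification before use. The function name, its parameters, and the example values are hypothetical and are not prescribed by this policy.

```python
import json


def ro_crate_metadata(name, version, description, license_url, date_published):
    """Build a minimal RO-Crate metadata document for one dataset version.

    Assumption: RO-Crate 1.1 conventions; all argument values are
    supplied by the PL/XO and are illustrative only.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: points at the root dataset entity.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root data entity describing this version of the dataset.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "version": version,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
            },
        ],
    }


def to_json(metadata):
    """Serialize the metadata for writing to ro-crate-metadata.json."""
    return json.dumps(metadata, indent=2)
```

The resulting JSON would typically be stored as ro-crate-metadata.json at the top level of each dataset version, so that every version carries its own description.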

Application Form

  • The application follows the regular application process for storage resources.
  • The PL shall be able to flag the application as a CDL application.
  • Additionally, the application shall state:
      • Estimated data volume and technical requirements (e.g., storage, software, access frequency).
      • Access level requirements (e.g., restricted to X & Y projects, restricted to community).
      • Access policies (e.g., WORM, read-only, retention time).
      • If applicable, compliance with legal, ethical, and intellectual property guidelines.

Application Review

  1. Verification: The administration screens applications to ensure that all required information is provided and that the application meets the eligibility criteria (ethical and legal approvals where applicable, DMP, and financing). The administration also reviews compliance with data security and privacy requirements based on the information provided (e.g., handling of sensitive data).
  2. Technical review: Technical experts from NRIS, also members of the Resource Allocation Working Group, assess whether the data format, structure, volume, and access pattern are compatible with the infrastructure.
  3. Decision: The administration can accept, reject, or return the application for rework, based on the outcome of the verification and technical review.
  4. Data-sharing agreement: If needed, the parties sign a data-sharing agreement before final acceptance. The agreement outlines responsibilities, intellectual property rights, terms of access, and conditions of use.

Unless a specific service level agreement (SLA) is mutually agreed upon and signed by both parties, the standard SLA for the service will apply.

Likewise, the operational level agreement (OLA) shall follow the standard DOB (Drift- og brukerstøtte) agreement.

Access and Permissions Management

Access and allocation

  1. Resources for the CDL will be allocated on the NIRD Data Lake.
  2. Each project will get an NSxxxxB account, connected to a superproject.
  3. Each CDL project will get an S3 service account named cdl-nsxxxxb.
  4. S3 buckets will be named using the format cdl-nsxxxxb-bucketname.
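
The naming convention above can be sketched as a pair of helper functions. This is an illustration only: the function names are hypothetical, the assumption that "xxxx" stands for digits is a guess from the NSxxxxB pattern, and the character and length checks follow common S3 bucket-naming limits, which may differ on the NIRD Data Lake.

```python
import re

# Assumption: project numbers look like "NS1234B"; the policy only
# shows the pattern "NSxxxxB", so the digits are a guess.
_PROJECT_RE = re.compile(r"^NS\d+B$", re.IGNORECASE)

# Assumption: lowercase letters, digits, and inner hyphens only.
_SUFFIX_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")


def service_account(project):
    """Return the S3 service account name, e.g. 'cdl-ns1234b'."""
    if not _PROJECT_RE.match(project):
        raise ValueError(f"not an NSxxxxB project number: {project!r}")
    return f"cdl-{project.lower()}"


def bucket_name(project, suffix):
    """Return the bucket name in the form 'cdl-nsxxxxb-<suffix>'."""
    if not _SUFFIX_RE.match(suffix):
        raise ValueError(f"unsupported bucket suffix: {suffix!r}")
    name = f"{service_account(project)}-{suffix}"
    # Common S3 limit; assumed, not stated in this policy.
    if len(name) > 63:
        raise ValueError(f"bucket name longer than 63 characters: {name!r}")
    return name
```

For a hypothetical project NS1234B, `service_account("NS1234B")` gives `cdl-ns1234b` and `bucket_name("NS1234B", "era5-inputs")` gives `cdl-ns1234b-era5-inputs`.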

Roles

  • Project leader (PL): Responsibility and accountability for the central data library.
  • Executive officer (XO): Responsibility for the central data library.
  • Role-based access control (RBAC): PL and XO are given full access to manage the library and its users. In particular, they describe the RBAC rules that govern access permissions based on project, user role, application accounts, and data sensitivity.
  • Researchers: access based on project needs (e.g., append-only, read-only, read-write).
  • External Users: access limited to public data or data authorized by PL/XO.
  • Machine-actionable content: Application (machine) S3 accounts shall be provided. Correct use of application accounts and API access is the sole responsibility of the PL.
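
The access levels named for researchers above (append-only, read-only, read-write) can be pictured as a small permission table. This is a sketch under stated assumptions, not how the service enforces access: the operation names are hypothetical, and the reading of append-only as "may read and create new objects, but not overwrite or delete" is an assumption that the PL/XO would need to confirm.

```python
from enum import Enum


class Access(Enum):
    READ_ONLY = "read-only"
    APPEND_ONLY = "append-only"
    READ_WRITE = "read-write"


# Assumed mapping from access level to permitted operations.
# The operation names are illustrative, not part of the policy.
ALLOWED = {
    Access.READ_ONLY: {"read"},
    Access.APPEND_ONLY: {"read", "create"},
    Access.READ_WRITE: {"read", "create", "overwrite", "delete"},
}


def is_allowed(access, operation):
    """Return True if the access level permits the given operation."""
    return operation in ALLOWED[access]
```

A table of this kind makes it easy for the PL/XO to document, per project or user role, which operations each access level is meant to cover.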

Services and functionalities 

Data integrity

Snapshots: Data integrity is by default ensured through snapshots.

Replication: Replication may be provided on-demand, as an additional security mechanism. Regular, asynchronous replication between primary and secondary storage resources can be configured to minimize chances for data loss and corruption.

Monitoring

Data patterns and cataloging: Optionally, at Sigma2’s discretion, metadata cataloging and indexing may be provided as a complementary service.

Access monitoring: Monitoring tools to track data access, downloads, and usage trends to ensure compliance with the terms of use and assess resource demands shall be implemented and followed up as part of service operation by the DO (Driftsorganisation) within NRIS.

Usage: Resource usage shall be collected at all times, and historical data shall be retained in the project management system (MAS).

Governance

Roadmap

The NIRD Data Library / Central Data Library shall be part of the NIRD Data Lake product roadmap.

Review

Monitoring tools to track data access, downloads, and usage trends to ensure compliance with the terms of use and assess resource demands shall be implemented and followed up as part of service operation by the DO (Driftsorganisation) within NRIS.

Billing

The service will be provided free of charge, provided it is not misused and contributes to improved data flow, data deduplication, and measurable research achievements, such as an increase in publications utilizing the library. 

Data retention 

All allocations shall be granted for a minimum period of one year, starting from the date of application approval. Validity of the allocation will commence at the beginning of the nearest upcoming allocation cycle (April or October), regardless of the submission date.
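
The start of validity can be illustrated with a short calculation. This is a sketch only: it assumes that allocation cycles begin on 1 April and 1 October, that a reference date falling exactly on a cycle start counts as that cycle, and that the relevant reference date is the date of the decision. None of these details is stated in this policy, and the function name is hypothetical.

```python
from datetime import date


def allocation_start(decision_date):
    """Return the first allocation-cycle start on or after decision_date.

    Assumption: cycles start on 1 April and 1 October.
    """
    year = decision_date.year
    candidates = [
        date(year, 4, 1),
        date(year, 10, 1),
        date(year + 1, 4, 1),  # covers dates after 1 October
    ]
    return next(d for d in candidates if d >= decision_date)
```

Under these assumptions, an application decided on 15 January 2025 would become valid on 1 April 2025, and one decided on 3 November 2025 would become valid on 1 April 2026.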

Decommissioning

Decommissioning shall follow the project decommissioning policy. Before removing data, the PL/XO shall assess the need for long-term preservation in the NIRD Research Data Archive, or for migration to another data platform. Alternatively, the project may consider transferring ownership to Sigma2, as outlined in the Transfer of ownership section.

Transfer of ownership

In cases where a CDL project exceeds the data retention time or is terminated by the PL, but the associated dataset is of interest to Sigma2, Sigma2 may retain or take over the data subject to:

  1. A formal, mutual agreement between the service provider and the data owner. 
  2. Clear documentation of data ownership and copyright status. 
  3. If applicable, a written consent from the data owner outlining the terms of use, access, and retention time. 

Categorization

Sigma2’s data policy and data categorization shall be kept in line with the Central Data Library policy.