
NIRD2020 is seeking further information

In January and February, the project is collecting further information on the technologies available on the market.

The major points of interest and the questions to be explored are listed below. The drawing is not meant to represent the desired architecture; it only shows the components mentioned in the questions, to facilitate understanding and support the discussion.

nird2020 components

 

  1. Users might have data which are seldom accessed (inactive data) and data which are often accessed (active data), but may not know their access pattern a priori. In a tiered solution, inactive data should be moved conveniently to less expensive storage. How can data be moved between tiers?
  2. If the storage facility is in the vicinity of (located meters away from) the HPC system, how can the following workflow be facilitated: data on storage -> data on HPC -> scheduling/running job -> collecting results -> storing results back on the storage? And what if the HPC system is located kilometers away?
    1. How can the new storage solution integrate with existing and future HPC storage systems?
    2. What protocols exist to support the integration?
  3. Is it possible to integrate the new storage with existing (old, not necessarily high-performance) storage, for example to store a second replica of important data?
  4. How can tens of PBs of data be safely migrated without the data being off-line? How does the migration process depend on the distance between the old and the new storage?
  5. What solutions exist for long-term archiving of important data?
    1. How can the user interact with the archived data?
    2. Which access protocols can or should be supported?
    3. What solutions exist for annotating or tagging data in the archive? Is it for users or for admins?
  6. What solutions exist for disaster recovery?
  7. What solutions exist that allow users to select replication/backup themselves, at the granularity of individual files or objects?
  8. What measures exist to ensure that a backup/replica is unaltered in the case of data corruption or a ransomware attack? 
  9. How scalable are the currently available storage technologies for active/inactive/archiving/backup/replication, with respect to both capacity and capability? Is it possible to expand a component without significant additions to or modifications of the architecture?
  10. What are the available solutions to ensure hardware life-cycle management and service continuity when a hardware component has reached end of life (EOL)?
  11. What are the available solutions to ensure data availability in the case of failure of one or more components? And in the case of maintenance of one or more components? How about the case of system expansion?
  12. What are the solutions that reduce the probability of data loss or reduce the probability of data being unavailable for more than 24 hours?
  13. Are there solutions that allow data analytics on data stored on the storage facility without unnecessary data movement, while ensuring suitable security/isolation and scalability?
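
To make the distinction in question 1 concrete, the sketch below shows one possible way to separate active from inactive data when the access pattern is not known in advance: classifying files by their last access time. It is only an illustration, not a proposed NIRD2020 mechanism. The function name `classify_by_access_time` and the 180-day threshold are hypothetical, and the approach assumes that the file system records access times reliably, which is often not the case on volumes mounted with `noatime` or `relatime`.

```python
import os
import time
from pathlib import Path

# Hypothetical threshold: files not read for this many days count as inactive.
INACTIVE_AFTER_DAYS = 180


def classify_by_access_time(root, inactive_after_days=INACTIVE_AFTER_DAYS, now=None):
    """Split the regular files under `root` into 'active' and 'inactive' lists.

    A file is treated as inactive when its last access time (st_atime)
    is older than the cutoff. `now` can be passed in to make runs repeatable.
    """
    now = time.time() if now is None else now
    cutoff = now - inactive_after_days * 86400  # seconds per day
    tiers = {"active": [], "inactive": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        last_access = path.stat().st_atime
        tier = "inactive" if last_access < cutoff else "active"
        tiers[tier].append(path)
    return tiers
```

The resulting "inactive" list would be the candidate set for a move to less expensive storage; how that move is actually performed is exactly what question 1 asks.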

 

If you have any questions, please contact maria.iozzi@uninett.no.