Advanced User Support helps migrate workflows to HPC

03.03.2022

From time to time, users of the national HPC systems encounter issues that need more attention than regular user support can provide. This can be an issue related to a specific project or something affecting several projects within a certain discipline.

Such issues must be handled promptly, both to give the users extra assistance and to ensure efficient use of the Sigma2 resources. NRIS handles cases like this through dedicated Advanced User Support (AUS) projects.

A flowering Ledebouria plant.

In one discipline-specific AUS, a research group from the Natural History Museum (NHM) at the University of Oslo, which works on the supercomputer Saga, was unable to run its pipeline as expected.

They work in a field of biology called systematics and construct family trees of plants. The illustration above shows a flowering Ledebouria plant, one of the species in question.

DNA sequencing: large amounts of data

The researchers need to use HPC to conduct their research because they have so many DNA sequences per species that it would take “forever” to handle them on their personal computers. The programs they use for making phylogenies are also computationally intensive, and the load becomes heavier and heavier as the amount of input data grows. Running these analyses on their own computers would also leave the machines unavailable for other work.

"We strive to communicate the guide to other users who potentially might face similar issues in the future. We try to provide solutions to issues related to genome assembly and analysis, which is a common usage scenario in the bioinformatics domain and thus experienced by several projects."

Sabry Razick, NRIS

Previously, they had run the software on their personal computers with a straightforward setup. However, when they tried to install it on Saga, they kept running into an error. The researchers needed help from NRIS and an AUS to understand the issue fully and to resolve it. Luckily, the NRIS experts Sabry Razick from UiO and colleague Oskar Vidarsson at UiB were ready to help.

Personalised support

"This AUS helped us throughout the process. Sabry decided to tackle one issue at a time, which allowed us the opportunity to understand the process of fixing the errors and gave us full personalised support on the issues that arose. This fixed the pipeline in the end and allowed us to push forward with our job," says J. Adrian Chimal Ballesteros, PhD candidate at NHM.

The NHM research group's challenges were resolved through regularly scheduled follow-up meetings with the assigned experts, where the researchers could explain their issues and state the desired solution clearly. Because the problems were complex, resolving them took some time.

"This might also have been due to a technical aspect that I am not aware of. The time it took to solve the issue is understandable since a few projects were being aided in tandem," says J. Adrian.

Researcher Solveig Bua Løken at NHM finds "her" plant during fieldwork in Zimbabwe.

About the research group

We are working in a field of biology called systematics, where we are basically constructing family trees (= phylogenies) of species. The species we are interested in are plants, more specifically two understudied genera in the asparagus family.

We are using next-generation sequencing data (very many DNA sequences) to construct phylogenies for these genera. We do this to understand how many species actually exist in nature, and how they are related to each other.

Adequate descriptions of species diversity and evolutionary relationships are important building blocks for all other areas of biological research.

We also need to know which species exist, and how many we have, to be able to preserve the world’s biodiversity.

Making the solution available for other researchers

The project produced three deliverables: commonly used reference datasets were installed as a shared resource on Saga, optimal setups were formulated for the wtdbg2 software used in amphibian sequence assembly pipelines, and a highly extendable job setup was provided for the HybPiper pipeline for plant genome assembly. Several groups that use Saga for bioinformatics rely on the same reference databases, made available by NRIS' Oskar Vidarsson and Sabry Razick. Through this AUS project, these are collected and made available in a common place on Saga for the benefit of all projects with similar needs.