Current challenges on Fram

As in earlier allocation periods, the current 2017.2 period, which ends on 31 March, is seeing high load during its final month.

All HPC systems are experiencing high load and long run queues. Refer to the live status at https://www.sigma2.no/hardware/status for current information.

Additional challenges on Fram

Unfortunately, Fram has additional challenges. The most severe is the current I/O problem, which had been reported as addressed and closed in an earlier file system software release. Recent analysis by our system administrators shows that the problem is still present. The issue has been reported to the file system vendor and escalated. Not surprisingly, it is a complex problem that requires further analysis by the vendor, so it is not yet known when it will be fully resolved.

Another challenge on Fram is the balance between development work and production runs. Several projects have expressed concern about the difficulty of getting the necessary priority for development jobs on Fram. As Fram is in its first period of full production, we are still reviewing the queue system setup, and addressing the needs of development work is an important part of that review. The underlying problem is that we are short on aggregate capacity.

This capacity shortage will be mitigated by continuing to provide CPU time on the NTNU Vilje system after 31 March. It will not, however, change the fact that improving the development service draws cycles from production capacity.

Follow us on Twitter

We will continue to post any developments on the OpsLog (https://opslog.sigma2.no/). Users are encouraged to follow developments there and on the @MetacenterOps Twitter channel.