HPC Administrator
Do you want to contribute to the daily management and improvement of the Dutch National Supercomputer Snellius? Are you a system administrator who wants to provide top-tier High Performance Computing (HPC) services, develop (secure) computing services, and manage a reliable and secure facility? Then we have the perfect challenge for you!
Where you will work
SURF is the ICT cooperative for Dutch educational and research institutions. Together with them, we work on digital services and complex innovation challenges to enhance the quality of education and research. As an HPC administrator, you will contribute significantly to this by ensuring that researchers have access to top-tier HPC facilities.
The team you will join
You will be joining the HPC Services team. This team is responsible for hosting, maintaining, and supporting users on the National Supercomputer Snellius. You will collaborate with the other system administrators and advisors in the HPC Services team but also work closely with the other HPC team at SURF – the HPC Development team.
Together, the HPC Services and HPC Development teams are responsible for various computing services and the associated user and application support, including supporting the National Supercomputer Snellius, the LUMI pre-exascale Supercomputer, HPCV Services and SURF’s Experimental Technologies Platform. The teams focus, among other things, on providing users of the SURF (and international) infrastructures with permanent access to the most appropriate hardware and software solutions. We assist users in efficiently using new technologies and methods, enabling them to make effective use of the available infrastructure. In this way, we contribute to our mission of facilitating top-level Dutch research.
What you will do
Picture it for a moment: in this position, you will work with Snellius, the Dutch National Supercomputer. You'll be right on top of Snellius' advanced computing infrastructure, with an impressive 240,000 cores, 640 GPUs and a staggering peak performance of around 30 Pflop/s. As our new HPC administrator, it will be up to you to manage and continuously improve Snellius and the OSSC (ODISSEI secure supercomputer) environment to the best of your ability.
Other tasks you will handle
- Maintain and improve the batch system based on Slurm
- Maintain and improve configuration management of the system based on Confluent and SaltStack
- Maintain and improve parallel filesystems, currently based on GPFS
- Maintain and improve monitoring, logging and reporting systems
- Provide third-line helpdesk support for Snellius
Your skills and experience
You are curious about new techniques and advanced HPC environments. You are analytically strong, look ahead, and have a proactive attitude. You have good communication skills, and your experience with various (cluster) environments, scripting languages and technologies completes the picture.
In addition, you have:
- A bachelor's or university degree or equivalent working experience in IT;
- Demonstrable experience in managing Linux (cluster) environments;
- Proficiency in scripting languages (Bash and Python); proficiency in the programming languages C, C++ and Rust is a plus;
- Knowledge of virtualisation and containers;
- Knowledge of security aspects with respect to operating systems and software/network layers;
- Knowledge of networking, including experience with Ethernet and InfiniBand protocols;
- Knowledge of technologies such as batch scheduling (Slurm, LoadLeveler, PBS, etc.), (software) storage solutions (IBM Spectrum Scale (GPFS), CXFS, BeeGFS, etc.), monitoring software (Grafana, Prometheus), database software (MariaDB, Postgres), configuration management software (SaltStack, Ansible, CFEngine, Puppet) and version control software.
Prior to starting this job, a VOG (Dutch Certificate of Conduct) must be presented.
SURF takes pleasure in doing its recruitment itself; unsolicited approaches from recruitment agencies are therefore not appreciated.