Python Software Engineer for AI for Science
Would you like to contribute your code and knowledge to groundbreaking work that helps scientists at Dutch higher education and research institutions make effective use of AI? We are building services and APIs that let research platforms submit, run and track machine-learning experiments on our national computing facilities. This work combines hands-on Python engineering with an eye for scientific workflows, reproducibility and responsible AI. Our tooling is developed as openly as possible, open source wherever feasible, and is used by researchers in physics, life sciences, climate, materials, humanities and beyond. We also provide technical consultancy to researchers so that they can use AI technologies effectively and efficiently and scale up their applications on HPC systems. Are you interested? Then read on!
Where you will work
SURF is the ICT cooperative for Dutch educational and research institutions. Together with our members, we work on digital services and complex innovation challenges to enhance the quality of education and research. We do this together with the institutions, with an eye for public values, as openly as possible and, wherever feasible, as open source.
There is a lot of technical pioneering to be done within SURF. You will have the freedom to make architectural choices, explore new technologies and develop proofs of concept into production-ready solutions. You will also work on issues relating to cloud architecture, scalability, security, data integrity and AI ethics. In this way, you will contribute directly to an infrastructure that makes Dutch education future-proof.
The team you will join
You will become part of the High Performance Machine Learning (HPML) team within the Advanced Solutions for Research department. Your colleagues work on training language models, such as OpenEuroLLM and GPT-NL. They also advise researchers on the optimal use of the Snellius supercomputer for AI. You will work with Python on a widely used AI platform. The team has an open, passionate atmosphere and we help each other.
Working at SURF means working for a unique and open organisation. This is evident in everything: the structure of the organisation, the set-up of the project teams, the culture in our offices and the atmosphere among colleagues. SURF offers excellent terms of employment and a flexible approach to work/life balance. Employees enjoy working independently. In addition, everyone is given the space and freedom to use and develop their talents as effectively and broadly as possible.
What you will do
You are a strong, hands-on Python developer who enjoys the challenge of working at the interface between research and production. On the one hand, you are comfortable sitting quietly behind your keyboard and going deep on technical problems. You understand how to turn research prototypes into stable, usable services that scientists can rely on. On the other hand, you can communicate effectively with researchers and provide technical consultancy related to AI. You will be given a lot of freedom and autonomy, and we expect you to handle this responsibly. You are responsible for clarifying requirements when necessary, and your team members will help you with this.
Other tasks you will handle
- You design and build REST APIs that let external research platforms submit and manage ML experiments on our national computing infrastructure.
- You implement workflow orchestration that drives each job through its lifecycle — validation, data staging, submission to the cluster scheduler, monitoring, result collection and upload.
- You integrate with HPC job schedulers and with container runtimes so that ML workloads run reproducibly on GPU and CPU compute nodes.
- You maximise effective use of GPU resources through smart scheduling, efficient data pipelines and sensible caching and message queues.
- You ensure compliance, build management tooling and contribute to a stable, secure platform architecture, including monitoring, incident response and CI/CD improvements.
- You collaborate with scientists across disciplines, helping them select algorithms and tools, train and evaluate models, and make their workflows open and reproducible.
Your skills and experience
- You have more than 3 years of experience as a mid-level (medior) Python developer.
- You have extensive experience with CI pipelines (we use Git and GitLab).
- You are familiar with HPC job schedulers (such as Slurm) or other cluster/batch systems.
- You are familiar with modern asynchronous Python web frameworks (such as FastAPI, Starlette or similar) and with async testing.
- You are familiar with workflow orchestrators for data and ML pipelines (such as Prefect, Airflow, Dagster or similar).
- You have extensive experience with unit, integration and end-to-end testing.
- You are proficient with the Linux command line.
- You are fluent in English (the working language of the team).
It is an advantage if:
- You have a background in the application of AI in scientific research.
- You have experience with ORMs and schema migration tools (such as SQLModel/SQLAlchemy with Alembic, or similar).
- You have knowledge of containers (such as Docker, Apptainer/Singularity or similar), especially in HPC or real-time workflows.
- You have experience with Kubernetes or comparable container orchestration.
- You have experience integrating AI into an application and running it in production.
- You have experience with JAX and/or Julia.
Before you start in this job, you must present a VOG (Verklaring Omtrent het Gedrag, a Dutch certificate of conduct).
SURF takes pleasure in doing its recruitment itself; unsolicited approaches from recruitment agencies are therefore not appreciated.