Machine Learning Consultant for AI in Operations
Do you get energy from building AI systems that make a tangible difference in operational environments? Do you want to work at the intersection of machine learning and complex infrastructure, on problems where the right approach isn't obvious from the start? In this role, you will develop AI-driven tools for operational insight, help shape AI in Operations as an emerging capability, and contribute to digital twins of large-scale infrastructure. Does this appeal to you? Then we would love to speak to you.
Where you will work
SURF is the IT cooperative for Dutch educational and research institutions. Together, we tackle complex digital challenges to strengthen education and research across the Netherlands. We operate some of the country’s most advanced infrastructure, from the Snellius supercomputer to national research networks and cloud services used across universities, medical centres, and research institutes. As a Machine Learning Consultant for AI in Operations, you’ll help make this infrastructure more efficient, reliable, and sustainable and, in doing so, contribute directly to keeping Dutch education and research future-proof.
The team you will join
You will become part of the High Performance Machine Learning (HPML) team within the Advanced Research Solutions department. The team works on training and deploying large-scale AI models, advises researchers on the optimal use of the Snellius supercomputer, and builds AI platforms for the research community. Within the team, you’ll focus on applying machine learning to operational challenges across our services and infrastructure. The team is open, ambitious, and collaborative. We push technical boundaries, share knowledge, and support each other in delivering high-quality work.
What you will do
You will develop use cases, build operational tooling, and help establish AI in Operations as a lasting capability within SURF. You develop models and operational logic with a production mindset: you structure experiments and prototypes so they can be integrated into operational environments, and you align them with the relevant technologies. Part of this work takes place within a large international consortium on sustainable compute operations, where you contribute to shared use cases and collaborate with partners at national and international level.
Other tasks you will handle
- You design and implement machine learning approaches for operational insights, including energy optimisation, resource allocation, predictive maintenance, and anomaly detection.
- You contribute to shaping the SURF roadmap for more efficient and intelligent use of infrastructure.
- You support the development of simulation models and digital twins.
- You translate operational needs into actionable AI solutions in close collaboration with operators, researchers, and external partners.
Your skills and experience
- You have an MSc or PhD in a relevant field (e.g. artificial intelligence, mathematics, computer science), with at least 5 years of relevant experience.
- You are comfortable working on problems where the solution is not yet known. You can scope an ambiguous question, design experiments, and reduce uncertainty.
- You have a strong foundation in machine learning and can choose and adapt methods to fit the problem rather than the other way around. You have experience with sequence or time-series data.
- You have a solid understanding of multi-objective optimisation and demonstrable experience designing decision policies under operational constraints.
- You have solid programming experience in Python and can write clean, production-ready code.
- You have experience building end-to-end ML solutions, bringing them to production, and working with real-world, noisy data.
- You communicate technical ideas effectively to both technical and non-technical stakeholders, and you have a good command of spoken and written English.
It is an advantage if:
- You have experience with the software engineering practices that support reliable ML systems: testing, SQL, Docker, APIs, CI/CD, and Git.
- You have experience with Kubernetes.
- You have experience with Slurm or other computing clusters.
Before starting this job, you must present a Certificate of Conduct (VOG).
SURF conducts its own recruitment; we therefore do not welcome approaches from recruitment agencies.