DevOps/SRE Engineer
We are looking for a DevOps Engineer with an affinity for SRE (Site/Service Reliability Engineering). Are you enthusiastic about the latest technologies in the field of Infrastructure as Code, Containers and Cloud Technologies? Would you like to work in a unique, interesting and dynamic working environment that focuses on the connection between science, ICT and society? Would you like to design and develop the most comprehensive and user-friendly cloud services for research and education? Do you enjoy working with stakeholders? Then read on!
Where you will work
SURF is the ICT cooperative for Dutch educational and research institutions. Together with them, we work on digital services and complex innovation challenges to enhance the quality of education and research.
The HPC Cloud team works closely with other SURF teams, public cloud providers and the European Open Science Cloud (EOSC) to offer the SURF Research Cloud (SRC) service and the national EOSC node. SURF Research Cloud creates efficient virtual research environments at national and international level. This enables users to increase their scientific impact. We take the time to get to know and understand the scientists, translate their projects into technical and user requirements, and work together to find a solution.
The team you will join
The HPC Cloud team is an energetic and multidisciplinary DevOps team. We have seven nationalities and are very internationally oriented. We develop, manage and support SURF Research Cloud and collaborate with the European Open Science Cloud. To this end, we maintain contracts with SURF members, national and international research infrastructures and research communities. In addition to our daily work, we also organise fun activities together.
Working at SURF means working for a unique and open organisation. This is evident in everything: the structure of the organisation, the set-up of the project teams, the culture in our offices and the atmosphere among colleagues. SURF offers excellent terms of employment and a flexible approach to work/life balance. Employees enjoy working independently. In addition, everyone is given the space and freedom to use and develop their talents as effectively and broadly as possible.
What you will do
You will contribute to the development of the DevOps and Agile culture within SURF. In addition, you will proactively participate in the design and implementation of services running on platforms such as Kubernetes, OpenStack, AWS, Azure, Google Cloud Platform (GCP) and Oracle Cloud Infrastructure (OCI), both for our internal users (e.g. development teams, consultants) and for end users in science, research and education.
What else you are involved in
- You will actively participate in the ongoing process of (re)designing our range of cloud services. All software for the services you work with has been developed by SURF and is open source.
- You will work extensively with Kubernetes, creating implementations and solving problems, and you will feed your findings back into the codebase to improve the overall quality of our CI/CD infrastructure as code. This includes associated monitoring, logging, and alerting.
- You will improve the security and reliability of our infrastructure, with a focus on IAM, secrets management and compliance. You will also contribute to the documentation of processes, systems and best practices to ensure knowledge sharing and maintain transparency within the team.
- You will work with suppliers and users to link state-of-the-art technologies to potential future research and educational needs.
- You will participate in the developers' sprint meetings.
The technology you will be working with
- You will develop and maintain Terraform modules, Ansible Playbooks and Helm Charts, with a dash of Bash and Python.
- You will design, implement and optimise CI/CD pipelines (GitLab) to deploy services on Kubernetes and public clouds.
- You will work with tools such as Prometheus, Grafana and ELK Stack.
- The SRC technology stack consists of Terraform, Vault, Packer, Ansible, Docker Compose and Kubernetes. The clouds we support are on-premises OpenStack, AWS, Azure, Oracle and Google Cloud Platform. We are working on support for other hypervisors, job-based systems and clouds. SRC supports various Linux variants and MS Windows on virtual machines.
Your skills and experience
- You have a bachelor's or master's degree in a technical discipline.
- You are familiar with DevOps and Git-oriented approaches and are willing to help and guide others in applying these methodologies.
- You feel comfortable taking responsibility for the availability, performance and monitoring of services.
- You are curious about emerging technologies; tools such as Docker and Kubernetes are part of your “normal” toolbox, but you are also very familiar with the “basics” (also known as Linux skills).
- You have experience with a programming or scripting language, such as Python, and are familiar with configuration management tools and Infrastructure as Code (Ansible, Terraform, etc.).
- You understand and appreciate the complexity of multi-layered IT environments and have a functional understanding of concepts and definitions such as: frontend, backend, databases, network topologies, identity management, security groups, firewalls, service meshes, service discovery, zero-trust, etc.
- You are service-oriented, solution-oriented and feel comfortable in a team where we share what we don't know, we ask for help and guidance, we don't hide mistakes and we work hard to make each other better and more productive.
- Previous experience with a cloud provider is strongly preferred, but where necessary you can build this up on the job.
Before starting this job, you must present a VOG (Verklaring Omtrent het Gedrag, a Dutch certificate of conduct).
SURF takes pleasure in doing its recruitment itself; unsolicited approaches from recruitment agencies are therefore not appreciated.