SITE RELIABILITY ENGINEER / CHELTENHAM (REMOTE) / UP TO £95K & GREAT BENEFITS
Fantastic new opportunity for an experienced Site Reliability Engineer to join a dynamic organisation with ambitious growth plans. Excellent pay and an extensive benefits package. Remote working is supported until DV clearance is obtained, after which the role is on-site in Cheltenham.
In 2019, our founders were working as engineers solving complex cross-domain problems in defence and security organisations. TwinStream was formed to consolidate their collective expertise and experience into one business, providing technical excellence and exceptional service to its clients. The business is headquartered in Cheltenham, with teams working both on-site with clients and remotely from home.
We are looking for skilled Site Reliability Engineers to join a new team that will deploy and maintain our established cross-domain system for a customer. The system uses an AMQP event-driven microservices architecture and makes extensive use of Docker containers.
This role is perfect for an experienced engineer who is comfortable working in a managed service environment and wants to gain more experience with best-of-breed DevOps tools and techniques.
What’s on Offer?
- Highly competitive salary of £75,000 to £95,000 (depending on experience).
Key Responsibilities of the Site Reliability Engineer:
- Collaborate with Feature Development teams to promote new component versions into production as efficiently as possible.
- Maintain the system to agreed service level and availability objectives using real-time monitoring tools and system generated metrics.
- Instrument new system metrics and alerts to pre-empt issues and improve performance.
- Respond to monitoring alerts and customer incidents, taking preventative/remedial action to minimise customer impact.
- Liaise with key customer stakeholders to schedule capability changes and capture new service requirements as they arise.
- Apply automation techniques to reduce manual operations burden.
Skills & Experience:
- Must be eligible and willing to undergo DV clearance.
- Experience with infrastructure automation tools (CloudFormation, Terraform or Ansible)
- Experience working with Docker containers and container orchestration tools (such as Kubernetes, OpenShift or Docker Swarm)
- Experience using and maintaining CI/CD tools (such as Jenkins or GitHub Actions)
- Good understanding of relational databases and SQL
- Linux command line, administration and shell scripting
- Solid understanding of monitoring, auto-scaling, performance tuning, troubleshooting and disaster recovery best practices
- Working knowledge of network security protocols
- Working knowledge of AWS
- Experience with monitoring tools such as InfluxDB, Prometheus or Grafana
What’s Next?
If you have the passion and skills to be successful in this Site Reliability Engineer position, we would love to hear from you. APPLY NOW for immediate consideration.
Further Information:
Due to the industries in which some of our clients work, and to comply with their requirements, any offer would be conditional on achieving satisfactory Baseline Personnel Security Standard (BPSS) screening results, and on subsequently achieving and retaining Developed Vetting (DV) clearance.