Site Reliability Engineer | Manhattan, NY, USA | In-Office - Selby Jennings : Job Details

Selby Jennings

Job Location : New York, NY, USA

Posted on : 2025-01-30T11:24:25Z

Job Description :
Site Reliability Engineer | Selby Jennings | Manhattan, United States

Posted 1 day ago | In-Office | Permanent | USD 200,000 - USD 400,000 per year

We are seeking a Site Reliability Engineer to join our Infrastructure team. You'll manage a diverse technology stack, including Kubernetes, virtualization, and CI/CD. Proficiency in automation frameworks and Infrastructure as Code is essential. Responsibilities will include designing secure platforms, supporting engineers, and enhancing infrastructure reliability.

Responsibilities:

  • Design and maintain a robust and secure Kubernetes and GitOps CI/CD platform capable of handling large data volumes and diverse technology loads.
  • Assist engineers using the platform by providing clear communication, advice, troubleshooting, maintaining documentation, and implementing feedback-driven improvements.
  • Promote and implement Infrastructure as Code principles and best practices.
  • Lead projects from design through to implementation, testing, monitoring, documentation, and support.
  • Automate processes to reduce manual work in large, distributed systems.
  • Work independently and with other teams to enhance the reliability, availability, and performance of the infrastructure.

Skills:

  • Proficient in writing and maintaining applications and APIs in languages such as Python, Go, or Shell.
  • Extensive experience with cloud-native and containerization technologies like Kubernetes and Docker.
  • Strong knowledge of Linux systems.
  • Experience with Infrastructure as Code and configuration management tools such as Terraform, Puppet, or Ansible.
  • Understanding of network technologies, server virtualization, and storage.
  • Familiarity with observability systems like Prometheus, Grafana, ELK, or Jaeger.
  • Experience with distributed data platforms such as Kafka, Flink, or Airflow.
  • Self-starter with the ability to quickly grasp concepts, implement new ideas, and think creatively.
  • Focused on enhancing system availability, security, and resilience through testing, monitoring, standardization, and automation.
  • Ability to clearly explain the rationale behind best practices.
  • Capable of building positive and collaborative relationships with colleagues across teams and locations.