Site Reliability Engineer - Fractal : Job Details

Site Reliability Engineer

Fractal

Job Location : Fremont, CA, USA

Posted on : 2025-03-18T15:23:39Z

Job Description :

Responsibilities:

  • Monitor system uptime and availability, ensuring functional and performance SLAs are met.
  • Respond to alerts from all critical infrastructure and resolve environment issues.
  • Participate in analyzing incident trends and identifying the root causes of issues.
  • Triage problems for critical services and build automation to prevent problem recurrence.
  • Influence and create new designs, architectures, standards, and methods for supporting the platform.
  • Understand C3 deployment automation flows, upgrade them as needed, and effectively troubleshoot issues with system updates and upgrades.
  • Be willing to participate in the on-call rotation.
  • Work cross-functionally with Services and Engineering teams.

Qualifications:

  • Demonstrated understanding of deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and other public clouds.
  • Expertise in Linux Operating Systems, Networking, and Database concepts.
  • Experience deploying, upgrading, and troubleshooting Kubernetes clusters and workloads.
  • Experience with Cassandra (or another NoSQL alternative).
  • Expertise in cloud providers, such as Amazon Web Services, Azure, and GCP.
  • Experience with configuration management systems such as Puppet.
  • Experience using Bash or Python to automate and monitor systems.
  • Experience with infrastructure-as-code (IaC) tools like Ansible or Terraform.
  • Excellent problem-solving, critical thinking, and communication skills.
  • Experience supporting commercial SaaS solutions as a DevOps engineer or system administrator.
  • BS or MS in Computer Science, related field, or equivalent professional experience.
