Fractal
Job Location :
Fremont, CA, USA
Posted on :
2025-03-18T15:23:39Z
Job Description :
Responsibilities:
- Monitor system uptime and availability, ensuring functional and performance SLAs are met.
- Respond to alerts from all critical infrastructure and resolve environment issues.
- Participate in analyzing incident trends and identifying the root causes of issues.
- Triage problems for critical services and build automation to prevent problem recurrence.
- Influence and create new designs, architectures, standards, and methods for supporting the platform.
- Understand C3 deployment automation flows in order to upgrade them as needed and to troubleshoot issues with system updates and upgrades effectively.
- Be willing to participate in an on-call rotation.
- Work cross-functionally with Services and Engineering teams.
Qualifications:
- Demonstrated understanding of deploying, managing, and operating scalable, fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and other public clouds.
- Expertise in Linux Operating Systems, Networking, and Database concepts.
- Experience deploying, upgrading, and troubleshooting Kubernetes clusters and workloads.
- Experience with Cassandra (or another NoSQL alternative).
- Expertise in cloud providers, such as Amazon Web Services, Azure, and GCP.
- Experience with configuration management systems such as Puppet.
- Experience using Bash or Python to automate and monitor systems.
- Experience with infrastructure-as-code (IaC) tools such as Ansible or Terraform.
- Excellent problem-solving, critical thinking, and communication skills.
- Experience supporting commercial SaaS solutions as a DevOps engineer or system administrator.
- BS or MS in Computer Science, related field, or equivalent professional experience.