About the Company
This company develops generative video models that let users create animated pictures with ease, using their own existing audio or text-to-speech models. Having raised over $10M and generated significant excitement with their first two foundational model releases, they are expanding their team in San Francisco.
About the Role
We're looking for passionate Machine Learning Engineers to join our team and help build cutting-edge systems for large-scale data collection, GPU training, and AI model inference optimization. If you have deep expertise in model quantization, parallel inference, and accelerating diffusion models, and you're excited about deploying state-of-the-art ML models in the cloud, this is the perfect opportunity for you.
What You'll Do:
- Build and scale distributed data collection and curation systems to support large-scale model training and inference.
- Optimize GPU-based training pipelines for efficiency and speed, focusing on large-scale model deployment.
- Accelerate inference for diffusion models and transformers, leveraging techniques like model quantization and parallel inference.
- Write and optimize CUDA and Triton kernels, and use TensorRT, to maximize inference performance.
- Develop and maintain cloud-based infrastructure (AWS, Oracle) using Kubernetes and Terraform for scalable model deployment.
- Architect REST APIs for distributed systems, ensuring high performance and low-latency responses.
What You Bring:
- 5+ years of experience in Python or Golang, with a strong emphasis on performance optimization.
- Expertise in model quantization, parallel inference, and deploying ML models in production.
- Hands-on experience with PyTorch, TensorRT, Triton, and CUDA kernels for accelerating model inference, especially in large-scale applications.
- Strong background with Kubernetes, Docker, and NVIDIA hardware (GPUs, Tensor Cores).
- Experience scaling pipelines on AWS with queueing and streaming systems such as SQS and Kafka, and implementing infrastructure as code with tools like Terraform.
- A startup mindset: the ability to move fast, iterate quickly, and build impactful systems in a fast-evolving space.
- Passion for deploying AI technologies at scale and driving innovation in generative models.
Send your resume today and join us in building the next generation of AI-driven video models!