- Type: Full-Time
- Join a collaborative research effort with a team of world-class scientists to shape the next generation of high-performance computing tools for large-scale optimization.
- We're looking for exceptional researchers to work on foundational operators akin to those in BLAS, SparseBLAS, GraphBLAS, LAPACK, and similar libraries.
- The goal is to develop core computational building blocks that power advanced optimization workloads.
Key Responsibilities
As a Research Scientist, you will:
- Identify essential and emerging basic operations relevant to high-performance optimization platforms.
- Conduct in-depth "speed-of-light" analyses to uncover:
  - Performance bottlenecks,
  - Scalability characteristics (e.g., iso-efficiency),
  - Trade-offs (e.g., memory vs. communication),
  - Optimal device configurations (e.g., mix of CPUs and accelerators).
- Design and prototype high-performance, scalable, and productive software systems that serve as the computational foundation for modern optimization solvers.

You'll be expected to:
- Design and implement novel operator-level routines tailored for optimization tasks (see the sketch following this list).
- Analyze algorithms for their theoretical and practical performance under parallel computation models, considering computation, memory access, and data reuse.
- Apply cache-aware or cache-oblivious techniques and HPC best practices (shared/distributed-memory parallelization, vectorization).
- Research and design data structures for acceleration on CPUs and other processing units (e.g., AI accelerators, GPUs).
- Enable solvers to be written in a modular, data-centric style with clear and efficient control flows.
- Contribute to or extend existing run-time systems and communication frameworks to improve performance, scalability, and automation of trade-offs.
- Guarantee robust performance and correctness of solvers built atop these operators, with applications in science and industry.
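To make "operator-level routines" concrete, below is a minimal sketch of one such building block: a sparse matrix-vector multiply over a CSR matrix, parallelized with OpenMP. The `CsrMatrix` layout and `spmv` name are illustrative only, not the team's actual code; a production SparseBLAS-style kernel would add blocking, NUMA-aware data placement, and explicit vectorization.

```cpp
// Minimal illustrative sketch: y = A * x for a CSR sparse matrix,
// parallelized across rows with OpenMP.
#include <cstddef>
#include <vector>

struct CsrMatrix {                      // hypothetical layout, for illustration
    std::size_t nrows;
    std::vector<std::size_t> row_ptr;   // size nrows + 1
    std::vector<std::size_t> col_idx;   // size nnz
    std::vector<double> values;         // size nnz
};

void spmv(const CsrMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
    // Dynamic scheduling mitigates load imbalance from irregular row lengths.
    #pragma omp parallel for schedule(dynamic, 64)
    for (std::size_t i = 0; i < A.nrows; ++i) {
        double acc = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            acc += A.values[k] * x[A.col_idx[k]];  // indirect, irregular access
        y[i] = acc;
    }
}
```

The indirect access through `col_idx` is what makes such kernels "irregular", and it is exactly what motivates the speed-of-light (memory-bandwidth) analyses and cache-aware techniques described above.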
Required Qualifications
Candidates should bring solid experience in several of the following areas:
- Optimization of irregular algorithms, such as graph computations or sparse linear algebra, from high-level algorithm design to low-level optimizations like SIMD or locking strategies.
- Multi-core or many-core programming using technologies like POSIX Threads, OpenMP, or similar.
- Distributed-memory computing (e.g., MPI, BSP), including experience with collective communications or RDMA.
- Experience with performance-tuned code generation frameworks (e.g., ALP, BLIS, DaCE, Spiral, FLAME, Firedrake).
- Strong C++ (C++11 or later) skills, particularly in generic programming, algorithms, and data structures (a brief illustration follows this list).
- Proficiency with debugging and performance analysis tools (e.g., Valgrind, GDB, CI systems).
- A strong publication record in top-tier HPC or applied mathematics venues.
- Excellent communication skills with the ability to clearly convey complex technical content.
- A collaborative mindset and comfort working in diverse, international teams.
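As a hedged illustration of the generic-programming and vectorization skills listed above, the sketch below shows a dot product templated over the value type, with the reduction marked for SIMD via OpenMP. The names are hypothetical, not any particular library's API.

```cpp
// Illustrative sketch: a type-generic dot product whose reduction loop
// is vectorized with an OpenMP SIMD pragma.
#include <cstddef>

template <typename T>
T dot(const T* x, const T* y, std::size_t n) {
    T acc = T{0};
    #pragma omp simd reduction(+ : acc)
    for (std::size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

// Usage: double d = dot(a.data(), b.data(), a.size());
```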
Preferred Qualifications
Experience with the following is considered a plus:
- GraphBLAS or Algebraic Programming paradigms (a conceptual sketch follows this list).
- Optimization methods or solver design.
- High-performance interconnects and programming (e.g., InfiniBand, RDMA).
- Accelerator programming (e.g., CUDA, OpenCL).
- Publications in physical sciences or theoretical computer science.
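For readers unfamiliar with the GraphBLAS and Algebraic Programming paradigms mentioned above, here is a conceptual sketch of the core idea, with illustrative names; this is not the GraphBLAS C API or ALP itself. One kernel, parameterized by a semiring (an "add" and a "multiply" operator plus the additive identity), covers both numeric linear algebra and graph computations.

```cpp
// Conceptual sketch of a semiring-generic sparse matrix-vector product.
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

template <typename T, typename Add, typename Mul>
void semiring_mxv(std::size_t nrows,
                  const std::vector<std::size_t>& row_ptr,
                  const std::vector<std::size_t>& col_idx,
                  const std::vector<T>& values,
                  const std::vector<T>& x, std::vector<T>& y,
                  T zero, Add add, Mul mul) {
    for (std::size_t i = 0; i < nrows; ++i) {
        T acc = zero;  // additive identity of the semiring
        for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            acc = add(acc, mul(values[k], x[col_idx[k]]));
        y[i] = acc;
    }
}

// Over (+, *) with identity 0 this is ordinary SpMV:
//   semiring_mxv(n, rp, ci, w, x, y, 0.0,
//                std::plus<double>{}, std::multiplies<double>{});
// Over (min, +) with identity +inf it is one relaxation step of a
// Bellman-Ford-style shortest-paths iteration:
//   semiring_mxv(n, rp, ci, w, dist, next,
//                std::numeric_limits<double>::infinity(),
//                [](double a, double b) { return a < b ? a : b; },
//                std::plus<double>{});
```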