
Senior Software Engineer (MLOps) – Serving
- Paris Sophia-Antipolis, Alpes-Maritimes
- Permanent contract (CDI)
- Full-time
- Architect and build systems for serving ML and LLM models across all data centers with strong SLAs, observability, and reliability.
- Design and optimize Ray-based inference infrastructure to handle both low- and high-throughput workloads.
- Enable applied scientists to deploy and test models via self-service tools, CI/CD pipelines, and rollback mechanisms.
- Implement A/B testing and shadow deployment capabilities to evaluate new model versions in production.
- Collaborate with platform teams to improve GPU provisioning, traffic routing, and runtime performance.
- Instrument inference workflows with rich telemetry (latency, token counts, errors) to drive performance and safety analysis.
- You have 6+ years of backend or infrastructure engineering experience, including 2+ years working on ML/AI platforms.
- You have experience building distributed systems, ideally in model serving, real-time inference, or large-scale APIs.
- You are proficient in Python, Go, or another systems language and understand performance tuning in high-throughput environments.
- You are familiar with Ray or other inference-serving frameworks (e.g., TorchServe, Triton, BentoML).
- You've worked with GPUs and understand how to build infrastructure that supports heterogeneous compute workloads.
- Bonus points: experience with AI observability, rollback strategies, or in-house deployment proxies.
- New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
- Continuous professional development, product training, and career pathing
- Intradepartmental mentor and buddy program for in-house networking
- An inclusive company culture, with the ability to join our Community Guilds (Datadog employee resource groups)
- Access to Inclusion Talks, our internal panel discussions
- Free, global mental health benefits for employees and dependents age 6+
- Competitive global benefits