
SRE Engineer
- Boulogne-Billancourt, Hauts-de-Seine
- Permanent contract (CDI)
- Full-time
- Help to implement best-practice observability tooling and processes.
- Set up dashboards and alerts to detect anomalies, failures, and performance degradation.
- Continuously monitor the health and performance of production systems using observability tools.
- Participate in on-call rotations and respond to service incidents in a timely and effective manner.
- Lead root cause analyses, document postmortems, and implement corrective actions to prevent recurrence.
- Help deliver hands-on coding improvements / optimisation.
- Collaborate with development teams to enhance service efficiency and capacity planning, as well as to deliver improvements that require roadmap resource allocation.
- Continuously improve internal tools and scripts to reduce manual operational overhead.
- Conduct routine reliability checks, capacity testing, disaster recovery drills, and infrastructure audits.
- Validate system failover and data recovery processes.
- Analyze system performance and implement tuning for speed, scalability, and cost efficiency.
- Generally, this role will be 50% writing code to automate processes & implementing continuous improvements and 50% investigating issues and supporting their resolution. You will need to have a passion for using software development techniques and experience to diagnose & solve operational (application, architectural & infrastructure) problems.
- Due to the nature of this role, you may sometimes be required to provide on-call cover or work non-standard shift patterns to support critical missions or incidents. You will work closely with our dedicated support teams and lead blameless retrospectives, with the goal of finding the best way to prevent the same issue from recurring.
- Your mission will be to make the Sidetrade platform as stable and reliable as possible. Your role will be to measure and monitor availability, latency and overall system health, to understand the trends and capabilities of the service, and to improve them as much as possible.
- Implement AI / agentic tooling to make you 3x as efficient as a regular SRE.
- Someone with a real passion for reliability and the drive to start a new function within a growing company.
- A forward thinker who is looking to push the boundaries & use AI to deliver more value in their role.
- Someone who can help lead a small team of SREs
- Significant experience as a Site Reliability Engineer
- Experience with orchestration tools such as Ansible
- Familiarity with Docker and Kubernetes concepts and associated tools such as Helm
- Experience of global data management (DBA, monitoring, backup, etc.)
- Solid sysadmin and networking skills (TCP, firewalls, etc.)
- Exposure to incident response processes and scenarios
- Engineering skills using one or more programming languages such as Java, Python, C#, etc.
- Experience running mission-critical services in production is a major plus
- Linux / UNIX system administration.
- Use of modern CI/CD tools for delivery and automation
- Familiarity with the latest compute, load balancing and scaling, storage, networking, security and virtualization technologies
- Experience mentoring and working closely with more junior colleagues
- Fluent in French with proficient English (as you will be collaborating with the team in Birmingham)
- Experience administering and supporting PostgreSQL, MongoDB, RabbitMQ, Kafka & Elasticsearch is a plus
- Experience in and knowledge of developing and testing transactional websites, data-driven web publishing, content management systems.
- Enable continuous improvement of security processes and guidelines with the Security team
- Support implementation of secure design principles in line with Information Security policies and standards
We are sorry, but this recruiter does not accept applications from abroad.