Senior Site Reliability Engineer
Job Description
At Xenon Seven, we work with both startups and established enterprises on complex IT challenges in data management, web services, infrastructure, and artificial intelligence. We are partnering with a leading financial institution in Egypt, known for its extensive banking services and its commitment to digital transformation, and we are looking for a Senior Site Reliability Engineer. This position is pivotal in designing and maintaining a robust, scalable, and secure infrastructure for critical banking applications. You will lead SRE initiatives, mentor junior engineers, and spearhead our observability strategy using technologies such as OpenShift, Kubernetes, Prometheus, Grafana, and the ELK Stack.
Key Responsibilities
You will oversee the design and implementation of scalable infrastructure, ensuring high availability and security for banking applications. You will also mentor junior engineers and lead initiatives that improve production support and observability.
- Design and architect highly available OpenShift/Kubernetes infrastructure for banking applications
- Implement comprehensive monitoring and observability strategies using Prometheus and Grafana
- Oversee centralized logging infrastructure with the ELK Stack
- Lead the adoption of SRE best practices and production support standards
- Mentor junior engineers on OpenShift, Kubernetes, and monitoring
- Define and implement Service Level Indicators, Objectives, and Agreements
- Lead incident response strategies and post-incident reviews
- Architect advanced monitoring dashboards and alerting systems
- Design automation frameworks to improve operational efficiency
- Manage on-premises data center resources and plan infrastructure
- Participate in on-call rotation for critical production incidents
- Ensure compliance and security hardening for financial systems
Required Technical Skills
Soft Skills
Qualifications
- Bachelor's degree in Computer Science, Information Technology, Software Engineering, or a related field
- 5+ years of relevant experience in Site Reliability Engineering, DevOps, or Production Engineering
- 3+ years of experience in a leadership role within SRE teams or managing production support operations
- Deep understanding of OpenShift and Kubernetes for on-premises infrastructure management
- Expertise in monitoring solutions such as Prometheus and Grafana
- Advanced knowledge of ELK Stack for logging and analysis
- Solid experience in Linux/Unix system administration and container networking
Language Requirements
Tools and Platforms:
OpenShift, Kubernetes, Prometheus, Grafana, ELK Stack
Spoken Languages:
English, German, French
Benefits & Perks
- ✓ Competitive salary
- ✓ Professional development opportunities
- ✓ Flexible working hours
- ✓ Health and wellness programs
- ✓ Collaborative work environment
- ✓ Access to cutting-edge technologies
Working Conditions
Full Time
Company Culture
We foster a culture of innovation, collaboration, and continuous improvement where employees are encouraged to think creatively and push the boundaries of technology. Our team values diversity and open communication, emphasizing professional growth and support.
Salary Range
Career Growth
- Leadership roles within SRE or DevOps teams
- Opportunities to lead complex projects
- Professional certifications and training
- Career development programs