Mid / Senior Site Reliability Engineer
Convertedin
- Maadi, Cairo
- Permanent
- Full-time
- Provide primary operational support for multiple large-scale distributed software applications
- Participate in system design consulting, platform management, and capacity planning
- Partner with engineering teams to improve services through rigorous testing and release procedures
- Run the production environment by monitoring availability and taking an in-depth view of system health
- Build monitoring that alerts on symptoms/issues rather than on outages
- Debug production issues across services and levels of the hosting stack
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Design, plan, build, and maintain core infrastructure that enables Convertedin to scale to support thousands of concurrent users
- Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault detection
- Create sustainable systems and services through the adoption of automation
- Balance feature development speed and reliability with well-defined service-level objectives
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual enhancement
- Join an on-call rotation to respond to incidents that affect the availability of Convertedin services
- Document all the things so you don’t need to learn the same thing twice
- Deliver root cause analysis and corrective actions for reported incidents
- 2-5 years of experience in any of the following roles: Site Reliability Engineer, DevOps Engineer, System Engineer, Infrastructure Engineer, or Cloud Support Engineer
- Good command of the English language
- Configuration management: use Ansible to manage our hosting infrastructure
- Infrastructure as code: use Terraform and GitHub Actions for automation, and leverage cloud technologies to meet our goals
- Systems: manage, configure, and troubleshoot storage (block and object), networking (VPCs, proxies, and CDNs), and the Linux operating system (configuration, package management, startup, and troubleshooting)
- Cloud services: provision and configure cloud resources through CLI/API
- Solid understanding of the software development lifecycle
- Monitoring and instrumentation: implement metrics in Prometheus and Grafana, log management and related systems, and third-party integrations such as Slack, Datadog, Sentry, and PagerDuty
- Strong knowledge of managing CI/CD tools
- Engineering practices: availability, reliability and scalability, as well as disaster recovery
- Work in a variety of languages: Shell, Python
- Work from anywhere: yes, this is a remote-based opportunity, with hybrid options available
- We love flexibility! Manage your own work schedule. We trust the people we hire
- Competitive package
- Paid time off
- Fun benefits & perks that you will enjoy!
- Trusting, ego-free and truth-seeking team members
- See something you want to improve? Awesome. We’re a flexible and collaborative team that is always learning and growing
- Confidence can sometimes hold us back from applying for a job. But we'll let you in on a secret: there's no such thing as a 'perfect' candidate. Have 50% of the criteria? Excited about this opportunity? Passionate about what we do at Convertedin? Please apply! Convertedin is where everyone can grow
- Convertedin is an equal opportunity employer: all qualified applicants will receive consideration for employment without regard to race, color, age, religion, sex, sexual orientation, gender identity/expression, national origin, or disability