Responsibilities
● Co-design, implement, and manage applications and services for hybrid virtualization and containerized platforms based on OpenStack and/or Red Hat OpenShift, ensuring platform stability, performance, and compliance with industry standards and best practices.
● Collaborate with architecture and engineering teams on the evaluation and selection of technology stack components, ensuring solutions follow best practices and are optimized from both functional and non-functional perspectives.
● Develop and implement plans to enhance the reliability of the applications and services infrastructure, addressing potential points of failure and ensuring high availability of services.
● Collaborate with relevant teams to conduct regular performance assessments and implement improvements based on findings.
● Prepare and participate in complex changes to production environments in support of operational teams.
● Develop automated testing and automation solutions for the cloud platform using tools such as Jenkins and Selenium, along with infrastructure-as-code, configuration management, and CI/CD tools such as Terraform, Ansible, Puppet, Chef, and GitLab CI/CD.
● Provide L3 expert support, including on-call shifts, with a focus on immediate management and resolution of incidents such as outages, breaches, and system failures.
● Write and maintain relevant documentation, ensuring completeness and quality.
● Prepare and deliver training for operational teams in the related technical domains.
● Collaborate with security management teams to ensure that systems are safe and secure against cybersecurity threats.
● Work closely with process management and operational teams and contribute to process development, standardizing the collaboration framework and improving collaboration efficiency.
Qualifications, Experience, Competencies, and Certifications
● Bachelor’s or Master’s degree in Computer Science, Engineering, Software Engineering, or other relevant technology field.
● 7+ years of hands-on experience in Linux environments and 5+ years of experience in a senior systems engineering role.
● Experience in designing, deploying, and managing Kubernetes and/or OpenShift clusters with a deep understanding of Kubernetes architecture and ecosystem.
● Experience managing large-scale public or private cloud environments and/or working in a cloud service provider environment with exposure to high-load systems is highly desirable.
● Experience working with virtualization technologies such as OpenStack and/or VMware, computing technologies such as x86 hardware, operating systems, and KVM/ESXi, and orchestration services is highly desirable.
● In-depth knowledge of frontend, application, and middleware technologies such as web servers (Apache, Nginx), Kafka, and RabbitMQ, with experience deploying and managing scalable solutions that support enterprise-level applications.
● Proficiency in identity and access management (IAM) protocols such as SAML, OAuth 2.0, and OpenID Connect, with experience integrating these protocols for secure single sign-on (SSO) implementations across a variety of platforms and services.
● Understanding of CI/CD principles, the Infrastructure as Code (IaC) approach, and software-defined infrastructure solutions, with demonstrable IaC experience using tools such as Terraform and Ansible.
● Proficiency in scripting languages such as Python or Bash for automation of system tasks.
● Solid understanding of security practices and tools, including experience with security scanning tools and implementation of security best practices.
● Experience with database management and optimization for both SQL and NoSQL databases such as MySQL, PostgreSQL, MongoDB, or Cassandra.
● Knowledge of monitoring and observability tools like Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), or Splunk.
● Ability to design and implement disaster recovery and high availability strategies for critical application services.
● Practical knowledge of network protocols (TCP/IP, HTTP, SSL/TLS) and network security measures.
● Familiarity with data platform technologies for big data processing such as Hadoop, Spark, or data warehousing solutions is highly desirable.
● Strong project management skills, with experience using agile methodologies and tools such as Jira.
● Knowledge of and experience working with GPU hardware and AI hardware accelerators is a plus.
● Relevant certifications are highly desirable.
● Strong organizational skills with the ability to multitask and prioritize.
● A proactive approach to problem-solving and decision-making.