
Infinite Reality

Principal SRE/DevOps Engineer

USA, Full-Time



As a Principal, you’ll lead our SRE and DevOps teams, driving the design, automation, and reliability of our infrastructure. Your expertise will be crucial in ensuring our systems are secure, robust, scalable, and efficient.


  • Technical Leadership: Provide technical leadership to the SRE and DevOps teams, guiding them in designing, implementing, and maintaining highly available and scalable infrastructure.
  • Exceptional Problem Solving: Candidates must demonstrate the ability to start from an unknown and design and implement novel solutions. A primary responsibility will be contributing to org-wide R&D efforts by prototyping next-generation infrastructure solutions in coordination with engineering teams. Mastery of Python and Bash is a must; knowledge of other programming/scripting languages is a bonus.
  • System Architecture: Collaborate with software engineers and architects to design, develop, and maintain infrastructure solutions that support our applications and services.
  • Automation: Champion automation across the development and operations lifecycle to increase efficiency, reduce manual work, and minimize downtime.
  • Reliability and Performance: Develop strategies and implement practices to maintain high system reliability, availability, and performance, while continuously monitoring and improving system health.
  • Security: Prioritize DevSecOps methodology and align with InfoSec on joint initiatives. Ensure that infrastructure and applications meet security and compliance standards.
  • Incident Management: Lead incident response and post-mortem activities, with InfoSec leadership, to identify root causes, implement preventive measures, and improve system resilience.
  • SSDLC: Design, implement, and monitor a Secure Software Development Life Cycle (SSDLC) process in alignment with InfoSec.
  • Infrastructure as Code (IaC): Manage and expand the use of Infrastructure as Code tools and practices to ensure reproducibility and consistency in our environments.
  • Scalability: Work on scaling infrastructure horizontally and vertically to support growing demands while optimizing costs.
  • Monitoring and Alerting: Implement and maintain robust monitoring, alerting, and observability solutions to proactively identify and address issues.
  • Cost Optimization: Continuously optimize cloud resource usage to maximize efficiency and reduce operational costs.
  • Documentation: Create and maintain comprehensive documentation for infrastructure, processes, and best practices.
  • Mentorship: Provide mentorship and technical guidance to junior engineers, fostering their professional growth.



Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent work experience.
  • 7+ years of proven experience as a Site Reliability Engineer, DevOps Engineer, or a similar role in a senior or leadership capacity.
  • 7+ years of experience with cloud platforms (e.g., AWS, Azure, GCP) and container orchestration technologies (e.g., Kubernetes).
  • 5+ years of experience in implementing and maintaining robust monitoring, alerting, and observability solutions to proactively identify and address issues.
  • 5+ years of experience with SIEM (security information and event management).
  • Proficiency in scripting and programming languages (e.g., Python, Bash, Go, Ruby).
  • Deep knowledge of Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation.
  • Deep knowledge of development languages and runtimes such as Node.js, Java, PHP, and Go.
  • 7+ years of experience with CI/CD pipelines and automation tools (e.g., Jenkins, CircleCI, GitHub, GitLab CI/CD).
  • Strong understanding of DevSecOps and InfoSec methodologies.
  • Strong understanding of networking and data storage technologies.
  • Strong knowledge of database technologies and data structures.
  • Excellent problem-solving skills and the ability to troubleshoot complex issues.
  • Strong communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams.
  • Certifications such as AWS Certified DevOps Engineer, AWS Certified Solutions Architect, or Kubernetes certifications are a plus.
  • Experience with game engine platforms such as Unity and Unreal, and their infrastructure needs, is a plus.


