Overleaf

Senior DevOps Engineer

Europe | Full-Time | $86K - $109K



Overleaf is a scaleup and social enterprise that builds modern collaborative authoring tools for scientists — like Google Docs for Science. We make an online, real-time collaborative editor for papers, theses and other documents written in the LaTeX markup language.


We have over 14 million registered users from around the world, over 500,000 people use our platform each day, and we host over 100 million user-created projects. 


We’ve been recognised as one of the UK’s top 100 fastest growing businesses and included in the FEBE Growth 100 list. We were Best SaaS for Nonprofits or Education in the 2020 SaaS Awards Program, and a finalist in the Digital Leaders Impact Awards 2022.


Here are some links if you’d like to see what we have been up to recently:

We are a remote company; all staff work remotely. We meet up 2–3 times each year for valuable face-to-face time. Our core hours are 2pm–5pm UK time, during which the entire team is expected to be available for meetings. Around that, flexible working is allowed and encouraged.


Overleaf is part of Digital Science. Digital Science are advancing the research ecosystem. We are a pioneering technology company, and our vision is of a future where a trusted and collaborative research ecosystem drives progress for all. We believe in better, open, collaborative and inclusive research. In creating the next generation of tools and working in partnership with the community we tackle some of the biggest challenges to research. In order to achieve our vision, we need innovative, inspiring and dynamic people to join our team. Want to join us?


What you’ll be doing

Overleaf is a productivity tool that millions of people rely on, and our customers have high availability and performance expectations. We deploy 2–6 times daily, supported by a modern cloud stack, continuous integration and a lot of automation. We’re hosted on Google Cloud Platform with zero long-lived servers and about twenty services running mainly in Google Kubernetes Engine. We have interesting challenges around supporting write-heavy real-time collaboration workloads using websockets and running thousands of cores compiling our authors’ LaTeX documents. And we’re in large part open source (https://github.com/overleaf/overleaf).


We’re looking for an experienced Senior DevOps Engineer to join our operations team and help us level up our DevOps workflows and practices. Our next milestone is to go from CI to CI/CD (Continuous Delivery): fully automatic deployment to staging from a pull request, then a final manual check, then one-click deployment to production. We expect this will require improvements to our current deploy automation and our monitoring and observability practices, in line with Site Reliability Engineering (SRE) best practices, so experience (and/or a desire to learn) in those areas will be helpful.
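As a rough illustration of that target flow (not a description of Overleaf's actual tooling), the promotion steps can be sketched as a tiny state machine. The stage names and the `next_stage` helper below are hypothetical, chosen only to mirror the sequence of pull request, automatic staging deploy, manual check and one-click production deploy.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names; they only mirror the flow described above.
    PULL_REQUEST = "pull_request"
    STAGING = "staging"
    AWAITING_APPROVAL = "awaiting_approval"
    PRODUCTION = "production"


def next_stage(stage: Stage, checks_passed: bool, approved: bool = False) -> Stage:
    """Return the stage a change moves to, or the same stage if it is blocked."""
    if stage is Stage.PULL_REQUEST and checks_passed:
        return Stage.STAGING  # automatic deployment to staging
    if stage is Stage.STAGING and checks_passed:
        return Stage.AWAITING_APPROVAL  # staging looks healthy; wait for a human
    if stage is Stage.AWAITING_APPROVAL and approved:
        return Stage.PRODUCTION  # the one-click promotion
    return stage  # blocked, or already in production
```

The only step that needs a human is the move from `AWAITING_APPROVAL` to `PRODUCTION`; every other transition depends solely on automated checks.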


Your main activities will be:

  • Keep our systems running smoothly.
      • Make controlled changes to infrastructure. We use kustomize (with some skaffold) and terraform as our main Infrastructure as Code (IaC) tools.
      • Participate as a primary responder in our on-call rotation. We have two engineers on call at all times, a primary responder and an incident coordinator. There is additional compensation for time on call. See https://status.overleaf.com/ for incident metrics.
      • Improve and maintain runbooks and other documentation to help with knowledge transfer to other engineers, to increase the number of people who can be on-call. As a remote company, written documentation is particularly important to us.
      • Update service dependencies. We use hosted service providers wherever possible for databases, etc., to minimize toil, but some upgrades do still require coordination and testing, in collaboration with the software engineering team.
      • Review alerts and improve alerting policies.
      • Support improvements to logging, metrics and other observability tools to continue alignment with best practices.
  • Make it easy, fast and safe for developers to get code into production.
      • Improve and maintain (and create new) CI/CD pipelines. We use Google Cloud Build for CI and are looking at Google Cloud Deploy for CD.
      • Improve and maintain systems for detecting when a deploy needs to be rolled back and rolling it back.
      • Consult with software engineers during the design and implementation of code, to help the team as a whole operate what they build.
      • Participate in retrospectives and other software engineering team activities.
  • Keep our systems running securely. Overleaf, as part of Digital Science, certifies against ISO/IEC 27001:2013, and we are working toward other information security certifications.
      • Support our information security team with ongoing audits.
      • Work toward continuous improvement of our security posture.
      • Use security tools in our environment, such as firewall and IAM rules, effectively.
  • Keep cloud costs under control. We have budget alerts to be monitored and adjusted, and we periodically review for opportunities to optimize spend.
  • Stay up to date with best practices and advocate for these in the operations team and the wider engineering team.

What you’ll bring to the role

To do this job well you will:

  • Have at least 5 years’ experience in DevOps (i.e. in software development and/or operations involving DevOps tools and ways of working).
  • Have working knowledge of at least one major cloud platform, such as GCP, AWS or Azure, and be willing to learn GCP.
  • Have experience and understanding of web applications. Experience in the key technologies we use (Linux, Docker, Prometheus, node.js with JavaScript, MongoDB, Redis, PostgreSQL, Google Cloud Storage) is of course a plus, but learning these on the job is also fine.
  • Be comfortable examining the code of our services when necessary, to debug or suggest improvements.
  • Have experience with the key concepts of Site Reliability Engineering (SRE) that we use, such as SLIs and SLOs, and key SLIs, such as latency, throughput, error rates (and budgets) and saturation.
  • Have experience operating a system with some scale. We’re not yet “web scale”, but it’s not uncommon to see our systems processing tens of thousands of operations per second, and operating such systems does require care and planning.
  • Have a security-first mindset at all times, covering confidentiality, integrity and availability.
  • Be comfortable working in a fully remote team.
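
For readers less familiar with the SRE terms in the list above, here is a minimal sketch of how an error budget follows from an SLO. The function name and the figures in the usage note are illustrative assumptions, not Overleaf's real targets or traffic.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent).

    slo_target is the availability objective as a fraction, e.g. 0.999 for 99.9%.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is the number of failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

With a 99.9% target over 1,000,000 requests, about 1,000 failures are tolerated, so 250 failed requests would leave roughly three quarters of the budget unspent.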

We expect you to:

  • Work with us full time; this is a full-time role.
  • Have strong written and verbal communication in English.

Not sure you meet all qualifications? Let us decide! Research shows that women and members of other under-represented groups tend to not apply to jobs when they think they may not meet every qualification, when in fact, they often do! We are committed to creating a diverse and inclusive environment and strongly encourage you to apply.


Benefits

  • Remote and flexible working.
  • You would join a small, dedicated and growing team.
  • We’re substantially (around 80%) open source, so your work will often be on open source.
  • We’re backed by Bethnal Green Ventures (https://bethnalgreenventures.com/) and Digital Science (https://www.digital-science.com/), through which we’re part of a wider community of startups in science, health and ed-tech.
  • We’ll provide a new Mac, Windows or Linux laptop, along with a stipend for other equipment.
  • We provide a training budget; many of our staff choose to attend relevant industry conferences or buy training materials.
  • We run two biweekly internal seminar series (‘Show and Tell’ and ‘Wisdom Wednesdays’) with short talks from staff about their work or personal projects, new technologies and techniques.

Living our Values:


We invest in, nurture and support innovative businesses and technologies that make all parts of the research process more open, efficient and effective. 


The talent we secure is fundamental to us achieving our vision and our growth plans. The values we live by are:


We are brave in the pursuit of better

We are collaborative and inclusive

We are always open-minded

We are from and for the community 


We’re an equal opportunity employer. All applicants will be considered for employment without attention to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

