Site Reliability Engineer (Hosted Infra) - Platform

🏢 Elastic · 📍 United States · 🕐 Posted 5 days ago
⏱ Full-time · Platform - Cross Team · ✅ Direct from employer ATS
Apply on Elastic
ℹ️ Please note: This listing is sourced from a third-party job board. Jobnique is a job search platform and is not the employer for this role. The hiring company is Elastic.

About this role

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data, while securing and protecting private information more effectively, Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

What is the role

We are Cloud Infrastructure SREs who integrate, scale, and evolve multi-cloud infrastructure across 4 Cloud Service Providers, 70+ globally distributed regions, and tens of thousands of hosts to power Elastic Cloud. We tackle hard problems at scale through automation, Infrastructure as Code (IaC), configuration management, and purpose-built software that eliminates toil and improves reliability.

We're also a team that grows people as well as systems. If that challenge genuinely excites you, we'd love to hear from you.

What you will be doing

Engineering software to automate large-scale systems: building internal tools and services, not just running scripts

Optimizing the reliability and lifecycle of hosts across multiple cloud providers

Strengthening our observability posture — crafting alerting and monitoring systems that drive incident prevention over incident response

Scaling global infrastructure and evolving the infrastructure management processes to meet growing demand

Contributing to code reviews, sharing your work, planning what we need to do next, and both mentoring and being mentored by teammates

Being part of a balanced SRE on-call rotation: responding to incidents, improving runbooks, participating in postmortems, and championing reliability improvements

What you bring

Experience building software with Golang. You are also comfortable reviewing others' code and offering constructive feedback

Production experience operating large-scale cloud compute (hundreds of hosts or more) via automated workflows

Deep experience with Linux systems — you are at home in the terminal debugging at the OS level

Proficiency working with containerized workloads in production

A customer-first, systems-thinking approach to operational problems — you care about root causes, not just symptoms

Comfortable working across time zones in both real-time and asynchronous contexts

You contribute clear and maintainable documentation such as software designs, runbooks, architecture diagrams/decisions, postmortems, etc.

You communicate project status regu
