
at J.P. Morgan
Bulge Bracket Investment Banks
Posted 5 days ago
**Lead SRE - Azure & GCP:** Head our global SRE team, ensuring high SLOs on Google Cloud. Lead and implement SRE frameworks, master multiple disciplines, collaborate cross-functionally, leverage AI for incident triage, and champion DevOps. Requires Azure and Google Cloud expertise; proficiency in Python, shell scripting, or Go; and experience with cloud tools such as Prometheus, Splunk, and Docker/Kubernetes.
- Compensation: Not specified
- City: Not specified
- Country: United Kingdom
- Currency: Not specified
Full Job Description
Location: GLASGOW, LANARKSHIRE, United Kingdom
We have a Lead Site Reliability Engineer (SRE) opportunity within our Google Cloud Site Reliability Engineering team.
As a Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure Platform - Cloud Foundational Services SRE organization, you will join our Google Cloud Site Reliability Engineering team operating within a global follow-the-sun support model.
Job Responsibilities:
- Lead and implement SRE frameworks to support global Google Cloud environments and ensure the highest level of SLOs through operational excellence
- Demonstrate mastery of application, data, infrastructure, and agentic AI disciplines
- Apply a keen understanding of financial control and budget management, working in partnership with colleagues throughout the firm and leading collaborative teams to achieve common goals
- Use enterprise-authorized AI capabilities within the work environment to accelerate major-incident triage, troubleshooting, and post-incident analysis, validating outputs and handling operational data according to sensitivity and security requirements.
- Help develop and improve the quality of technical engineering documentation
- Provide technical supervision, oversight and problem resolution for engineering activities
- Champion a DevOps model so that services are automated and elastic across all platforms
Required qualifications, capabilities, and skills:
- Google Cloud and Azure expertise in a mission-critical production environment
- Strong understanding of container technologies such as Docker, Kubernetes, GKE, and Helm
- Programming experience in one of the following languages: Python, shell scripting, or Go, along with a good understanding of REST APIs
- Hands-on experience with cloud-based technologies and tools, especially for deployment, monitoring, and operations, such as Google Observability, Azure Monitor, Datadog, Prometheus, Splunk, Elasticsearch, and Grafana.
- Demonstrated experience using enterprise-authorized AI capabilities within the work environment to improve SRE workflows (e.g., incident investigation support and knowledge capture) with strong validation habits and awareness of data sensitivity.
- Ability to evaluate AI-assisted operational recommendations for correctness and risk, define appropriate guardrails for team usage, and ensure outcomes align to resiliency and security expectations.
- Strong understanding of Google Cloud governance, compliance, and cost management
- Strong working knowledge of modern development practices and tools such as Agile, CI/CD, Git, Infrastructure as Code, Terraform, and Jenkins.
- Google Cloud certification or equivalent technical experience in the Public Cloud.
- Good understanding of Agentic AI SDKs and GitHub Copilot Skills.
Preferred qualifications, capabilities, and skills:
- Good understanding of operating systems such as Windows and Linux (Red Hat / Ubuntu)
- Good understanding of LLMs and other AI/ML frameworks that can be used in AIOps




