Bulge Bracket Investment Banks

Posted 5 days ago

**Lead SRE - Azure & GCP:** Head our global SRE team, ensuring high SLOs on Google Cloud. Lead and implement SRE frameworks, master multiple disciplines, collaborate cross-functionally, leverage AI for incident triage, and champion DevOps. Requires Azure and Google Cloud expertise; proficiency in Python, shell scripting, or Go; and experience with tools such as Prometheus, Splunk, and Docker/Kubernetes.

Compensation: Not specified
City: Not specified
Country: United Kingdom

Full Job Description

Location: GLASGOW, LANARKSHIRE, United Kingdom

We have a Lead Site Reliability Engineer (SRE) opportunity within our Google Cloud Site Reliability Engineering team.

As a Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure Platform - Cloud Foundational Services SRE organization, you will join our Google Cloud Site Reliability Engineering team operating within a global follow-the-sun support model.

Job Responsibilities:

Lead and implement SRE frameworks to support global Google Cloud environments and ensure the highest level of SLOs through operational excellence
Demonstrate mastery of application, data, infrastructure, and agentic AI disciplines
Apply a keen understanding of financial control and budget management, partnering with colleagues throughout the firm and leading collaborative teams to achieve common goals
Use enterprise-authorized AI capabilities within the work environment to accelerate major-incident triage, troubleshooting, and post-incident analysis, validating outputs and handling operational data according to sensitivity and security requirements
Help develop and improve the quality of technical engineering documentation
Provide technical supervision, oversight, and problem resolution for engineering activities
Champion a DevOps model so that services are automated and elastic across all platforms

Required qualifications, capabilities, and skills:

Google Cloud and Azure expertise in a mission-critical production environment
Strong understanding of container technologies such as Docker, Kubernetes, GKE, and Helm
Programming experience in at least one of Python, shell scripting, or Go, along with a good understanding of REST APIs
Hands-on experience with cloud-based technologies and tools, especially for deployment, monitoring, and operations, such as Google Observability, Azure Monitor, Datadog, Prometheus, Splunk, Elasticsearch, and Grafana
Demonstrated experience using enterprise-authorized AI capabilities within the work environment to improve SRE workflows (e.g., incident investigation support and knowledge capture), with strong validation habits and awareness of data sensitivity
Ability to evaluate AI-assisted operational recommendations for correctness and risk, define appropriate guardrails for team usage, and ensure outcomes align with resiliency and security expectations
Strong understanding of Google Cloud governance, compliance, and cost management
Strong working knowledge of modern development technologies and tools such as Agile, CI/CD, Git, Infrastructure as Code, Terraform, and Jenkins
Google Cloud certification or equivalent technical experience in the public cloud
Good understanding of agentic AI SDKs and GitHub Copilot Skills

Preferred qualifications, capabilities, and skills:

Good understanding of operating systems such as Windows and Linux (Red Hat / Ubuntu)
Good understanding of LLMs and other AI/ML frameworks that can be used in AIOps

We have a Lead Site Reliability Engineer (SRE) opportunity within our JPMC Google & Azure Cloud Site Reliability Engineering team.

