Bulge Bracket Investment Banks

Posted 13 days ago

**Senior Lead Software Engineer - AI/ML Engineer** in London: Orchestrate AI/ML data platforms, foster resilience, and mentor teams. Key duties: develop and support applications on Databricks, Snowflake, AWS, and Kubernetes; coordinate incident management; mentor team members and drive strategic change. Use Python or PySpark for AI/ML modeling. Requires proven SRE experience and effective collaboration. Preferred: AWS and Databricks certifications, familiarity with budget and staffing optimization. Influence market-leading solutions and advance your career in a global network.

Compensation: Not specified (GBP)
City: London
Country: United Kingdom

Full Job Description

Location: London, United Kingdom

Join us to shape the future of AI/ML data platforms, where your expertise will help create resilient and market-leading solutions. You will have the opportunity to collaborate with innovators across our global network, driving strategic change and mentoring others. We value your skills in solving complex challenges and fostering a culture of reliability and growth. At JPMorganChase, your impact will reach far beyond your team, opening doors to career advancement and meaningful relationships.

As a Site Reliability Engineer in the AI/ML Data Platforms team, you will play a key role in building scalable and resilient data solutions. You will engage in root cause analysis, production changes, and operational improvements, while supporting budgetary and staffing decisions. You will mentor team members and partner with colleagues across the organization to drive strategic change. Your contributions will help shape a collaborative, innovative, and high-performing team culture.

Job Responsibilities:

Demonstrate expertise in application development and support across technologies such as Databricks, Snowflake, AWS, and Kubernetes
Coordinate incident management coverage to ensure effective resolution of application issues
Collaborate with cross-functional teams to perform root cause analysis and implement production changes
Develop and support AI/ML solutions for troubleshooting and incident resolution
Mentor and guide team members to foster growth and drive strategic change
Build and maintain scalable, resilient, and market-leading data solutions
Support budgetary and staffing considerations to optimize team performance
Engage in operational stability and disaster recovery planning
Implement automation tools to reduce toil and improve efficiency
Ensure compliance with risk controls and company-wide standards
Build meaningful relationships across teams to achieve common goals

Required Qualifications, Capabilities, and Skills:

Proficient in site reliability culture and principles, with experience implementing site reliability within applications or platforms
Skilled in running production incident calls and managing incident resolution
Experienced in observability, including white-box and black-box monitoring, service level objective (SLO) alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
Strong understanding of SLIs, SLOs, SLAs, and error budgets
Proficient in Python or PySpark for AI/ML modeling
Able to reduce toil by building automation tools for repeated tasks
Hands-on experience in system design, resiliency, testing, operational stability, and disaster recovery
Awareness of risk controls and compliance with departmental and company-wide standards
Collaborative team player with the ability to build meaningful relationships

Preferred Qualifications, Capabilities, and Skills:

Experience in an SRE or production support role with AWS Cloud, Databricks, Snowflake, or similar technologies
AWS and Databricks certifications
Advanced knowledge of AI/ML troubleshooting and incident resolution
Familiarity with budgetary and staffing optimization
Experience mentoring and guiding team members
Strong communication and interpersonal skills
Demonstrated ability to drive strategic change across teams

Drive innovation and reliability by building scalable AI/ML data solutions that empower teams and transform business outcomes.
