Bulge Bracket Investment Banks

Posted 7 days ago

**Lead Site Reliability Engineer - Network**

Join a global leader in Palo Alto, CA as our Lead Site Reliability Engineer, shaping the future of network reliability. You'll lead your team, advise across multiple domains, conduct resiliency reviews, and mentor engineers. Key responsibilities include applying network reliability principles, partnering with business domains, driving best practices, and providing Tier-3 support. The role calls for 10+ years of experience and proficiency in network engineering, cloud platforms, and major network technologies such as Palo Alto, Juniper, and Cisco. Exercise your influence, foster innovation, and champion change for success.

Compensation: Not specified
City: Palo Alto
Country: United States

Full Job Description

Location: Palo Alto, CA, United States

Take on a critical role in shaping the future of a globally recognized firm, with direct and significant impact in a space built for top achievers in site reliability.

As a Lead Site Reliability Engineer at JPMorgan Chase within Network Product, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. You will lead and conduct resiliency design reviews, break complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers.

Job responsibilities

Applies network reliability principles (Permit to Operate, FMEA, operational readiness), balancing feature delivery, efficiency, and stability.
Partners with network engineering domains (Datacenter, Firewall, Proxies, DMZ, Load Balancing, etc.) and Lines of Business to align goals and outcomes.
Drives adoption of reliability best practices and observability, demonstrating impact through stability/reliability metrics.
Bridges Engineering, Operations, DevOps, and customers to build resilient, scalable, and secure network services.
Provides Tier-3 network support, leading major incident response, rapid restoration, RCA, and follow-through on corrective actions.
Leads reliability and stability initiatives using data-driven analysis to improve service levels and reduce recurring failure modes.
Defines SLI/SLOs and error budgets with stakeholders and customers, ensuring measurable performance targets and trade-off clarity.
Identifies and removes technical bottlenecks within core domains of expertise, proactively preventing reliability and capacity risks.
Runs blameless, data-driven post-mortems and debriefs, converting learnings (successes and failures) into actionable improvements.
Fosters continuous improvement and strong knowledge sharing, soliciting real-time feedback, avoiding duplicated work, and promoting innovation via internal communities.
Produces and packages thought leadership with specialists/product/engineering teams, documenting best practices and lessons learned for internal assets and industry forums/conferences.

Required qualifications, capabilities, and skills

Formal training or certification in network engineering concepts and 5+ years of applied experience.
10+ years of experience leading technologists to manage and solve complex technical items within your domain of expertise.
Advanced proficiency in network reliability engineering, including Permit to Operate, FMEA, and operational readiness processes.
Experience leading technologists to manage and solve complex network issues at a firmwide level.
Ability to influence team culture by championing innovation and change for success.
Proficiency in SD-WAN, cloud platforms (AWS, Azure, etc.), and major network technologies (Palo Alto, Juniper, F5, Broadcom, Arista, Cisco, etc.).
Proficiency in observability and monitoring tools such as Grafana, SevOne, Prometheus, Kibana, ThousandEyes, and Splunk.

Preferred qualifications, capabilities, and skills

CCIE or cloud certifications, and experience with load balancing, SD-WAN, observability tools, and eBPF.
Demonstrated proficiency in troubleshooting and supporting complex networking environments, including Tier-3 operational support for major incidents.
Experience with continuous integration and delivery tools (e.g., Jenkins, GitLab, Terraform, etc.).
Experience in scalable networking design, including high availability, redundancy, failover, and load balancing.
Experience troubleshooting networking protocols such as TCP/IP, HTTPS, and BGP.
Experience in customer-facing migration, including service discovery, assessment, planning, execution, and operations.

This position is subject to Section 19 of the Federal Deposit Insurance Act. As such, an employment offer for this position is contingent on JPMorganChase's review of criminal conviction history, including pretrial diversions or program entries.

Lead SRE for Network Product, driving reliability, security, and automation with a data-driven focus
