
Lead Site Reliability Engineer at J.P. Morgan
Bulge Bracket Investment Banks
Posted 8 days ago
**Lead Site Reliability Engineer: Bengaluru, India**
- Drive site reliability as technical lead, influencing team culture and practices.
- Improve app reliability via data-driven analytics and service level indicators.
- Handle major incidents, solve technical issues swiftly to prevent financial losses.
- Bring 6+ years in tech support, production application support, or infrastructure management.
- Demonstrate expertise in reliability, scalability, performance, and security.
- Proficiency in observability tools (Grafana, Dynatrace, etc.) and CI/CD (Jenkins, GitLab).
- Collaborate across teams and drive improvements in complex environments.
- Familiarity with ITIL, networking concepts, IaC, and major cloud platforms preferred.
- Compensation: Not specified
- City: Bengaluru
- Country: India
- Currency: Not specified
Full Job Description
Location: Bengaluru, Karnataka, India
Take on a critical role in shaping the future of a globally recognized firm, with a direct and significant impact in an environment built for top performers in site reliability.
Job responsibilities
- Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team
- Leads initiatives to improve the reliability and stability of your team's applications and platforms, using data-driven analytics to raise service levels
- Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers
- Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
- Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses
- Documents and shares knowledge within your organization via internal forums and communities of practice
- Collaborates with Software Engineering, Product, and other stakeholder teams to drive issue resolution, operational stability, and performance improvements.
- Monitors production environments for anomalies, leveraging observability tools and dashboards to detect and address issues promptly.
Required qualifications, capabilities, and skills
- 6+ years of experience in technology support, production/application support, DevOps, or infrastructure management.
- Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
- Fluency in at least one programming language (e.g., Python, Java Spring Boot, .NET)
- Experience supporting public/private cloud-based applications.
- Proficiency and experience in observability, such as white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform).
- Strong collaboration skills and the ability to build effective relationships across teams.
- Ability to identify and solve problems related to complex data structures and algorithms
- Drive to self-educate and evaluate new technology
- Ability to teach new programming languages to team members
- Ability to expand and collaborate across different levels and stakeholder groups
- Familiarity with ITIL support methodologies and concepts.
- Solid understanding of networking concepts and troubleshooting.
- Familiarity with Infrastructure as Code (IaC) tools and major cloud platforms (AWS, Azure, or GCP).




