Canary Wharfian - Online Investment Banking & Finance Community.

Site Reliability Engineer - Vice President

Experienced | No visa sponsorship
at Citi

Bulge Bracket Investment Banks

Posted 6 days ago

**Site Reliability Engineer - Vice President** (Pune, Hybrid). Responsible for operational resilience and the implementation of SRE principles in a complex environment. Key duties include fostering a culture of innovation, ensuring critical applications meet resilience requirements, and driving observability and performance enhancements. Requirements include 13+ years of SRE experience; hands-on expertise with tools such as Prometheus, Grafana, and Terraform; proficiency in OpenShift/Kubernetes; and experience with disaster recovery planning. Familiarity with major public cloud providers is desired. This senior role demands strong communication skills, strategic thinking, and the ability to work effectively across multiple teams. A service-oriented attitude and problem-solving mindset are crucial.

Compensation: Not specified
Currency: Not specified
City: Pune
Country: India

Full Job Description

Site Reliability Engineer - Vice President

Job Req Id: 26956402
Location(s): Pune, Maharashtra, India
Job Type: Hybrid
Posted: May 06, 2026

Discover your future at Citi

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you'll have the opportunity to grow your career, give back to your community, and make a real impact.

Job Overview

The Site Reliability Engineer (SRE) is a strategic professional accountable for the daily operations, architectural resilience, and overall implementation of SRE principles in a complex, critical, and large-scale multi-disciplinary environment. This role requires a comprehensive understanding of multiple technology domains and how they interact to achieve business objectives. As a recognized technical authority, you will apply an in-depth understanding of the business impact of technical contributions and provide advice and counsel on strategic solutions.

We are seeking a passionate and experienced SRE to join our Production Management team. In this role, you will be instrumental in enhancing the reliability, performance, and efficiency of our applications and services. You will drive our strategy for end-to-end observability and resiliency, collaborating across the organization to ensure our services are stable, scalable, and fault-tolerant. This is a key role that will influence strategic decisions and foster a culture of technical excellence and accountability.

Key Responsibilities

Culture & Strategy

Foster a culture of transparency, innovation, and accountability that encourages continuous improvement.

Communicate the progress and impact of SRE initiatives to stakeholders at all levels.

Operate effectively within a highly regulated environment, ensuring compliance with all relevant requirements.

Resiliency & Recovery

Ensure critical business applications meet stringent operational resilience requirements, including adherence to defined impact tolerances.

Oversee advanced recovery testing, including Production Swing Tests, Data Recovery Tests, and chaos engineering practices.

Drive the adoption and development of automation, such as One Touch Recovery solutions, to minimize recovery time.

Partner with development teams to leverage cloud-native services and established resiliency patterns to enhance application reliability.

Observability & Performance

Collaborate across the organization to develop and scale observability solutions using modern tools for metrics, logging, and tracing.

Partner with development teams to effectively instrument applications, providing deep insights into system health and performance.

Essential Skills

13+ years of experience, with a deep understanding of SRE concepts, including SLOs, SLIs, error budgets, and toil reduction.

Demonstrable experience with Disaster Recovery planning, resiliency testing, and fault-tolerant distributed system design.

Proficiency in deploying, managing, and troubleshooting applications on OpenShift/Kubernetes.

Hands-on experience with modern observability tools (e.g., Prometheus, Grafana, Loki, Mimir, Tempo, AppDynamics).

Experience with Infrastructure as Code (IaC), configuration management, and automation tools (e.g., Ansible, Terraform).

Experience creating, modifying, and managing Helm charts for application deployment.

Desired Skills

Experience with major public cloud providers (e.g., Google Cloud, AWS, Azure).

Proven experience delivering software and infrastructure using Agile frameworks.

Experience presenting technical strategy to senior and executive-level audiences.

Experience writing or maintaining code in Java, Python, Go, or similar languages.

Qualifications

Significant professional experience in production management, software development, or an equivalent field, with a strong focus on Site Reliability Engineering.

Expertise in analyzing complex application, database, network, and OS issues within large-scale, customer-facing systems.

A service-oriented attitude combined with excellent problem-solving and strategic thinking skills.

Strong communication and diplomacy skills, with a proven ability to work effectively across multiple business and technical teams.

------------------------------------------------------

Job Family Group: Technology

Job Family: Applications Support

Time Type: Full time

Most Relevant Skills: Please see the requirements listed above.

Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View Citi's EEO Policy Statement and the Know Your Rights poster.
