Posted 3 days ago

**Senior Specialist - Site Reliability Engineering** As a Senior Specialist Site Reliability Engineer (SRE), you'll drive reliability strategy for critical cloud services, define Service Level Objectives (SLOs) and error budgets, and lead complex problem-solving. Key responsibilities include designing observability standards using tools like Datadog and AWS CloudWatch, leading major incident responses, and mentoring engineers. Bring significant SRE/DevOps experience, strong AWS knowledge, and the ability to automate and script. Support a 24x7 on-call rotation and help continuously improve operational readiness. Join a global financial markets infrastructure provider driving excellence and change.

Compensation: Not specified
City: Bengaluru
Country: India

Full Job Description

Senior Specialist Site Reliability Engineer, Risk Intelligence

Role summary
As a Senior Specialist Site Reliability Engineer (SRE), you'll provide deep reliability and operational expertise for business-critical cloud services. You'll help set the reliability strategy (for example, defining SLOs and error budgets), design the observability and automation that keep services balanced, and lead improvements that reduce operational risk and customer impact. This role suits people who enjoy solving complex production problems and raising the standard for engineering excellence, regardless of whether your background is SRE, DevOps, Platform Engineering, Systems Engineering, or Software Engineering with strong operations experience.

On-call: You'll take part in a 24x7 team on-call rotation. As a senior specialist, you'll help improve how we run on-call by reducing noisy alerts, automating repetitive tasks, and strengthening operational readiness so incidents become less frequent and easier to resolve. From time to time, this may include planned out-of-hours maintenance to support a highly available service. We aim to share the load fairly and continuously improve the on-call experience.

Key responsibilities

Own reliability outcomes for critical user journeys and cloud components by defining SLIs/SLOs, setting error budgets, and using data to prioritise resilience work.
Design observability standards (metrics, logs, traces, dashboards, alerting) using tools such as Datadog and AWS CloudWatch so teams can detect issues early and diagnose them quickly.
Lead major incident response as incident commander when needed, drive blameless post-incident reviews, and ensure corrective actions are implemented and measured.
Improve platform resilience and scalability through reliability architecture reviews, capacity and performance engineering, and proactive risk assessments.
Reduce toil by automating repetitive operational work (for example with Terraform, CI/CD, and scripting), and track improvements through clear operational metrics.
Partner with Product, Engineering, Customer Support, Security, and leadership to balance reliability, feature delivery, compliance obligations, and customer impact.
Create and maintain operational documentation, runbooks, and change procedures, and raise standards through reviews, templates, and shared practices.
Coach and mentor engineers, helping teams adopt SRE ways of working and strengthening operational ownership.

What you'll bring

Significant experience in SRE/DevOps/Platform Engineering (or equivalent practical experience) supporting production services and making reliability improvements that stick.
Demonstrated ownership of reliability practices such as SLOs, error budgets, incident management, post-incident actions, and operational readiness.
Strong AWS knowledge and hands-on experience operating cloud services (for example: EKS, Lambda, SQS, API Gateway, DocumentDB, RDS/MySQL, and CloudWatch/CloudWatch Insights).
Experience with infrastructure-as-code and automation (for example, Terraform) and with CI/CD pipelines (for example, GitLab CI/CD).
Experience with Kubernetes, including production operations on managed platforms such as Amazon EKS.
Ability to write scripts that automate work and solve problems (for example, in Python and/or JavaScript), with an engineering approach to testing, review, and maintainability.
Experience building actionable monitoring and alerting that balances sensitivity and noise (for example, Datadog and CloudWatch).
Confidence communicating, staying calm during incidents, and influencing teams through data, clear writing, and pragmatic recommendations.

Good to have Skills:

Experience operating Kubernetes clusters beyond day-to-day use (upgrades, networking, security, policy, and multi-tenant considerations).

Experience with performance testing, load testing, and capacity modelling for cloud-native services.

Software engineering experience in Python and/or JavaScript beyond scripting (testing, packaging, code reviews).

Experience improving observability practices at scale (for example: metric design, alert tuning, incident analytics).

Proud to share that LSEG in India is Great Place to Work certified (Jun 25 to Jun 26).

Learn more about the life and purpose of our company directly from our India colleagues' video: Bengaluru, India | Where We Work | LSEG

Career Stage:

Senior Associate

London Stock Exchange Group (LSEG) Information:

Join us and be part of a team that values innovation, quality, and continuous improvement. If you're ready to take your career to the next level and make a significant impact, we'd love to hear from you.

LSEG is a leading global financial markets infrastructure and data provider. Our purpose is driving financial stability, empowering economies and enabling customers to create sustainable growth.

Our purpose is the foundation on which our culture is built. Our values of Integrity, Partnership, Excellence and Change underpin our purpose and set the standard for everything we do, every day. They go to the heart of who we are and guide our decision making and everyday actions.

Working with us means that you will be part of a dynamic organisation of 25,000 people across 65 countries. However, we will value your individuality and enable you to bring your true self to work so you can help enrich our diverse workforce.

We are proud to be an equal opportunities employer. This means that we do not discriminate on the basis of anyone's race, religion, colour, national origin, gender, sexual orientation, gender identity, gender expression, age, marital status, veteran status, pregnancy or disability, or any other basis protected under applicable law. Conforming with applicable law, we can reasonably accommodate applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs.

You will be part of a collaborative and creative culture where we encourage new ideas. We are committed to sustainability across our global business and we are proud to partner with our customers to help them meet their sustainability objectives. Our charity, the LSEG Foundation, provides charitable grants to community groups that help people access economic opportunities and build a secure future with financial independence. Colleagues can get involved through fundraising and volunteering.

LSEG offers a range of tailored benefits and support, including healthcare, retirement planning, paid volunteering days and wellbeing initiatives.

Please take a moment to read this privacy notice carefully, as it describes what personal information London Stock Exchange Group (LSEG) ("we") may hold about you, what it's used for, how it's obtained, your rights, and how to contact us as a data subject.

If you are submitting as a Recruitment Agency Partner, it is essential and your responsibility to ensure that candidates applying to LSEG are aware of this privacy notice.

Location: IND-BLR-Divyasree Technopolis

Time Type: Full time
