
at CME Group
Posted 10 days ago
**Site Reliability Engineer II (Tuesday - Saturday)** - NYC, Chicago

CME Group's Markets portfolio seeks an SRE II to build, operate, and scale reliable, low-latency systems. Collaborate with senior engineers to enhance observability, implement AI-driven reliability solutions, and participate in incident response. Required: Linux, Python, problem-solving skills, and eagerness to learn in a fast-paced trading environment. Preferred: AI/ML for Operations, AIOps Platforms, Generative AI Tooling, and GCP experience. Tuesday to Saturday, 9am-5pm EST. Join CME Group to drive innovative SRE strategies in a global financial market leader.
- Compensation: $93,900 – $172,000 USD
- City: New York City, Chicago
- Country: United States
Full Job Description
This is a hybrid role, with 2 days on site.
The role is located in NYC, with Chicago, IL as an alternative location.
We are looking for local candidates only.
Working days: Tuesday-Saturday
Working hours: 9am-5pm EST
Description
Site Reliability Engineer II (Tuesday - Saturday)
CME Group is seeking an SRE II to help build, operate, and scale systems in our Markets portfolio. Markets SREs work on products and applications related to CME's Globex trading platform. Our systems deliver an exceptional combination of low-latency performance and rock-solid reliability to seamlessly handle the world's busiest trading days.
The successful candidate will work alongside senior engineers to learn how we observe, monitor, automate, and improve Production service reliability. As we evolve our operations, we are increasingly emphasizing the integration of Artificial Intelligence (AI) and Machine Learning (ML) to drive smarter, more predictive reliability and reduce operational toil.
Key Responsibilities:
- Work alongside product teams and senior engineers to assist with building out observability, monitoring, and alerting for key services.
- Implement AI-driven reliability solutions, including anomaly detection, predictive alerting, and root cause analysis in production environments.
- Collaborate with engineers and product teams to ensure requirements are understood, planned carefully, and implemented safely.
- Participate in on-call rotation and assist in incident response under guidance from senior engineers.
- Write scripts and tools to reduce toil and improve velocity, including building or integrating intelligent auto-remediation and capacity forecasting systems.
- Leverage LLMs and Generative AI to enhance incident management, automate runbooks, and streamline log analysis.
- Contribute to disaster recovery (DR) and systems resiliency testing & improvements.
- Support the migration of markets applications to Google Cloud Platform (GCP).
- Collaborate with cross-functional teams to improve system performance and operational efficiency.
What We're Looking For (Required):
- A keen interest in SRE, automation, and intelligent operations (AIOps).
- Experience with Linux-based systems.
- Programming and scripting skills (Python, Bash, etc.).
- Strong problem-solving and analytical abilities.
- Excellent communication and teamwork skills.
- Eagerness to learn and adapt in a fast-paced trading environment.
Preferred / Desirable Qualifications:
- AI/ML for Operations: Demonstrated hands-on experience applying AI/ML techniques to improve operational efficiency, reliability, or observability.
- AIOps Platforms: Experience using platforms such as Dynatrace, New Relic, Moogsoft, BigPanda, or integrating open-source tools (e.g., Prometheus with ML models).
- Generative AI Tooling: Experience with LLMs for operations, incident management, or log analysis (e.g., using LangChain, LlamaIndex, or tools like PagerDuty AIOps).
- Cloud Platforms: Experience with cloud-based platforms. Experience with Google Cloud Platform (GCP), GCE, and/or GKE is a strong bonus.
- Traditional Observability: Experience with metrics & monitoring tools like OpenTelemetry, Splunk, Prometheus, and Grafana.
- Systems Architecture: Experience with Kubernetes and knowledge of working with distributed systems.
- Core Concepts: Basic knowledge of networking (HTTP/TCP/UDP/IP) and message-oriented middleware.
- Industry & Process: Experience in financial markets and working in an Agile environment.
Why CME Group:
- Be part of a global leader in financial services technology.
- Work on cutting-edge technology and intelligent operations in a collaborative, innovative culture.
- Competitive compensation and benefits package.
- Opportunity to grow and advance your career in SRE with an organization that is transforming its approach to system reliability.
Join CME Group and play a crucial role in ensuring the stability and performance of our Markets applications while contributing to our GCP migration and AIOps evolution. Apply now to be a part of our dynamic SRE team!
CME Group: Where Futures are Made
CME Group is the world's leading derivatives marketplace. But who we are goes deeper than that. Here, you can impact markets worldwide. Transform industries. And build a career by shaping tomorrow. We invest in your success and you own it all while working alongside a team of leading experts who inspire you in ways big and small. Problem solvers, difference makers, trailblazers. Those are our people. And we're looking for more.
At CME Group, we embrace our employees' unique experiences and skills to ensure that everyone's perspectives are acknowledged and valued. As an equal-opportunity employer, we consider all potential employees without regard to any protected characteristic.
Important Notice: Recruitment fraud is on the rise, with scammers using misleading promises of job offers and interviews to solicit money and personal information from job seekers. CME Group adheres to established procedures designed to maintain trust, confidence and security throughout our recruitment process.
Location: New York - 300 Vesey Street
Time Type: Full time





