Proprietary Trading

Posted 8 days ago

**Compute Operations Engineer** Support day-to-day on-prem compute infrastructure at QRT, collaborating with cross-functional teams. **Responsibilities** include server hardware management, Linux systems troubleshooting, infrastructure monitoring, hardware lifecycle activities, and user-facing issue resolution across Slurm and Kubernetes platforms. **Skills required**: 2-5 years in compute infrastructure, RHEL proficiency, server hardware knowledge, monitoring tools familiarity, HPC/SLURM/Kubernetes experience, automation scripting, and strong communication.

Compensation: Not specified
City: Not specified
Country: Not specified

Full Job Description

Qube Research & Technologies (QRT) is a global quantitative and systematic investment manager, operating in all liquid asset classes across the world. We are a technology- and data-driven group implementing a scientific approach to investing. Combining data, research, technology, and trading expertise has shaped our collaborative mindset, which enables us to solve the most complex challenges. QRT's culture of innovation continuously drives our ambition to deliver high-quality returns for our investors.

The Compute Operations team support the day-to-day operation of on-prem compute infrastructure, covering HPC server hardware, Linux-based platforms, and user-facing support. You will work closely with the Compute Ops Team Lead, Linux engineers, and other platform groups to maintain reliable, performant compute services across Slurm, Kubernetes, and control-plane environments.

Your Future Role within QRT
You will:

Provide hands-on support for HPC server hardware, including diagnostics, issue investigation, and coordination with vendors for repairs
Monitor system health and respond to alerts using infrastructure monitoring tools
Support hardware lifecycle activities, including provisioning, maintenance, and decommissioning
Troubleshoot Linux-based systems across OS, networking, and storage layers
Triage and resolve user-facing issues across compute platforms such as Slurm and Kubernetes
Coordinate with internal teams and vendors on maintenance and incident resolution
Execute scheduled maintenance and change activities
Maintain accurate infrastructure records and documentation
Contribute to runbooks and continuous improvement of operational processes
Participate in on-call rotations and incident response

Your Present Skillset

2-5 years of experience in compute infrastructure, systems engineering, or a related role
Strong Linux systems administration experience (e.g. RHEL, Rocky Linux, or similar)
Strong understanding of server hardware (i.e. compute, storage, networking components)
Familiarity with infrastructure monitoring tools (e.g. OneView, Dell OME, or similar)
Exposure to HPC or platform environments such as Slurm or Kubernetes
Experience or familiarity with operational tooling (e.g. NetBox, DNS, HashiCorp Vault, Ansible, scripting languages, or similar)
Knowledge of automation or scripting (e.g. Bash, Python, Ansible)
Strong troubleshooting and problem-solving skills
Ability to communicate effectively and work in a collaborative environment
Understanding of datacentre operations and safety practices is beneficial

QRT is an equal opportunity employer. We welcome diversity as essential to our success. QRT empowers employees to work openly and respectfully to achieve collective success. In addition to professional achievement, we offer initiatives and programs to enable employees to achieve a healthy work-life balance.
