Hedge Funds

Posted 2 months ago

Join Qube Research & Technologies' Workload Scheduling team to develop, deploy, and maintain high-performance computing (HPC) platforms across cloud and on-premises environments. You will integrate and support schedulers such as YellowDog and Ray, optimise performance, resilience, and observability, and contribute to infrastructure automation. The role involves collaborating with internal teams, tuning compute, network, and storage components, and coaching colleagues in a fast-paced environment.

Full Job Description

Qube Research & Technologies (QRT) is a global quantitative and systematic investment manager, operating in all liquid asset classes across the world. We are a technology- and data-driven group implementing a scientific approach to investing. Combining data, research, technology, and trading expertise has shaped QRT’s collaborative mindset, which enables us to solve the most complex challenges. QRT’s culture of innovation continuously drives our ambition to deliver high-quality returns for our investors.

Join QRT as a technologist within our Workload Scheduling (WLS) team. This key role supports both business and technology groups in integrating High Performance Computing (HPC) solutions, enabling scalable and efficient compute capabilities. You will be instrumental in developing, deploying, and maintaining HPC platforms that use the YellowDog and Ray schedulers across cloud and on-premises infrastructure.

Your Future Role within QRT:

Develop and support scalable workload scheduling solutions for HPC environments
Collaborate with internal teams to adopt and optimize HPC platforms
Improve the performance, resilience, and observability of compute infrastructure
Contribute to infrastructure automation and continuous improvement initiatives
Share expertise and support team development through coaching and collaboration

Your Present Skillset:

Experience of engineering and supporting at least one HPC scheduler, such as YellowDog, Ray, Slurm or IBM Symphony
Good understanding of both loosely coupled and tightly coupled HPC workloads
Experience of developing and supporting large-scale systems (5000+ nodes) and high levels of concurrency (100k+ tasks)
Experience of monitoring and visualisation of large-scale systems
Performance tuning of compute, network and storage components
Good understanding of the challenges of user authorisation in large-scale distributed environments using AWS IAM and identity providers such as Okta
Good understanding of core AWS services: VPC security and networking; EC2 configuration and scaling; storage services (S3, EFS, EBS and FSx); CloudWatch, CloudTrail, OpenSearch and Athena
Experience of developing Python applications and tools
Experience with infrastructure-as-code using configuration languages and tools, particularly Terraform and Ansible
Solid Linux administration skills
Good understanding of various storage solutions and their applicability for different use cases
Able to work in a fast-paced environment with multiple conflicting demands and changing priorities
Effective communicator, able to describe complex issues at the appropriate level for a given audience
Happy to coach colleagues and eager to learn from them

QRT is an equal opportunity employer. We welcome diversity as essential to our success. QRT empowers employees to work openly and respectfully to achieve collective success. In addition to professional achievement, we offer initiatives and programs to enable employees to achieve a healthy work-life balance.

HPC Platform Management Engineer

