
Posted 15 days ago
Join Qube Research & Technologies to operate, scale, and evolve on-prem database platforms with a strong focus on reliability, performance, backups, replication, capacity planning, automation and incident response. You will build and maintain infrastructure foundations (compute, storage, networking, security), automate database operations, and support Kubernetes adoption for stateful data workloads while partnering with engineering, data and research teams. The role requires a reliability-first mindset, strong Linux/on-prem experience, Infrastructure-as-Code and automation skills, and the ability to debug complex end-to-end systems.
- Compensation: Not specified
- City: Not specified
- Country: Not specified
- Currency: Not specified
Full Job Description
- Own the reliability of database platforms, including availability, performance, backups, restores, replication, upgrades, capacity planning, and incident response.
- Automate database operations such as provisioning, patching, migrations, schema changes, maintenance windows, and safe, auditable rollouts.
- Build and maintain on-prem infrastructure foundations: compute, storage, networking, security hardening, observability, and disaster recovery.
- Support Kubernetes adoption for data workloads by defining patterns for stateful services, operators, storage classes, resource management, and safe lifecycle practices.
- Define and enforce operational guardrails, including access controls, secrets management, change management workflows, and runbooks.
- Partner with internal users (data platform, engineering, researchers) to improve self-service capabilities and reduce operational friction without compromising safety.
- Strong experience operating production databases (PostgreSQL and/or ClickHouse are a plus; other robust database technologies are also welcome).
- A reliability-first mindset, with attention to latency, correctness, recovery objectives, and the realities of running 24/7 systems.
- Solid experience with Linux and on-prem environments, with a good understanding of how compute, storage, and networking impact database behavior.
- Experience with automation and Infrastructure-as-Code tools (e.g. Terraform, Ansible, Helm, GitOps-style workflows).
- Familiarity with Kubernetes, particularly for stateful workloads, or a strong motivation to deepen your expertise in this area.
- Ability to debug complex systems end-to-end, from application queries down to the kernel, network, and storage layers.
- Experience with high-throughput and/or low-latency data systems.
- Strong observability skills (metrics, logs, traces, alerting, SLOs).
- Experience building internal platforms or self-service tooling.
- Programming skills (Python, Go, Rust, or similar) to build automation and operational tools.





