Bulge Bracket Investment Banks

Posted 9 days ago

**AI Systems Engineer - Asset Management (Associate/VP) | Shanghai**

Drive AI innovation in asset management by building and optimizing enterprise LLM serving platforms and distributed AI infrastructure. Key responsibilities include:

- Build and optimize LLM serving platforms using techniques like PagedAttention, continuous batching, and quantization for high throughput and low latency.
- Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization and manage distributed training clusters.
- Streamline the CI/CD pipeline for AI models, implementing automated benchmarking and zero-downtime deployment.

Required qualifications include a degree in Computer Science, Computer Engineering, or a related field and 3+ years of experience, with at least 1-2 years in LLM serving, GPU optimization, or ML Systems. Candidates should be proficient in Python and Java, with a deep understanding of Linux internals and distributed systems and hands-on experience with Kubernetes. Familiarity with LLM inference engines, GPU architecture, or distributed training frameworks is essential. A "hacker" mindset and the ability to collaborate effectively are crucial. Preferred qualifications include contributions to open-source AI Infra projects, custom CUDA kernel experience, and financial industry experience. Professional English proficiency is required.

Compensation: Not specified
City: Shanghai
Country: China

Full Job Description

Location: Shanghai, China

Key Responsibilities

Inference Platform & Optimization: Build and optimize enterprise LLM serving platforms (e.g., vLLM, TensorRT-LLM) using techniques like PagedAttention, continuous batching, and quantization (AWQ/FP8) for high throughput and low latency.
GPU Pooling & AI Infra: Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization. Manage distributed training clusters and high-performance networking (RDMA/NCCL).
Model Deployment & MLOps: Streamline the CI/CD pipeline for AI models. Implement automated benchmarking, zero-downtime deployment, and comprehensive observability (time to first token (TTFT), tokens per second (TPS), GPU metrics).

Qualifications

1. Education & Experience:

Bachelor's, Master's, or Ph.D. in Computer Science, Computer Engineering, or a related field.
3+ years of experience in Backend Systems, Distributed Systems, or AI Infrastructure/MLOps, with at least 1-2 years specifically focused on LLM serving, GPU optimization, or ML Systems.

2. Core Engineering & Systems Skills:

Expert-level proficiency in Python and strong proficiency in Java (essential for inference engines and CUDA integration).
Deep understanding of Linux internals, networking, and distributed systems architecture.
Hands-on experience with container orchestration (Kubernetes, Docker) and building custom K8s operators or controllers.

3. AI Infrastructure & Optimization Skills:

Deep familiarity with LLM inference engines (vLLM, TensorRT-LLM, TGI) and understanding of their underlying architectural designs; or
Solid understanding of GPU architecture (NVIDIA Ampere/Hopper), CUDA programming, and GPU memory management; or
Experience with distributed training frameworks (DeepSpeed, Megatron-LM, Ray) and high-performance networking (RDMA, RoCE, InfiniBand).

4. Mindset & Soft Skills:

A "hacker" mindset with a passion for squeezing every drop of performance out of hardware.
Ability to collaborate effectively with AI Researchers (to understand their models) and Backend Engineers (to integrate AI into business systems).

Preferred

Contributions to open-source AI Infra projects (e.g., vLLM, Ray, PyTorch).
Experience writing custom CUDA kernels or using Triton for operator fusion.
Financial industry (Asset Management/Quant) experience is a plus.
Language: Professional working proficiency in English to collaborate with global teams.

As an AI Systems Engineer, you will be the backbone of our AI initiatives. While our AI Researchers focus on model intelligence, your mission is to make our AI systems fast, scalable, cost-efficient, and highly reliable. You will design and build the underlying AI infrastructure, including GPU resource pooling, high-performance LLM inference platforms, and distributed training frameworks. You will solve hardcore engineering challenges in model deployment, memory optimization, and distributed systems to empower our asset management business with enterprise-grade AI capabilities.

Asset Management - AI Systems Engineer – Associate/VP

SIMILAR OPPORTUNITIES

Asset Management - AI Algorithm Engineer - Associate/VP

Artificial Intelligence Engineer, Portfolio Management Group, Associate/Vice President

Agentic AI Senior Engineer - Assistant Vice President

US Tech - AI Engineering Senior Associate

Associate Director, Software Engineering (AI Platform Infrastructure)
