Canary Wharfian - Online Investment Banking & Finance Community.

Asset Management - AI Systems Engineer – Associate/VP

Experienced | No visa sponsorship
at J.P. Morgan

Bulge Bracket Investment Banks

Posted 9 days ago

AI Systems Engineer - Asset Management (Associate/VP) | Shanghai

Drive AI innovation in asset management by building and optimizing enterprise LLM serving platforms and distributed AI infrastructure. Key responsibilities include:

  • Build and optimize LLM serving platforms using techniques like PagedAttention, continuous batching, and quantization for high throughput and low latency.
  • Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization and manage distributed training clusters.
  • Streamline the CI/CD pipeline for AI models, implementing automated benchmarking and zero-downtime deployment.

Required qualifications include a degree in Computer Science/Engineering and 3+ years of experience, with at least 1-2 years in LLM serving, GPU optimization, or ML Systems. Candidates should be proficient in Python and Java, with a deep understanding of Linux internals and distributed systems and hands-on experience with Kubernetes. Familiarity with LLM inference engines, GPU architecture, or distributed training frameworks is essential. A "hacker" mindset and the ability to collaborate effectively are crucial. Preferred qualifications include contributions to open-source AI Infra projects, custom CUDA kernel experience, and financial industry experience. Professional English proficiency is required.

Compensation
Not specified

Currency: Not specified

City
Shanghai
Country
China

Full Job Description

Location: Shanghai, China

Key Responsibilities

  • Inference Platform & Optimization: Build and optimize enterprise LLM serving platforms (e.g., vLLM, TensorRT-LLM) using techniques like PagedAttention, continuous batching, and quantization (AWQ/FP8) for high throughput and low latency.
  • GPU Pooling & AI Infra: Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization. Manage distributed training clusters and high-performance networking (RDMA/NCCL).
  • Model Deployment & MLOps: Streamline the CI/CD pipeline for AI models. Implement automated benchmarking, zero-downtime deployment, and comprehensive observability (TTFT, TPS, GPU metrics).

Qualifications

1. Education & Experience:

  • Bachelor's, Master's, or Ph.D. in Computer Science, Computer Engineering, or a related field.
  • 3+ years of experience in Backend Systems, Distributed Systems, or AI Infrastructure/MLOps, with at least 1-2 years specifically focused on LLM serving, GPU optimization, or ML Systems.

2. Core Engineering & Systems Skills:

  • Expert-level proficiency in Python and strong proficiency in Java (essential for inference engines and CUDA integration).
  • Deep understanding of Linux internals, networking, and distributed systems architecture.
  • Hands-on experience with container orchestration (Kubernetes, Docker) and building custom K8s operators or controllers.

3. AI Infrastructure & Optimization Skills (at least one of the following):

  • Deep familiarity with LLM inference engines (vLLM, TensorRT-LLM, TGI) and understanding of their underlying architectural designs.
  • Solid understanding of GPU architecture (NVIDIA Ampere/Hopper), CUDA programming, and GPU memory management.
  • Experience with distributed training frameworks (DeepSpeed, Megatron-LM, Ray) and high-performance networking (RDMA, RoCE, InfiniBand).

4. Mindset & Soft Skills:

  • A "hacker" mindset with a passion for squeezing every drop of performance out of hardware.
  • Ability to collaborate effectively with AI Researchers (to understand their models) and Backend Engineers (to integrate AI into business systems).

Preferred

  • Contributions to open-source AI Infra projects (e.g., vLLM, Ray, PyTorch).
  • Experience writing custom CUDA kernels or using Triton for operator fusion.
  • Financial industry (Asset Management/Quant) experience is a plus.
  • Language: Professional working proficiency in English to collaborate with global teams.

As an AI Systems Engineer, you will be the backbone of our AI initiatives. While our AI Researchers focus on model intelligence, your mission is to make our AI systems fast, scalable, cost-efficient, and highly reliable. You will design and build the underlying AI infrastructure, including GPU resource pooling, high-performance LLM inference platforms, and distributed training frameworks. You will solve hardcore engineering challenges in model deployment, memory optimization, and distributed systems to empower our asset management business with enterprise-grade AI capabilities.
