
at J.P. Morgan
Bulge Bracket Investment Banks
Posted 9 days ago
**AI Systems Engineer - Asset Management (Associate/VP) | Shanghai**

Drive AI innovation in asset management by building and optimizing enterprise LLM serving platforms and distributed AI infrastructure. Key responsibilities include:

- Develop and fine-tune LLM serving platforms using techniques like PagedAttention, continuous batching, and quantization for high throughput and low latency.
- Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization and manage distributed training clusters.
- Streamline the CI/CD pipeline for AI models, implementing automated benchmarking and zero-downtime deployment.

Required qualifications include a degree in Computer Science/Engineering and 3+ years of experience, with at least 1-2 years in LLM serving, GPU optimization, or ML systems. Candidates should be proficient in Python and Java, with a deep understanding of Linux internals and distributed systems and hands-on experience with Kubernetes. Familiarity with LLM inference engines, GPU architecture, and distributed training frameworks is essential. A "hacker" mindset and the ability to collaborate effectively are crucial.

Preferred qualifications include contributions to open-source AI Infra projects, custom CUDA kernel experience, and financial industry experience. Professional English proficiency is required.
- Compensation: Not specified
- Currency: Not specified
- City: Shanghai
- Country: China
Full Job Description
Location: Shanghai, China
Key Responsibilities
- Inference Platform & Optimization: Build and optimize enterprise LLM serving platforms (e.g., vLLM, TensorRT-LLM) using techniques like PagedAttention, continuous batching, and quantization (AWQ/FP8) for high throughput and low latency.
- GPU Pooling & AI Infra: Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization. Manage distributed training clusters and high-performance networking (RDMA/NCCL).
- Model Deployment & MLOps: Streamline the CI/CD pipeline for AI models. Implement automated benchmarking, zero-downtime deployment, and comprehensive observability (TTFT, TPS, GPU metrics).
Qualifications
1. Education & Experience:
- Bachelor's, Master's, or Ph.D. in Computer Science, Computer Engineering, or a related field.
- 3+ years of experience in Backend Systems, Distributed Systems, or AI Infrastructure/MLOps, with at least 1-2 years specifically focused on LLM serving, GPU optimization, or ML Systems.
2. Core Engineering & Systems Skills:
- Expert-level proficiency in Python and strong proficiency in Java (essential for inference engines and CUDA integration).
- Deep understanding of Linux internals, networking, and distributed systems architecture.
- Hands-on experience with container orchestration (Kubernetes, Docker) and building custom K8s operators or controllers.
3. AI Infrastructure & Optimization Skills:
- Deep familiarity with LLM inference engines (vLLM, TensorRT-LLM, TGI) and understanding of their underlying architectural designs; or
- Solid understanding of GPU architecture (NVIDIA Ampere/Hopper), CUDA programming, and GPU memory management; or
- Experience with distributed training frameworks (DeepSpeed, Megatron-LM, Ray) and high-performance networking (RDMA, RoCE, InfiniBand).
4. Mindset & Soft Skills:
- A "hacker" mindset with a passion for squeezing every drop of performance out of hardware.
- Ability to collaborate effectively with AI Researchers (to understand their models) and Backend Engineers (to integrate AI into business systems).
Preferred
- Contributions to open-source AI Infra projects (e.g., vLLM, Ray, PyTorch).
- Experience writing custom CUDA kernels or using Triton for operator fusion.
- Financial industry (Asset Management/Quant) experience is a plus.
- Language: Professional working proficiency in English to collaborate with global teams.




