
at J.P. Morgan
Bulge Bracket Investment Banks
Posted 3 days ago
**Senior Principal Software Engineer - AI Foundation Services** In Plano, TX, helm AI Foundation Services, building secure, high-performance AI/ML infrastructure for JPMorganChase. Collaborate cross-functionally to synthesize business needs into robust designs. Oversee solution co-development, launch, and early operations, optimizing for scale, reliability, and security. Enhance firm-wide adoption via shared architectures, playbooks, and GPU baselines. Set strategy for AI-powered engineering, driving improvements in delivery speed and code quality. Leverage cloud-native architecture and secure-by-design practices. Requires 10+ years' experience, proficiency in AI/ML platforms and performance engineering.
- Compensation: Not specified
- City: Not specified
- Country: United States
Currency: Not specified
Full Job Description
Location: Plano, TX, United States
If you are looking for a game-changing career, working for one of the world's leading financial institutions, you've come to the right place.
As a Senior Principal Software Engineer at JPMorganChase within AMDP/CDAO, you will serve as a hands-on thought leader and builder for AI Foundation Services: the scaled, secure, performance-optimized infrastructure that enables large-scale GenAI and traditional AI/ML across Lines of Business. You will partner directly with Lines of Business application teams to synthesize requirements into implementable designs, co-develop solutions through launch and early operations, and de-risk delivery across performance, scale, reliability, and security. You will also drive firmwide reuse through shared reference architectures, playbooks, test harnesses, and GPU training/serving baselines, raising the engineering bar and accelerating adoption across the portfolio.
Job responsibilities
- Leads as a hands-on technical thought leader to build, integrate, and optimize AI Foundation Services infrastructure for GenAI and traditional AI/ML platforms
- Co-develops with Lines of Business (LOB) application teams to deliver reusable AI/ML foundational services and managed service patterns
- Synthesizes Lines of Business (LOB) requirements into implementable designs and drives delivery from design through launch and early operational support
- De-risks delivery across performance, scale, reliability, and security by defining non-functional requirements, testing strategies, and operational readiness criteria
- Drives reuse and standardization through shared reference architectures, playbooks, test harnesses, and GPU training/serving baselines for model hosting platforms
- Sets strategy and operating standards for agentic AI-enabled engineering across a portfolio (using enterprise-authorized tools within the work environment) to drive measurable improvements in delivery speed, reliability, and code quality (e.g., AI-orchestrated SDLC/TLM automation, release readiness gating, incident triage/root-cause acceleration, and large-scale refactoring/test modernization), while defining guardrails for validation, security, resiliency, and reuse across teams and functions
- Applies knowledge of tools within the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to improve the value realized by automation at scale
- Advises and leads on the strategy and development of multiple products, applications, and technologies across a portfolio by creating novel code solutions, and drives the development of new production code capabilities across teams and functions
- Translates highly complex technical issues, trends, and approaches to leadership to drive the firm's innovation and enable leaders to make strategic, well-informed decisions about technological advancements
- Drives adoption and implementation of technical methods in specialized fields in line with the latest product development methodologies
- Influences across business, product, and technology teams and successfully manages senior stakeholder relationships
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 10+ years of applied experience
- Proven hands-on experience designing and operating AI/ML platform capabilities (model training, serving, feature/data access patterns, and multi-tenant controls)
- Demonstrated experience designing and scaling agentic AI-enabled development patterns (using enterprise-authorized tools within the work environment) across teams/functions, including establishing governance for human-in-the-loop validation, traceability/auditability, and secure handling of sensitive inputs/outputs
- Strong understanding of responsible AI use and control expectations at scale, including security/resiliency implications, data sensitivity, and risk-based governance; ability to advise senior leaders on safe adoption, reuse, and measurable outcomes
- Demonstrated expertise in performance engineering and production reliability (capacity planning, load testing, Service Level Objectives (SLOs)/Service Level Indicators (SLIs), incident response, and root-cause remediation)
- Strong experience with cloud-native architecture (Kubernetes, containers, CI/CD, infrastructure-as-code using Terraform) and secure-by-design engineering practices
- Ability to lead end-to-end technical engagements with senior stakeholders, translating requirements into delivered services with clear milestones and acceptance criteria
- Practical experience delivering system design, application development, testing, and operational stability
- Demonstrated prior experience with influencing across functions and teams and delivering value at scale
- Experience applying expertise and new methods to determine solutions for complex technology problems across various technical disciplines
- Extensive practical cloud native experience
- Experience building GPU-backed model hosting platforms and optimizing inference/training performance (profiling, batching, caching, parallelism, and cost controls)
- Experience implementing reusable reference architectures and developer enablement assets (golden paths, templates, playbooks, and automated test harnesses)
- Experience with LLM and model serving stacks (e.g., routing, autoscaling, model gateways, online evaluation, and guardrails) in production environments
- Experience operating in regulated environments with strong controls (security reviews, threat modeling, audit readiness, and data governance)




