Canary Wharfian - Online Investment Banking & Finance Community.

Software Engineer II - MLOps Engineer

Experienced · No visa sponsorship

at J.P. Morgan

Bulge Bracket Investment Banks

Posted 7 days ago


**Role: Software Engineer II - MLOps Engineer** Design, build, and manage end-to-end MLOps pipelines. Architect and manage scalable AWS environments using Terraform. Support GenAI projects, applying prompt engineering best practices. Design agentic workflows using tools like LangChain. Implement model monitoring strategies and ensure adherence to data security standards. Collaborate with cross-functional teams and communicate technical concepts effectively. Requires a degree in software engineering or a related field, along with 2+ years of experience. Proficiency in Python, understanding of MLOps, DevOps principles, and GenAI is essential. Experience with AWS, Terraform, CI/CD pipelines, and orchestration frameworks is preferred. Join our agile team to deliver secure, scalable technology products.

Compensation: Not specified

Currency: Not specified

City: Not specified

Country: India

Full Job Description

Location: Hyderabad, Telangana, India

As a Software Engineer II at JPMorgan Chase within the Employee Platform team, you will design, build, and manage production-grade machine learning and Generative AI solutions that solve meaningful business problems.

Job Responsibilities

  • Design, develop, and maintain end-to-end MLOps pipelines that automate the deployment, scaling, and lifecycle management of machine learning and GenAI models in production, ensuring reliable and repeatable workflows from data ingestion through model serving.
  • Architect and manage scalable, secure AWS environments for hosting ML and GenAI workloads, using Terraform to provision and maintain infrastructure in a consistent, version-controlled manner.
  • Support and drive the deployment of GenAI-based projects, including applications built on large language models, retrieval-augmented generation (RAG) architectures, and agentic frameworks. Apply prompt engineering best practices to optimize model behavior, output quality, and reliability for business use cases.
  • Design and develop agentic workflows and AI agents using orchestration frameworks such as LangChain, LangGraph, CrewAI, AutoGen, or similar tooling.
  • Implement monitoring strategies to track model performance, detect drift, and ensure the ongoing health of deployed models. Establish alerting mechanisms for anomalies and performance degradation.
  • Partner with data scientists, software engineers, and business stakeholders to translate requirements into production-ready solutions. Communicate technical concepts clearly to non-technical audiences.
  • Ensure all solutions adhere to industry regulations and organizational standards, maintaining data security and privacy across the full development and deployment lifecycle. 
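The monitoring responsibility above (tracking performance, detecting drift, alerting on degradation) can be illustrated with a minimal sketch. This example is not part of the posting; the function name, the z-score rule, and the threshold are all assumptions chosen for illustration — production drift detection would typically use richer statistics (e.g. PSI or KS tests) over feature distributions.

```python
import statistics

def detect_mean_drift(reference, live, threshold=3.0):
    """Flag drift when the live batch mean deviates from the
    reference mean by more than `threshold` reference standard
    deviations. A minimal illustration of a drift check."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    z = abs(statistics.fmean(live) - ref_mean) / ref_std
    return z > threshold

# A live batch far from the reference distribution triggers the alert
reference = [0.48, 0.50, 0.52, 0.49, 0.51, 0.50]
assert detect_mean_drift(reference, [0.90, 0.95, 0.92]) is True
assert detect_mean_drift(reference, [0.50, 0.51, 0.49]) is False
```

In practice the boolean result would feed an alerting mechanism (paging, dashboards) rather than an assertion.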

 

Required Qualifications, Capabilities, and Skills

  • Formal training or certification in software engineering, machine learning, or related concepts and 2+ years of relevant applied experience.
  • Solid grasp of MLOps and DevOps principles with hands-on experience building CI/CD pipelines and automating model deployment workflows.
  • Working knowledge of AWS cloud services for hosting, orchestrating, and monitoring ML and AI workloads.
  • Proficiency in Terraform for provisioning and managing cloud infrastructure in a repeatable and auditable manner.
  • Demonstrated experience with GenAI-based projects, including working with large language models, prompt engineering techniques, RAG pipelines, and related frameworks. Ability to design, test, and iterate on prompts to drive reliable model outputs.
  • Experience designing and developing agentic workflows and AI agents, with familiarity in agent orchestration frameworks such as LangChain, LangGraph, CrewAI, AutoGen, or similar tooling.
  • Strong proficiency in Python or another programming language.
  • Experience with ML serving frameworks such as FastAPI, TensorFlow Serving, or TorchServe for deploying models into production.
  • Experience with experiment tracking using Weights & Biases and model registry tools such as MLflow.
  • Experience implementing A/B testing, canary releases, or shadow deployment strategies for safely rolling out new models, along with experience using Docker containerization and Kubernetes (EKS) for packaging, deploying, and scaling ML workloads in production environments.
  • Experience with Databricks Asset Bundles (DABs) for CI/CD-driven project deployments, Feature Store for managing and serving ML features, Unity Catalog for data governance and access control, Delta Live Tables for declarative pipeline development, and Model Serving endpoints for real-time inference.

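The canary-release strategy named above can be sketched with a deterministic traffic split. This is an illustrative example, not part of the posting; the hash-based routing scheme, function name, and 5% default are assumptions — real rollouts would typically be handled by the serving layer or a service mesh.

```python
import hashlib

def route_to_canary(request_id: str, canary_fraction: float = 0.05) -> bool:
    """Deterministically send a fixed fraction of traffic to the
    canary model by hashing the request ID into a bucket in [0, 1).
    The same request always routes the same way, which keeps
    canary metrics comparable across retries."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < canary_fraction

# Roughly canary_fraction of request IDs land on the canary model
hits = sum(route_to_canary(f"req-{i}", 0.10) for i in range(10_000))
```

Monitoring would then compare error rates and latency between the canary and stable cohorts before promoting the new model.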
  

Preferred Qualifications, Capabilities, and Skills

  • Experience with fine-tuning pre-trained models (including LLMs) using techniques such as LoRA, QLoRA, PEFT, or full fine-tuning for domain-specific use cases.
  • Familiarity with feature store platforms (e.g., Feast, SageMaker Feature Store) for managing, versioning, and serving ML features across training and inference.
Serve as an emerging member of an agile team to design and deliver market-leading technology products in a secure and scalable way.
