Bulge Bracket Investment Banks

Posted 7 days ago

**Data Scientist Lead** in Tampa, FL. Responsibilities include leading a team in advanced image classification, text categorization, and intelligent data extraction, and managing the ML lifecycle from prototyping to deployment on AWS EKS using Python, PyTorch, TensorFlow, Hugging Face Transformers, and AWS SageMaker/Bedrock. Key skills required: deep learning, computer vision, NLP, OCR, AWS, SQL, Oracle databases.

Compensation: Not specified
City: Tampa
Country: United States

Full Job Description

Location: Tampa, FL, United States

As Data Scientist Lead on the Healthcare Provider team within the Commercial & Investment Bank, you will lead a team in building advanced solutions for image classification, text categorization, and intelligent data extraction from scanned documents. You will bring deep proficiency in Python, PyTorch, TensorFlow, Hugging Face Transformers, and AWS SageMaker/Bedrock, along with hands-on experience with CNN/transformer architectures, OCR technologies, and multimodal document understanding models. This role involves managing the full ML lifecycle, from prototyping to production deployment on AWS EKS.

Job responsibilities

Lead and mentor a team of data scientists in designing and executing advanced analytics and modeling projects focused on image classification, text categorization, and intelligent data extraction from scanned document images. Foster a culture of curiosity, analytical rigor, and continuous learning by developing team members in deep learning, computer vision, NLP, and document AI techniques.
Define and drive the analytical strategy for document understanding use cases, identifying the optimal combination of computer vision, NLP, and multimodal approaches.
Build and fine-tune multimodal document understanding and text categorization models. Leverage the interplay of textual content, spatial layout, and visual features to extract structured fields and key-value pairs from complex scanned documents, while enabling automated categorization, routing, metadata tagging, and entity extraction.
Design rigorous experimentation and data quality frameworks, including A/B testing, cross-validation strategies, and statistical significance testing, to evaluate model performance and guide hyperparameter tuning. Establish best practices for annotation quality management, training data curation, active learning strategies, and ground truth validation to ensure high-quality labeled datasets.
Design, manage, and optimize the workflows involved in preparing data for machine learning model training, and select the statistical or deep learning models best positioned to achieve business results.
Develop and deploy models using Python and AWS SageMaker, managing the full lifecycle from exploratory data analysis and prototyping through production deployment, monitoring, and performance tracking. Collaborate with data engineers and ML engineers to ensure seamless integration of analytical models into production document processing pipelines and data workflows.

Required qualifications, capabilities, and skills

Bachelor's, MS, or PhD degree in a quantitative discipline, e.g., Computer Science, Mathematics, Operations Research, or Data Science.
7+ years of experience in data science or quantitative analytics, including at least 2 years in document AI, computer vision, or NLP domains.
Strong foundation in statistics, mathematics, and programming, including probability, mathematical modeling, and experimental design, with the ability to rigorously evaluate model performance. Advanced proficiency in Python for data analysis, modeling, and visualization, and deep experience in PyTorch, TensorFlow, Hugging Face Transformers, scikit-learn, OpenCV, pandas, NumPy, matplotlib, and seaborn.
Hands-on experience with CNN and transformer architectures for document AI, covering image classification, transfer learning, and feature extraction; multimodal document understanding combining textual, visual, and layout features; and NLP models for text categorization, sequence labeling, named entity recognition, and semantic analysis. Familiarity with additional computer vision models, including object detection, image segmentation, and Vision Transformers.
Working experience with OCR technologies and image preprocessing for text extraction from scanned documents, with an understanding of OCR accuracy metrics, preprocessing optimization, and error analysis. Proficiency in image preprocessing techniques for scanned documents in TIF/PNG format, including deskewing, binarization, resolution enhancement, noise removal, and multi-page document handling.
Hands-on experience with AWS SageMaker and Amazon Bedrock, including building, training, tuning, and deploying ML models in cloud-based production environments (notebook instances, training jobs, inference endpoints), as well as exploring foundation models and generative AI capabilities to augment document understanding and classification workflows. Experience with containerized deployments on AWS EKS for productionizing data science models and analytical services at scale.
Proficiency in SQL with strong working knowledge of Oracle databases for complex data extraction, transformation, and analysis of document metadata and extracted content. Working knowledge of Java and Groovy for collaborating with engineering teams and understanding enterprise application codebases. Strong understanding of annotation tools, active learning strategies, and training data management for supervised learning in document AI use cases.

Preferred qualifications, capabilities, and skills

Domain expertise in the healthcare industry

You will drive Data Science projects, leveraging expertise to deliver innovative solutions for the Healthcare Provider team.
