Canary Wharfian - Online Investment Banking & Finance Community.

Data Engineer - Commodities

Experienced
No visa sponsorship

at Millennium

Hedge Funds

Posted 8 days ago


**Data Engineer - Commodities** Design, implement, and maintain AWS-based ETL workflows using Python and SQL. Collaborate with quant researchers, data analysts, and the Commodities Tech team to structure and deliver critical commodities data, including weather, supply/demand, and storage datasets. Leverage Airflow for scheduling pipelines, Git for version control, and PyTest for automated testing. Previous experience in data engineering and Python/SQL proficiency are required. Knowledge of commodities markets and familiarity with data warehousing technologies are preferred. Base salary range: $175,000 - $250,000 (New York-specific).

Compensation
$175,000 – $250,000 USD


City: New York City
Country: United States

Full Job Description

Data Engineer - Commodities

The Commodities Technology team builds and operates the data platform that aggregates and curates critical commodities data, including weather, supply/demand, storage, transportation and other fundamental and alternative datasets. This curated content layer is central to how our Portfolio Managers and researchers understand markets and construct trades.

We are seeking a Commodities Content Engineer who will focus on building robust ETL workflows and data models on top of our commodities data platform.

In this role, you will use Python and SQL to design, implement, and maintain pipelines that ingest, clean, transform, and catalog commodities datasets. You will work closely with quantitative researchers, data analysts, and the broader Commodities Technology team to translate domain requirements into well-structured, reliable data assets that can be easily discovered and reused across strategies.

This is a hands-on engineering role with significant exposure to commodities data and the opportunity to shape how that data is represented and consumed across the firm.

Key Responsibilities:

  • Design and implement end-to-end ETL workflows in Python and SQL to ingest and transform commodities data from multiple vendors and internal sources.
  • Build and maintain standardized data models, schemas, and metadata that make commodities datasets easy to understand and discover within the platform.
  • Use Airflow (or similar tools) to schedule, monitor, and manage data pipelines, ensuring reliability and timely delivery.
  • Implement robust validation, reconciliation, and anomaly-detection checks to ensure data completeness, correctness, and consistency.
  • Leverage AI to automate schema inference across structured and semi-structured data sources, manage schema drift, and accelerate development of scalable ingestion pipelines.
  • Apply AI-driven data quality, observability, and documentation capabilities to detect anomalies, monitor data health, and generate clear lineage and technical documentation across complex data workflows.
  • Leverage Git, GitHub Actions, and automated testing (PyTest) to maintain high-quality code and repeatable deployments.
  • Partner with commodities PMs, researchers, and data strategists to understand use cases and continuously refine datasets, definitions, and documentation.

Required Qualifications:

  • 4 years of experience in data engineering, analytics engineering, or similar roles focused on building and maintaining ETL pipelines.
  • Strong skills in Python and SQL, with experience working with large datasets and complex transformations.
  • Hands-on experience with Airflow or other workflow schedulers.
  • Familiarity with version control (Git), CI/CD pipelines (GitHub Actions or equivalent), and test automation (e.g., PyTest).
  • Strong attention to detail, data quality, and documentation; ability to reason about edge cases and data integrity.
  • Ability to work independently, communicate clearly with both technical and non-technical stakeholders, and manage work across multiple concurrent initiatives.

Preferred Qualifications:

  • Knowledge of commodities markets and commodities data (e.g., weather, supply/demand, storage, freight, flows).
  • Experience with data warehousing technologies (e.g., Snowflake, columnar storage formats, or analytic databases).
  • Prior experience in a financial services, trading, or research-driven environment.
  • Exposure to data catalog / data governance tools and best practices.

The estimated base salary range for this position is $175,000 to $250,000, which is specific to New York and may change in the future. Millennium pays a total compensation package that includes a base salary, a discretionary performance bonus, and a comprehensive benefits package. When finalizing an offer, we take into consideration an individual's experience level and the qualifications they bring to the role to formulate a competitive total compensation package.
