Publication of OmniFold weights

Description

Standardizing the publication, preservation, and reuse of ML-based unfolding results

This project is an effort to define how machine-learning-based unfolding results (in particular, OmniFold results) should be published so that they are reusable and compatible with HEPData.

Modern unfolding methods produce per-event weights and trained models rather than fixed binned histograms. While this enables flexibility and reinterpretation, it also raises new challenges for publication, reproducibility, and long-term usability. Our goal is to define a coherent set of specifications, tools, and reference implementations that addresses these challenges.
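As a hedged illustration (assuming NumPy arrays of particle-level Monte Carlo events and one set of OmniFold per-event weights, with illustrative file names), the sketch below shows how the same published weights can be used to histogram any observable after the fact, which a fixed binned result cannot do:

```python
import numpy as np

# Illustrative inputs: particle-level MC events and one OmniFold weight per
# event. The file names and array layout are assumptions for this sketch,
# not a fixed format.
events = np.load("mc_particle_level.npy")   # shape: (n_events, n_features)
weights = np.load("omnifold_weights.npy")   # shape: (n_events,)

# Any observable computable from the events can be unfolded after publication
# by histogramming it with the per-event weights.
jet_pt = events[:, 0]
jet_mass = events[:, 1]

pt_hist, pt_edges = np.histogram(jet_pt, bins=50, range=(0, 500), weights=weights)
mass_hist, mass_edges = np.histogram(jet_mass, bins=40, range=(0, 200), weights=weights)

# A fixed binned histogram would have committed to one observable and one
# binning at publication time; the weights defer both choices to the reader.
```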

The project will build on the public, open-source OmniFold repository, which contains examples of what OmniFold weights are. Using that as a starting point, we want this project to define the following:

Task ideas

1. Data & Metadata Specification

2. Per-Event Weights

3. Model & Training Details

4. Observable Definitions

5. Analysis & Reinterpretation

6. Validation

7. HEPData Integration Layer (Additional stretch goal)

Integrate OmniFold outputs with the HEPData infrastructure (see the sketch after this list).

8. Examples & Reference Implementations
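As a non-authoritative sketch of the HEPData integration idea from task 7, the snippet below writes a minimal submission.yaml-style document that attaches a per-event weights file as an additional resource. The table name, file names, and descriptions are hypothetical; choosing the actual conventions would be part of the project.

```python
import yaml

# Hypothetical layout: a summary table plus the per-event weights file
# attached as an additional resource, following the general structure of a
# HEPData submission.yaml. All names and descriptions are placeholders.
documents = [
    {  # submission-level metadata
        "additional_resources": [
            {
                "location": "omnifold_weights.h5",
                "description": "HDF5 file with one OmniFold weight per particle-level MC event.",
            }
        ]
    },
    {  # one table entry summarising the unfolded result
        "name": "OmniFold per-event weights",
        "description": "Summary of the unfolded result; full per-event weights are attached as an additional resource.",
        "keywords": [{"name": "observables", "values": ["per-event weights"]}],
        "data_file": "weights_summary.yaml",
    },
]

with open("submission.yaml", "w") as f:
    yaml.safe_dump_all(documents, f, sort_keys=False)
```

Whether per-event weight files are best published as additional resources or need first-class HEPData support is one of the questions this integration layer would have to settle.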

Expected results

By the end of the project, there will be a Python package that produces OmniFold weights in a standardized format, accompanied by data and metadata specifications for per-event weights, dataset selections, and optional model information. The project will also provide user-facing tools to apply weights, compute observables, and generate publication-quality plots, along with reference examples demonstrating integration with HEPData.
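As a hedged illustration of the user-facing tooling described above (plain NumPy and Matplotlib with illustrative file names; the actual package API is yet to be designed), the sketch below applies published weights to one observable, attaches a simple statistical uncertainty, and produces a plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative inputs; the standardized package would wrap steps like these
# behind a documented interface.
events = np.load("mc_particle_level.npy")    # placeholder file name
weights = np.load("omnifold_weights.npy")    # placeholder file name

observable = events[:, 0]                    # e.g. leading-jet pT
bins = np.linspace(0.0, 500.0, 51)

hist, edges = np.histogram(observable, bins=bins, weights=weights)
# Sum of squared weights per bin gives a simple statistical uncertainty.
err = np.sqrt(np.histogram(observable, bins=bins, weights=weights**2)[0])

centers = 0.5 * (edges[:-1] + edges[1:])
plt.errorbar(centers, hist, yerr=err, fmt="o", label="Unfolded (OmniFold weights)")
plt.xlabel("leading-jet $p_T$ [GeV]")
plt.ylabel("Weighted events per bin")
plt.legend()
plt.savefig("unfolded_jet_pt.pdf")
```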

Requirements

How to Apply

Email the mentors with a brief description of your background and your interest in scientific software stacks and high-energy physics. Please include “gsoc26” in the subject line. Mentors will provide an evaluation task after submission.

Mentors

Additional Information

Corresponding Project

Participating Organizations