Enchanted Surrogates

A framework for creating databases for surrogate models of complex physics codes.

Machine learning surrogate model development requires large amounts of data, which is often generated using complex and computationally expensive simulation codes. The enchanted-surrogates package provides a flexible framework for creating databases for surrogate models of such complex physics codes. Database generation for a simulation consists of:

1. Running the code:
   - Every code has its own runtime entry points (e.g., I/O, actual execution) and computational resource needs
2. On a search space:
   - e.g., a hypercube, or efficiently searching across a space with active learning

That is, step 1 is repeated many times to fill the volume spanned by step 2.

The idea is to abstract away the iterative process so that only step 1 needs to be handled individually for each code, while multiple search types remain available.
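The loop being abstracted can be sketched in a few lines of Python. This is only an illustration of the idea, not the package's API: `run_code`, `grid_search_space` and `build_database` are hypothetical names, and the "physics code" is a toy stand-in function.

```python
import itertools


def run_code(params):
    """Stand-in for step 1: run an expensive simulation for one sample."""
    # Hypothetical toy model; a real code would write inputs, execute,
    # and parse output files.
    return {"output": sum(params.values())}


def grid_search_space(bounds, points_per_dim):
    """Stand-in for step 2: yield samples on a regular grid over a hypercube.

    Assumes points_per_dim >= 2 so that both bounds are included.
    """
    axes = []
    for low, high in bounds.values():
        step = (high - low) / (points_per_dim - 1)
        axes.append([low + i * step for i in range(points_per_dim)])
    for values in itertools.product(*axes):
        yield dict(zip(bounds.keys(), values))


def build_database(bounds, points_per_dim):
    """Repeat step 1 for every sample produced by step 2."""
    database = []
    for sample in grid_search_space(bounds, points_per_dim):
        database.append({**sample, **run_code(sample)})
    return database


database = build_database({"param1": (0.0, 1.0), "param2": (0.0, 2.0)}, points_per_dim=3)
print(len(database))  # 9 rows: 3 points per dimension, 2 dimensions
```

Swapping `grid_search_space` for another generator changes the search strategy without touching `run_code`, which is the separation the framework aims for.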

Notes

Some parts of this documentation are still under development.

Code structure

The Supervisor is the entry point, or "the brain". It reads the configured parameters and initializes the Sampler(s), Executor(s) and Runner(s) according to the user-defined configuration file.

The Sampler decides how the search space is traversed and returns samples to the Supervisor.

The user chooses the Executor based on the system where the code is running. The Executor initializes a cluster, a job queue or similar. The Supervisor sends the samples to the Executor. The Executor calls simulation_task.py, which initializes a Runner for each sample.

A Runner is a code-specific module for running the code in question. It is commonly paired with a code-specific Parser, which reads and writes the files produced or needed by the code. Code-specific Runners and Parsers are developed as plugins. See Plugins for available Runner + Parser combos. If a plugin for the code you are using doesn't exist yet, feel free to contribute a new one! See Contribution.

The Supervisor keeps track of the samples and writes summary data structures to the specified base run directory. See the Supervisor documentation for all options and a diagram of the module structure.
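The division of responsibilities described above can be illustrated with a toy sketch. All class and method names here (`RandomSampler`, `ToyRunner`, `LocalExecutor`, `get_samples`, `run_one`, `run_all`) are hypothetical and chosen for illustration only; they are not the package's actual classes, and a thread pool stands in for a real cluster or job queue.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class RandomSampler:
    """Decides how the search space is traversed (here: uniform random draws)."""

    def __init__(self, bounds, num_samples, seed=0):
        self.bounds = bounds
        self.num_samples = num_samples
        self.rng = random.Random(seed)

    def get_samples(self):
        return [
            {name: self.rng.uniform(low, high) for name, (low, high) in self.bounds.items()}
            for _ in range(self.num_samples)
        ]


class ToyRunner:
    """Code-specific part: runs the 'code' for a single sample."""

    def run_one(self, sample):
        # Placeholder physics: a real Runner would use a Parser to write
        # input files, launch the code, and parse its output files.
        return {"output": sum(sample.values()), "success": True}


class LocalExecutor:
    """System-specific part: a thread pool stands in for a cluster or job queue."""

    def __init__(self, max_workers=2):
        self.max_workers = max_workers

    def run_all(self, runner_cls, samples):
        def task(sample):
            # One Runner per sample, mirroring the description above.
            return runner_cls().run_one(sample)

        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            return list(pool.map(task, samples))  # map preserves sample order


class Supervisor:
    """Entry point: wires the components together and collects a summary."""

    def __init__(self, sampler, executor, runner_cls):
        self.sampler = sampler
        self.executor = executor
        self.runner_cls = runner_cls

    def run(self):
        samples = self.sampler.get_samples()
        results = self.executor.run_all(self.runner_cls, samples)
        return [{**s, **r} for s, r in zip(samples, results)]


supervisor = Supervisor(
    RandomSampler({"param1": (0.0, 1.0), "param2": (0.0, 1.0)}, num_samples=4),
    LocalExecutor(),
    ToyRunner,
)
summary = supervisor.run()
print(len(summary), all(row["success"] for row in summary))
```

Only `ToyRunner` knows anything about the code being run, so supporting a new code in this sketch means writing a new runner class and leaving the rest unchanged.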

How to install

Make sure you have a clean virtual environment with Python 3.10 or higher.

python -m venv .venv
source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`

Then clone the repository and install the package with pip:

git clone https://github.com/DIGIfusion/enchanted-surrogates.git
pip install -e enchanted-surrogates/

This will install the core package and its dependencies. In addition, you can install any plugins you want to use by cloning their repositories and installing them with pip as well. See the Plugins section for more details.

Please note that some samplers require optional dependencies. Check the sampler's documentation to see if any optional dependencies are required to run. Optional dependencies can be installed by listing them inside square brackets, comma-separated without spaces, e.g.:

pip install -e enchanted-surrogates[bo,GPy,activelearning]

Note: in some environments, the command python may still point to system-wide Python (e.g. /usr/bin/python or /Library/Frameworks/...) rather than the virtual environment. You can check which python is active with:

which python
which python3

If neither points to the virtual environment you created, use .venv/bin/python instead of python in the commands below.
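The same check can be made from inside Python, which is handy on systems where `which` is not available. The helper name `in_virtualenv` is made up for this sketch; it relies on the standard behaviour that, in an environment created with `python -m venv`, `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # points at the base Python installation it was created from.
    return sys.prefix != sys.base_prefix


print(sys.executable)   # path of the interpreter actually in use
print(in_virtualenv())
```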

How to run

After installing the package and any desired plugins, you can use the command line interface to run simulations. For example, to run a simulation with the example runner and parser, you can use the following command:

python enchanted-surrogates/src/run.py -cf path/to/config/file

Make sure to replace path/to/config/file with the actual path to your configuration file. The configuration file should be in YAML format and specify the runners, samplers, executors, supervisor and other parameters needed for the simulation.

The configuration file should list all the executors, samplers and runners to be used. The supervisor run_order should then be specified for the desired workflow. For a simple, non-nested workflow, run_order contains only one executor, sampler and runner. For nested workflows, see Nested execution for more information.

logging: # NOTSET, DEBUG (default), INFO, WARNING, ERROR, CRITICAL

executors:
  e1:
    type: ...
samplers:
  s1:
    type: ...
runners:
  r1:
    type: ...
supervisor:
  base_run_dir: ...
  run_order:
    - executor: e1
      sampler: s1
      runner: r1
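Because run_order refers to executors, samplers and runners by name, a misspelled name is an easy mistake to make. The sketch below shows one way to check the references on a configuration that has already been loaded into a dictionary (for example with a YAML parser such as PyYAML's `yaml.safe_load`). The function `check_run_order` is hypothetical, not part of the package, and it assumes the simple, non-nested layout shown above, where each run_order entry has exactly one executor, sampler and runner; whether the framework performs similar validation itself is not covered here.

```python
def check_run_order(config):
    """Return a list of problems found in supervisor.run_order (empty if none)."""
    problems = []
    sections = {"executor": "executors", "sampler": "samplers", "runner": "runners"}
    run_order = (config.get("supervisor") or {}).get("run_order") or []
    if not run_order:
        problems.append("supervisor.run_order is missing or empty")
    for index, step in enumerate(run_order):
        for key, section in sections.items():
            name = step.get(key)
            if name is None:
                problems.append(f"run_order[{index}] has no '{key}' entry")
            elif name not in (config.get(section) or {}):
                problems.append(f"run_order[{index}] refers to unknown {key} '{name}'")
    return problems


# Dictionary equivalent of the skeleton above; the "..." values are placeholders.
config = {
    "executors": {"e1": {"type": "..."}},
    "samplers": {"s1": {"type": "..."}},
    "runners": {"r1": {"type": "..."}},
    "supervisor": {
        "base_run_dir": "...",
        "run_order": [{"executor": "e1", "sampler": "s1", "runner": "r1"}],
    },
}
print(check_run_order(config))  # []
```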

Output

The base_run_dir holds all the outputs from enchanted surrogates. Its location is defined in the supervisor section of the config file. The framework creates the following file structure:

base_run_dir/
├── data/
│    └── ...                  [All the run directories used by the physics codes]
├── logs/
│    ├── main.log             [General log messages and errors]
│    ├── all_progress.txt     [Recording the success rate of each batch]
│    └── current_progress.txt [Recording the status and success rate for current batch]
├── config/
│    └── my_config.yaml       [The config file used for this enchanted surrogates run] 
├── enchanted_dataset.csv     [Summary file]
└── runs.h5                   [Summary file]

The summary files contain all the parsed outputs of the physics codes in one handy file for downstream AI/ML model training. The summary files are structured as follows:

	param1	param2	paramN	output	success	run_dir
0	0.1	0.2	0.3	0.6	true	data_dir/example/data/d0_b0_r0_s0
1	0.1	0.2	0.3	0.6	true	data_dir/example/data/d0_b0_r1_s0
N	0.1	0.2	0.3	0.6	true	data_dir/example/data/d0_bn_rn_s0

All user-defined sampled parameters are included for each sample. The runner output is stored in the output column. The success column is a boolean indicating whether the run succeeded, and the run directory of each sample is included for traceability.
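As a rough sketch of how such a summary could be consumed downstream, the snippet below keeps only the successful runs and splits them into inputs and outputs. It rests on assumptions that should be checked against a real file: that enchanted_dataset.csv is comma-separated, that the first column is an unnamed row index, and that success is written as the text true/false. The function name `load_successful_runs` is made up, and the in-memory sample mirrors the table above with one row marked as failed purely for demonstration.

```python
import csv
import io

# Assumption: comma-separated file whose first column is an unnamed row index
# and whose "success" column holds the strings "true"/"false".
SAMPLE_CSV = """\
,param1,param2,paramN,output,success,run_dir
0,0.1,0.2,0.3,0.6,true,data_dir/example/data/d0_b0_r0_s0
1,0.1,0.2,0.3,0.6,false,data_dir/example/data/d0_b0_r1_s0
"""


def load_successful_runs(handle, input_names, output_name="output"):
    """Return (inputs, outputs) as lists of floats for rows with success == true."""
    inputs, outputs = [], []
    for row in csv.DictReader(handle):
        if row["success"].strip().lower() != "true":
            continue  # skip failed runs
        inputs.append([float(row[name]) for name in input_names])
        outputs.append(float(row[output_name]))
    return inputs, outputs


inputs, outputs = load_successful_runs(
    io.StringIO(SAMPLE_CSV), ["param1", "param2", "paramN"]
)
print(inputs, outputs)  # [[0.1, 0.2, 0.3]] [0.6]
```

For a real run, the `io.StringIO` object would be replaced by a file handle opened on the summary file inside base_run_dir.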

Note: The output files to be saved can be configured; see Configuring output files.

Quick start example

The following command runs the example local executor with the example configuration file. It creates a run directory in the current working directory, where it generates random samples and runs the example code.

python enchanted-surrogates/src/run.py -cf enchanted-surrogates/configs/example_local.yaml

About the project

License

Enchanted surrogates is distributed under the MIT license.

Citation

If you use this package in your research, please cite:

@Misc{enchanted-surrogates,
  title =        {Enchanted Surrogates: A flexible framework for surrogate modelling of fusion plasma simulations},
  author =       {Adam Kit and Amanda Bruncrona and Daniel Jordan and Aaro Järvinen and Anna Niemelä},
  howpublished = {GitHub},
  year =         {2025},
  url =          {https://github.com/DIGIfusion/enchanted-surrogates}
}

Acknowledgements

The development of this framework has been supported by multiple funding sources:

Research Council of Finland project numbers: 355460, 358941.
EUROfusion Consortium, funded by the European Union via the Euratom Research and Training Programme (Grant Agreement No 101052200 - EUROfusion) through the Advanced Computing Hub framework of the E-TASC program as well as dedicated machine learning projects, such as the project focused on surrogating pedestal MHD stability models.
Multiple CSC IT Center for Science projects have provided the necessary computing resources for the development and application of the framework.
Aalto University students Luka Jääskeläinen, Touko Seppä, Juha Vanhala, Samuel Hughes, Eetu Lindström, Anna Shcherbakova and Lauri Saksi contributed to the architectural restructuring, the addition of the supervisor module, and many other features introduced in v2.0.0 as part of the course CS-C2130 - Software Project 1.