
Bayesian Optimization Sampler


Overview

This sampler class uses Bayesian optimization to sample the search space data-efficiently, choosing points that yield optimal information gain as specified by the acquisition strategy. The sampler maximizes the objective and is built on the BoTorch library.


BayesianOptimizationSampler

BayesianOptimizationSampler(*args, **kwargs)

Bases: Sampler

Bayesian optimization sampler using BoTorch.

The sampler follows the same lifecycle as the active-learning sampler: it owns its observation state and only exposes the three external methods expected by the orchestration layer: __init__, get_next_samples, and register_future.

Note

This sampler requires the bo optional dependency to function. See the installation guide for details.

To use the BayesianOptimizationSampler, specify it in the configuration file as follows:

sampler:
    type: BayesianOptimizationSampler
    budget: 50
    initial_samples: 20
    acquisition_batch_size: 10
    acquisition_function: qEI
    random_fraction: 0.2
    bounds: [[0.0, 1.0], [1.0, 5.0]]
    parameters: ['x', 'y']
    observations: ['distance']
    base_run_dir: ./runs
    fully_bayesian: false
    async_samp: false
    failure_prob_filter: false
    ucb_beta: 2.0
    parser: Parser
    parser_config:
        key: value

Attributes:

initial_samples : int
    Number of initial samples required.
verbose : bool
    Whether to print verbose output.
fully_bayesian : bool
    Whether to use fully Bayesian models.
acquisition_batch_size : int
    Number of samples in each acquisition batch.
observations : list
    List of observation field names.
bounds : list
    Bounds for the search space.
acquisition_function : str
    Acquisition function to use.
random_fraction : float
    Fraction of random samples in each batch.
failure_prob_filter : bool
    Whether to filter candidates by failure probability.
ucb_beta : float
    Beta parameter for the UCB acquisition function.
async_samp : bool
    Whether to use asynchronous sampling.
parameters : list
    List of parameter names.
parser : type
    Parser type for collecting sample information.
parser_config : dict
    Keyword arguments forwarded to the parser.


Assumptions and notes

  • The sampler assumes continuous numeric parameters and bounded search spaces.

  • Bayesian optimization relies on existing evaluation results stored in base_run_dir.

  • Sampling proceeds in two phases: random sampling until initial_samples observations are collected, then model-based sampling using a Gaussian process surrogate.
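The two-phase schedule above can be sketched as follows. `choose_phase` is hypothetical, not part of the sampler's API; it only mirrors the warm-up threshold check described in the notes.

```python
# Illustrative sketch of the two-phase schedule (choose_phase is a
# hypothetical helper, not part of the sampler's API).
def choose_phase(n_observations: int, initial_samples: int) -> str:
    """Pick the sampling phase used for the next batch."""
    if n_observations < initial_samples:
        return "random"  # warm-up: uniform random points
    return "model"       # surrogate-driven acquisition

print(choose_phase(5, 20))   # warm-up still in progress
print(choose_phase(20, 20))  # surrogate takes over
```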


Configuration

bounds : list[tuple[float, float]]
    Per-parameter lower and upper bounds.
parameters : list[str]
    Parameter names in the same order as bounds.
budget : int, optional
    Total number of samples allowed. Stored for bookkeeping.
initial_samples : int, optional
    Number of random samples to return before surrogate-driven sampling.
acquisition_batch_size : int, optional
    Number of candidates returned per call once the surrogate is active.
acquisition_function : str, optional
    One of: qLEI, qUCB, qEI, qPI, EI, LEI.
random_fraction : float, optional
    Fraction of each batch reserved for random exploration after warm-up.
async_samp : bool, optional
    If true, a call may return a single random point with probability random_fraction; otherwise it behaves like synchronous acquisition.
failure_prob_filter : bool, optional
    If true and failure data are available, proposed candidates are filtered by a secondary failure model.
parser : type or str, optional
    Parser used to reconstruct old run-directory results when base_run_dir is supplied.
parser_config : dict, optional
    Keyword arguments forwarded to parser construction.
observations : list[str], optional
    Observation field names. Multiple fields are summed into the scalar target used by the GP, matching the legacy behavior.

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def __init__(
    self,
    *args,
    **kwargs,
):
    """

    Configuration
    -------------
    bounds : list[tuple[float, float]]
        Per-parameter lower and upper bounds.
    parameters : list[str]
        Parameter names in the same order as `bounds`.
    budget : int, optional
        Total number of samples allowed. Stored for bookkeeping.
    initial_samples : int, optional
        Number of random samples to return before surrogate-driven sampling.
    acquisition_batch_size : int, optional
        Number of candidates returned per call once the surrogate is active.
    acquisition_function : str, optional
        One of: `qLEI`, `qUCB`, `qEI`, `qPI`, `EI`, `LEI`.
    random_fraction : float, optional
        Fraction of each batch reserved for random exploration after warm-up.
    async_samp : bool, optional
        If true, a call may return a single random point with probability
        `random_fraction`, otherwise it behaves like synchronous acquisition.
    failure_prob_filter : bool, optional
        If true and failure data are available, proposed candidates are filtered
        by a secondary failure model.
    parser : type or str, optional
        Parser used to reconstruct old run-directory results when `base_run_dir`
        is supplied.
    parser_config : dict, optional
        Keyword arguments forwarded to parser construction.
    observations : list[str], optional
        Observation field names. Multiple fields are summed into the scalar
        target used by the GP, matching the legacy behavior.

    """

    log.info("INITIALISING BAYESIAN OPTIMIZATION SAMPLER")

    self.base_run_dir = kwargs.get("base_run_dir", "")
    self._budget = kwargs.get("budget", 20)
    self.bounds = kwargs.get("bounds", [None])
    self.parameters = kwargs.get("parameters", [])

    self.observations = kwargs.get("observations", [None])
    self.parser_type = kwargs.get("parser", None)
    self.parser_config = kwargs.get("parser_config", {}) or {}

    self.initial_samples = kwargs.get("initial_samples", 50)
    self.acq_batch_size = kwargs.get("acquisition_batch_size", 20)
    self.acq_function = kwargs.get("acquisition_function", "qLEI")
    self.random_fraction = kwargs.get("random_fraction", 0.2)
    self.fail_p_filter = kwargs.get("failure_prob_filter", False)
    self.ucb_beta = kwargs.get("ucb_beta", 2.0)
    self.async_samp = kwargs.get("async_samp", False)

    self.fully_bayesian = kwargs.get("fully_bayesian", False)
    self.covar = kwargs.get("covar_kernel", "Matern-3/2")

    self.verbose = kwargs.get("verbose", False)

    # Legacy flags are kept as no-ops for compatibility.
    self.plot_GPR_flag = kwargs.get("plot_GPR", False)
    self.plot_GPR_file = kwargs.get("plot_file", False)
    self.plot_frequency = kwargs.get("plot_frequency", 1)
    self.plot_debug = kwargs.get("plot_debug", False)
    self.plot_progress = kwargs.get("plot_progress", False)
    self.plot_labels = kwargs.get("plot_labels", None)
    self.GPR_plot_dim = kwargs.get("GPR_plot_dim", [0])

    self.submitted = 0
    self.futures = []

    self.parser = None
    if self.parser_type is not None:
        self.parser = import_parser(self.parser_type, self.parser_config)

    # Internal state used by the new workflow.
    self.X_obs = np.zeros((0, len(self.bounds)), dtype=float)
    self.y_obs = np.zeros((0,), dtype=float)
    self.X_failed = np.zeros((0, len(self.bounds)), dtype=float)
    self.y_failed = np.zeros((0,), dtype=float)

    # Backward-compatible state used by the old GP pipeline.
    self.result_dictionary = [None]
    self.result_dictionary_failed = [None]
    self.model = None
    self.model_failed = None
    self.best_f = None
    self.best_f_loc = None

    self.seen_run_dirs: set[str] = set()

append_observation

append_observation(params, y, failure=0)

Append a single observation to the in-memory dataset.
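The routing between the regular and failure datasets follows the truthiness check in the source. A minimal sketch of that rule, with `route` as a hypothetical stand-in:

```python
# Sketch of the failure routing in append_observation: a truthy `failure`
# value sends the record to the failure dataset, anything else to the
# regular observations. `route` is illustrative, not part of the API.
def route(failure):
    return "failed" if failure not in (0, 0.0, False, None) else "observed"

print(route(0))     # observed
print(route(None))  # observed
print(route(1))     # failed
```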

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def append_observation(self, params: Mapping[str, Any], y: Any, failure: Any = 0):
    """Append a single observation to the in-memory dataset."""
    if "params" in params and isinstance(params["params"], Mapping):
        params = params["params"]

    x = np.array(
        [self._to_scalar(params[p]) for p in self.parameters], dtype=float
    ).reshape(1, -1)
    y_scalar = self._to_scalar(y)

    if failure not in (0, 0.0, False, None):
        self.X_failed = np.vstack([self.X_failed, x])
        self.y_failed = np.append(self.y_failed, y_scalar)
    else:
        self.X_obs = np.vstack([self.X_obs, x])
        self.y_obs = np.append(self.y_obs, y_scalar)

apply_failure_filter

apply_failure_filter(candidates, acq)

Filter candidates using the failure-probability model.

This preserves the legacy idea of rejecting points likely to fail while keeping the logic private to the sampler.
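The rejection rule can be sketched as follows; this is an illustration of the idea, not the library code: a candidate survives with probability 1 − p_fail, where p_fail is the failure model's predicted failure probability for that point.

```python
import random

# Sketch of the probabilistic rejection rule (illustrative only): a
# candidate is kept when a uniform draw exceeds its predicted failure
# probability, i.e. it survives with probability 1 - p_fail.
def keep(p_fail: float) -> bool:
    return random.random() > p_fail

# p_fail = 1.0 rejects every candidate; p_fail below 0 keeps every one.
```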

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def apply_failure_filter(self, candidates: torch.Tensor, acq) -> torch.Tensor:
    """
    Filter candidates using the failure-probability model.

    This preserves the legacy idea of rejecting points likely to fail while
    keeping the logic private to the sampler.
    """
    if self.model_failed is None or len(candidates) == 0:
        return candidates

    bounds = torch.tensor(self.bounds, dtype=torch.float64)
    accepted = []
    target_len = candidates.size(dim=0)

    norm_inp = normalize(candidates, bounds.T)
    pred = self.model_failed(norm_inp).mean.squeeze(-1)

    for i in range(len(pred)):
        if torch.rand(1).item() > float(pred[i]):
            accepted.append(candidates[i, :].detach().cpu().numpy())

    if len(accepted) < target_len:
        refill = self.optimize_candidates(acq, qval=target_len - len(accepted))
        accepted_tensor = (
            torch.tensor(accepted, dtype=torch.float64)
            if accepted
            else torch.empty((0, len(self.bounds)), dtype=torch.float64)
        )
        if len(refill) > 0:
            return torch.cat([accepted_tensor, refill], dim=0)
        return accepted_tensor

    return torch.tensor(accepted, dtype=torch.float64)

build_acquisition

build_acquisition()

Build the configured BoTorch acquisition function from the trained model.

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def build_acquisition(self):
    """
    Build the configured BoTorch acquisition function from the trained model.
    """
    if self.model is None or self.best_f is None:
        raise RuntimeError("Surrogate model has not been trained")

    match self.acq_function:
        case "qLEI":
            return botorch.acquisition.qLogExpectedImprovement(
                model=self.model, best_f=self.best_f
            )
        case "qUCB":
            return botorch.acquisition.qUpperConfidenceBound(
                model=self.model, beta=self.ucb_beta
            )
        case "qEI":
            return botorch.acquisition.qExpectedImprovement(
                model=self.model, best_f=self.best_f
            )
        case "qPI":
            return botorch.acquisition.qProbabilityOfImprovement(
                model=self.model, best_f=self.best_f
            )
        case "EI":
            return botorch.acquisition.ExpectedImprovement(
                model=self.model, best_f=self.best_f
            )
        case "LEI":
            return botorch.acquisition.LogExpectedImprovement(
                model=self.model, best_f=self.best_f
            )
        case _:
            raise ValueError(
                f"Unsupported acquisition function: {self.acq_function}"
            )

extract_target_from_mapping

extract_target_from_mapping(payload)

Reduce a mapping to a scalar target value.

The legacy sampler used distances and summed over their components. The new workflow keeps that convention.
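The reduction convention can be illustrated with a hypothetical payload: each configured observation field is coerced to a float array and the values are summed into one scalar target. The field names below are assumptions for the example.

```python
import numpy as np

# Hypothetical payload illustrating the reduction convention: every
# configured observation field is summed into a single scalar GP target.
observations = ["distance", "penalty"]  # assumed field names
payload = {"distance": [1.0, 2.0], "penalty": 0.5}

values = [np.asarray(payload[k], dtype=float).sum()
          for k in observations if k in payload]
target = float(np.sum(values))
print(target)  # 3.5
```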

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def extract_target_from_mapping(self, payload: Mapping[str, Any]) -> float:
    """
    Reduce a mapping to a scalar target value.

    The legacy sampler used `distances` and summed over their components.
    The new workflow keeps that convention.
    """
    if len(self.observations) == 1 and self.observations[0] in payload:
        return float(
            np.asarray(payload[self.observations[0]], dtype=float).squeeze()
        )

    values = []
    for key in self.observations:
        if key in payload:
            values.append(np.asarray(payload[key], dtype=float))
    if values:
        return float(np.sum(values))
    if "output" in payload:
        output = payload["output"]

        # Handle scalar output
        if not isinstance(output, Mapping):
            return float(np.asarray(output, dtype=float).squeeze())

        # output is a mapping
        if len(self.observations) == 1 and self.observations[0] in output:
            return float(
                np.asarray(output[self.observations[0]], dtype=float).squeeze()
            )

        nested_values = []
        for key in self.observations:
            if key in output:
                nested_values.append(np.asarray(output[key], dtype=float))

        if nested_values:
            return float(np.sum(nested_values))

        numeric_leaves = []
        for value in output.values():
            try:
                numeric_leaves.append(np.asarray(value, dtype=float))
            except Exception:
                continue

        if numeric_leaves:
            return float(np.sum(numeric_leaves))
    raise KeyError(
        "Could not determine target value from payload. Provide an `y` key"
        " or one or more observation fields named in `observations`."
    )

get_next_samples

get_next_samples()

Return the next batch of candidate parameter dictionaries.

The sampler first refreshes its internal state from any completed futures and, if configured, from the legacy run-directory parser path. It then either:

  • returns random warm-up points until initial_samples observations are available, or

  • fits the surrogate and acquires new candidates with BoTorch.

Returns

list[dict[str, float]] Batch of parameter dictionaries in the order expected by the orchestration layer.
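In synchronous mode, each batch is split between random exploration and model-driven points by integer truncation, with at least one model-driven candidate guaranteed. A sketch of that split, with `batch_split` as a hypothetical helper mirroring the source:

```python
# Sketch of the synchronous batch split (batch_split is a hypothetical
# helper): int truncation reserves a random_fraction share for
# exploration and guarantees at least one model-driven candidate.
def batch_split(acquisition_batch_size: int, random_fraction: float):
    random_count = int(random_fraction * acquisition_batch_size)
    model_count = max(int((1 - random_fraction) * acquisition_batch_size), 1)
    return random_count, model_count

print(batch_split(10, 0.2))  # (2, 8)
```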

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def get_next_samples(self):
    """
    Return the next batch of candidate parameter dictionaries.

    The sampler first refreshes its internal state from any completed
    futures and, if configured, from the legacy run directory parser
    path. It then either:
    - returns random warm-up points until `initial_samples` observations
      are available, or
    - fits the surrogate and acquires new candidates with BoTorch.

    Returns
    -------
    list[dict[str, float]]
        Batch of parameter dictionaries in the order expected by the
        orchestration layer.
    """
    self.sync_from_run_directory()

    if self._n_observations() < self.initial_samples:
        return self.get_random_samples(self.acq_batch_size)

    self.train_surrogate()
    acq = self.build_acquisition()

    batch_samples: list[dict[str, float]] = []

    # Async mode may emit a single random point; synchronous mode reserves
    # a random exploration fraction within each batch.
    if self.async_samp:
        if torch.rand(1).item() < self.random_fraction:
            batch_samples = self.get_random_samples(1)
            self.submitted += len(batch_samples)
            return batch_samples
        qval = 1
    else:
        random_count = int(self.random_fraction * self.acq_batch_size)
        model_count = max(
            int((1 - self.random_fraction) * self.acq_batch_size), 1)
        batch_samples.extend(self.get_random_samples(random_count))
        qval = model_count

    candidates = self.optimize_candidates(acq, qval=qval)
    if self.fail_p_filter and self.model_failed is not None:
        candidates = self.apply_failure_filter(candidates, acq)

    for row in candidates:
        batch_samples.append(
            {p: float(v) for p, v in zip(self.parameters, row)})

    self.submitted += len(batch_samples)
    return batch_samples

get_random_samples

get_random_samples(n)

Generate n random samples uniformly within the configured bounds.
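A stdlib sketch of the same sampling scheme (the sampler itself draws from torch.distributions.Uniform); bounds and parameter names are taken from the configuration example above.

```python
import random

# Stdlib sketch of uniform sampling within per-parameter bounds; the
# sampler itself uses torch.distributions.Uniform for the same effect.
bounds = [(0.0, 1.0), (1.0, 5.0)]
parameters = ["x", "y"]

def random_sample():
    return {p: random.uniform(lb, ub)
            for p, (lb, ub) in zip(parameters, bounds)}

s = random_sample()
print(s)  # e.g. {"x": 0.42..., "y": 3.1...}
```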

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def get_random_samples(self, n: int) -> list[dict[str, float]]:
    """Generate `n` random samples uniformly within the configured bounds."""
    if n <= 0:
        return []

    samples = []

    for _ in range(n):
        params = [
            torch.distributions.Uniform(lb, ub).sample().item()
            for (lb, ub) in self.bounds
        ]
        param_dict = dict(zip(self.parameters, params))
        samples.append(param_dict)

    return samples

ingest_future

ingest_future(future)

Convert one completed result payload into internal arrays.

This method accepts several lightweight shapes so the orchestration layer can pass whatever it already has available.

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def ingest_future(self, future: Any):
    """
    Convert one completed result payload into internal arrays.

    This method accepts several lightweight shapes so the orchestration
    layer can pass whatever it already has available.
    """
    if isinstance(future, (list, tuple)):
        for item in future:
            self.ingest_future(item)
        return

    if hasattr(future, "to_dict") and not isinstance(future, dict):
        # Works for pandas Series/DataFrame-like rows.
        try:
            future = future.to_dict()
        except Exception:
            pass

    if isinstance(future, dict):
        if "params" in future and "y" in future:
            params = future["params"]
            y_val = future["y"]
            failure = future.get("failure", 0)
            self.append_observation(params, y_val, failure=failure)
            return

        if all(param in future for param in self.parameters):
            params = {k: future[k] for k in self.parameters}
            y_val = self.extract_target_from_mapping(future)
            failure = future.get("failure", 0)
            self.append_observation(params, y_val, failure=failure)
            return

    raise TypeError(
        "unsupported future payload. expected a mapping with params/y, "
        "parameter columns plus observation fields, or an iterable of those."
    )

ingest_sample_dict

ingest_sample_dict(sample_dict, run_dir=None)

Ingest a sample dictionary returned by the legacy parser.

The parser path usually returns keys like inputs, distances, failure, and run_dir. The observation values are reduced to a scalar by summing across the distance vector, which preserves the old sampler behavior.
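A hypothetical legacy record illustrating that reduction: the distance vector stored under `objective` is summed into one scalar, and the inputs are reshaped to a single row. The values here are made up for the example.

```python
import numpy as np

# Hypothetical legacy parser record: the vector under `objective` is
# summed into one scalar target, preserving the old sampler behavior.
sample_dict = {"inputs": [0.3, 2.0], "objective": [1.0, 2.0], "failure": 0}

x_arr = np.asarray(sample_dict["inputs"], dtype=float).reshape(1, -1)
y_scalar = float(np.sum(np.asarray(sample_dict["objective"], dtype=float)))
print(x_arr.shape, y_scalar)  # (1, 2) 3.0
```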

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def ingest_sample_dict(
    self, sample_dict: Mapping[str, Any], run_dir: str | None = None
):
    """
    Ingest a sample dictionary returned by the legacy parser.

    The parser path usually returns keys like `inputs`, `distances`, `failure`,
    and `run_dir`. The observation values are reduced to a scalar by summing
    across the distance vector, which preserves the old sampler behavior.
    """
    if sample_dict is None:
        return

    if sample_dict.get("failure", 0) not in (0, 0.0, False, None):
        x = sample_dict.get("inputs", None)
        y = sample_dict.get("objective", None)
        if x is None or y is None:
            return
        x_arr = np.asarray(x, dtype=float)
        if x_arr.ndim == 1:
            x_arr = x_arr.reshape(1, -1)
        y_arr = np.asarray(y, dtype=float)
        y_scalar = float(np.sum(y_arr))
        self.X_failed = np.vstack([self.X_failed, x_arr.reshape(1, -1)])
        self.y_failed = np.append(self.y_failed, y_scalar)
        return

    x = sample_dict.get("inputs", None)
    y = sample_dict.get("objective", None)
    if x is None or y is None:
        return

    x_arr = np.asarray(x, dtype=float)
    if x_arr.ndim == 1:
        x_arr = x_arr.reshape(1, -1)

    y_arr = np.asarray(y, dtype=float)
    y_scalar = float(np.sum(y_arr))

    self.X_obs = np.vstack([self.X_obs, x_arr.reshape(1, -1)])
    self.y_obs = np.append(self.y_obs, y_scalar)

optimize_candidates

optimize_candidates(acq, qval)

Optimize the acquisition function over the bounded domain.

Returns

torch.Tensor Candidate tensor in the original parameter space.
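The acquisition is optimized on the unit cube and the resulting candidates are mapped back to the original bounds. A plain-Python sketch of that mapping (what BoTorch's `unnormalize` does for the affine case):

```python
# Sketch of mapping unit-cube candidates back to the original bounds,
# i.e. the affine transform performed by BoTorch's unnormalize.
def unnormalize(unit_x, bounds):
    return [lb + u * (ub - lb) for u, (lb, ub) in zip(unit_x, bounds)]

bounds = [(0.0, 1.0), (1.0, 5.0)]
print(unnormalize([0.5, 0.5], bounds))  # [0.5, 3.0]
```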

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def optimize_candidates(self, acq, qval: int) -> torch.Tensor:
    """
    Optimize the acquisition function over the bounded domain.

    Returns
    -------
    torch.Tensor
        Candidate tensor in the original parameter space.
    """
    if qval <= 0:
        return torch.empty((0, len(self.bounds)), dtype=torch.float64)

    lower_bound = torch.zeros(len(self.bounds), dtype=torch.float64).unsqueeze(0)
    upper_bound = torch.ones(len(self.bounds), dtype=torch.float64).unsqueeze(0)
    boundtensor = torch.cat((lower_bound, upper_bound))

    candidates, _ = optimize_acqf(
        acq,
        bounds=boundtensor,
        sequential=False,
        q=qval,
        num_restarts=10,
        raw_samples=1024,
    )

    bounds = torch.tensor(self.bounds, dtype=torch.float64)
    return unnormalize(candidates, bounds.T)

refresh_result_dictionaries

refresh_result_dictionaries()

Keep legacy result dictionary attributes in sync with the internal arrays.

This allows any downstream code that still inspects these attributes to continue working, even though the public workflow no longer calls the original builder directly.

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def refresh_result_dictionaries(self):
    """
    Keep legacy result dictionary attributes in sync with the internal
    arrays.

    This allows any downstream code that still inspects these attributes to
    continue working, even though the public workflow no longer calls the
    original builder directly.
    """
    if len(self.X_obs) > 0:
        self.result_dictionary = {
            "inputs": self.X_obs.tolist(),
            "objective": [[float(v)] for v in self.y_obs.tolist()],
            "failure": [0 for _ in range(len(self.y_obs))],
        }
    else:
        self.result_dictionary = [None]

    if len(self.X_failed) > 0:
        self.result_dictionary_failed = {
            "inputs": self.X_failed.tolist(),
            "objective": [[float(v)] for v in self.y_failed.tolist()],
            "failure": [1 for _ in range(len(self.y_failed))],
        }
    else:
        self.result_dictionary_failed = [None]

register_future

register_future(future)

Register a completed evaluation.

Accepted inputs
  • mapping with keys params and y
  • mapping containing parameter columns directly plus one or more observation columns
  • dataframe-like object with parameter columns and observation columns
  • a list/tuple of the above, which will be ingested item by item
Notes

The sampler stores observations internally and also keeps a small compatibility layer that mirrors the legacy result_dictionary fields. If a failure field is present and evaluates truthy, the record is routed to the failure model dataset as well.
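The two mapping shapes can be illustrated with hypothetical payloads; the dispatch below mirrors the checks performed when a future is ingested, with `payload_shape` as an illustrative helper:

```python
# Hypothetical payloads in the two mapping shapes register_future accepts,
# plus an illustrative dispatch mirroring the ingestion checks.
parameters = ["x", "y"]
explicit = {"params": {"x": 0.3, "y": 2.0}, "y": 1.7, "failure": 0}
columnar = {"x": 0.3, "y": 2.0, "distance": 1.7}

def payload_shape(future, parameters):
    if "params" in future and "y" in future:
        return "explicit"
    if all(p in future for p in parameters):
        return "columnar"
    return "unsupported"

print(payload_shape(explicit, parameters))  # explicit
print(payload_shape(columnar, parameters))  # columnar
```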

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def register_future(self, future):
    """
    Register a completed evaluation.

    Accepted inputs
    ---------------
    - mapping with keys `params` and `y`
    - mapping containing parameter columns directly plus one or more
      observation columns
    - dataframe-like object with parameter columns and observation columns
    - a list/tuple of the above, which will be ingested item by item

    Notes
    -----
    The sampler stores observations internally and also keeps a small
    compatibility layer that mirrors the legacy `result_dictionary` fields.
    If a `failure` field is present and evaluates truthy, the record is
    routed to the failure model dataset as well.
    """
    if future is None or len(future) == 0:
        return
    self.futures.append(future)
    self.ingest_future(future)
    self.refresh_result_dictionaries()

sync_from_run_directory

sync_from_run_directory()

Rebuild the internal dataset from the legacy run directory path.

This keeps the old parser-based workflow usable without making it a public method. It only runs when both a parser and base_run_dir are available.

Source code in src/enchanted_surrogates/samplers/bayesian_optimization_sampler.py
def sync_from_run_directory(self):
    """
    Rebuild the internal dataset from the legacy run directory path.

    This keeps the old parser-based workflow usable without making it a
    public method. It only runs when both a parser and `base_run_dir` are
    available.
    """
    if self.parser is None or not self.base_run_dir:
        return
    if not os.path.isdir(self.base_run_dir):
        return

    try:
        dirlist = os.listdir(self.base_run_dir)
    except OSError:
        return

    skiplist = [
        "yaml",
        "worker_out",
        "FINISHED",
        ".pkl",
        ".csv",
        "_RUN",
        "GPR",
        "Fig",
    ]

    changed = False
    for dirname in dirlist:
        if any(tag in dirname for tag in skiplist):
            continue

        run_dir = os.path.join(self.base_run_dir, dirname)
        if run_dir in self.seen_run_dirs:
            continue

        try:
            sample_dict = self.parser.collect_sample_information(
                run_dir,
                self.observations,
            )
        except Exception as exc:  # pragma: no cover - defensive guard
            log.warning("Could not parse run directory %s: %s", run_dir, exc)
            continue

        self.seen_run_dirs.add(run_dir)
        self.ingest_sample_dict(sample_dict, run_dir=run_dir)
        changed = True

    if changed:
        self.refresh_result_dictionaries()