Configuration Guide

The thesis framework is driven entirely by YAML configuration validated with Pydantic v2. Each section model sets extra="forbid", so an unknown key inside a known section raises a ConfigurationError. The root PipelineConfig uses extra="allow" so workflow-registered namespaces are recognised, but any unknown top-level key that isn’t a registered namespace is still rejected. This page covers the merge order, path expansion rules, and code-level access, and it links to a complete per-section reference of every key.

The authoritative schema lives in src/thesis/core/config/validators.py. If a key isn’t documented in the pages below, look there — the docs aim to mirror the Pydantic models exactly.
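The difference between section-level extra="forbid" and root-level extra="allow" can be sketched with plain Pydantic v2. The model names HardwareSection and RootConfig and the helper first_error_type are hypothetical and are not the framework's own classes. The sketch also omits the registered-namespace check and the ConfigurationError wrapper described above.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class HardwareSection(BaseModel):
    """Hypothetical section model: unknown keys are rejected."""

    model_config = ConfigDict(extra="forbid")
    threads: int = 1


class RootConfig(BaseModel):
    """Hypothetical root model: extra top-level keys are kept."""

    model_config = ConfigDict(extra="allow")
    hardware: HardwareSection = Field(default_factory=HardwareSection)


def first_error_type(data: dict) -> Optional[str]:
    """Return the first Pydantic error type, or None if validation succeeds."""
    try:
        RootConfig.model_validate(data)
    except ValidationError as exc:
        return exc.errors()[0]["type"]
    return None


typo = {"hardware": {"thread": 8}}         # misspelled key inside a known section
extra_ns = {"my_workflow": {"alpha": 1}}   # extra top-level namespace (illustrative)

print(first_error_type(typo))       # extra_forbidden
print(first_error_type(extra_ns))   # None
```

In this sketch every extra top-level key is accepted. The real root model, per the description above, additionally rejects top-level keys that are not registered namespaces.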

Merge order

Configuration values come from five sources, deep-merged in order (later sources override earlier ones, nested keys preserved):

  1. config/<config_name>.yaml — the base config selected by -c/--config (default: default). Common bases: default, hcp, mrtrix3, tract_synthseg, preprocess. This file is the entire base layer: passing -c mrtrix3 loads config/mrtrix3.yaml as the base and replaces default.yaml rather than merging on top of it. These are the flat, root-level config/*.yaml files.

  2. config/hardware.yaml — machine-specific resource overrides (threads, memory, GPU). Gitignored — copy from hardware.example.yaml.

  3. config/protocols/{protocol}.yaml — workflow/protocol-specific parameters (e.g. hcp.yaml, mrtrix3.yaml, full_pipeline.yaml). Selected via --protocol or the workflow’s WorkflowEntry.default_protocol. This is a separate set of files from the Level-1 bases: config/hcp.yaml (a root-level -c base) and config/protocols/hcp.yaml (a --protocol overlay) are two different files that both exist in the repo. A flat base typically pulls in its matching overlay by setting a protocol: key.

  4. config/patients/{patient_id}.yaml — per-subject overrides (also a place to pin patient_id and protocol).

  5. CLI flags — runtime overrides (--hemisphere, --raw-data-dir, etc.; see the CLI overrides table).

The merging is performed by ConfigManager.load_config() (see src/thesis/core/config/manager.py). Invalid merged configs raise ConfigurationError with the underlying Pydantic validation error attached.
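To make "deep-merged in order" concrete, here is a minimal sketch of a recursive merge over five layers. The deep_merge function and the layer values are illustrative assumptions and are not taken from ConfigManager. In particular, this sketch assumes non-dict values (including lists) are replaced wholesale rather than combined.

```python
from copy import deepcopy
from functools import reduce
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where override wins, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them so sibling keys survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the later layer replaces.
            merged[key] = deepcopy(value)
    return merged


# Illustrative layers in merge order (values are made up for this example).
layers = [
    {"tractography": {"method": "probtrackx2", "n_samples": 5000}},  # 1. base
    {"hardware": {"threads": 8}},                                    # 2. hardware
    {"tractography": {"n_samples": 7500}},                           # 3. protocol
    {"tractography": {"n_samples": 10000}},                          # 4. patient
    {"tractography": {"hemisphere": "left"}},                        # 5. CLI
]

merged = reduce(deep_merge, layers, {})
print(merged["tractography"])
# {'method': 'probtrackx2', 'n_samples': 10000, 'hemisphere': 'left'}
```

The base layer's method survives because later layers never mention it, while n_samples takes the value from the last layer that sets it.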

Path syntax

Every path field in PathConfig, NipypeConfig, and SynthSegConfig is processed by thesis.core.utils.to_path() (see src/thesis/core/utils/utils.py). The function applies, in order:

  • ~ expansion via os.path.expanduser (e.g. ~/thesis_data becomes /home/$USER/thesis_data).

  • $VAR and ${VAR} expansion via os.path.expandvars (e.g. $DATA_ROOT/processed).

  • Conversion to pathlib.Path. Relative paths stay relative to the current working directory unless resolved by resolve_path() against a base.
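The three steps above can be sketched with the standard library alone. The functions expand_path and resolve_against are hypothetical stand-ins written for this example. They are not the actual to_path() and resolve_path() implementations, whose signatures may differ.

```python
import os
from pathlib import Path
from typing import Union


def expand_path(raw: Union[str, os.PathLike]) -> Path:
    """Apply ~ expansion, then $VAR / ${VAR} expansion, then wrap in Path."""
    expanded = os.path.expanduser(os.fspath(raw))  # step 1: ~ expansion
    expanded = os.path.expandvars(expanded)        # step 2: environment variables
    return Path(expanded)                          # step 3: pathlib.Path


def resolve_against(path: Path, base: Path) -> Path:
    """Anchor a relative path at base; absolute paths are returned unchanged."""
    return path if path.is_absolute() else base / path


os.environ["DATA_ROOT"] = "/srv/data"  # illustrative value for this example

print(expand_path("$DATA_ROOT/processed"))
print(expand_path("~/thesis_data"))
print(resolve_against(expand_path("outputs/run1"), Path("/srv/project")))
```

Variables that are not set in the environment are left untouched by os.path.expandvars, so a literal $UNSET_VAR in the resulting path is a hint that the environment is incomplete.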

Example structure

The root-level config/*.yaml files are the Level-1 bases reached with -c/--config; config/protocols/*.yaml are the Level-3 overlays reached with --protocol (or a workflow’s default_protocol). The same name can appear in both places (e.g. config/hcp.yaml and config/protocols/hcp.yaml) — they are distinct files at different merge levels. Run thesis list-configs for the live list; the tree below is a snapshot for orientation.

config/
├── default.yaml                  # Level 1 base (-c default) — global defaults
├── hardware.yaml                 # Level 2 (gitignored; copy from hardware.example.yaml)
├── cloud.yaml                    # Level 1 base — cloud/Docker preset
├── hcp.yaml                      # Level 1 base (-c hcp)
├── mrtrix3.yaml                  # Level 1 base (-c mrtrix3)
├── mrtrix3_atlas.yaml            # Level 1 base (-c mrtrix3_atlas)
├── preprocess.yaml               # Level 1 base (-c preprocess)
├── tract_similarity.yaml         # Level 1 base (-c tract_similarity)
├── tract_similarity_sweep.yaml   # Level 1 base (-c tract_similarity_sweep)
├── tract_synthseg.yaml           # Level 1 base (-c tract_synthseg)
├── tract_synthseg_mrtrix3.yaml   # Level 1 base (-c tract_synthseg_mrtrix3)
├── *.example.yaml                # committed templates (default/hardware/cloud/mrtrix3_atlas/...)
├── protocols/                    # Level 3 overlays (--protocol / default_protocol)
│   ├── hcp.yaml
│   ├── mrtrix3.yaml
│   ├── tract_synthseg.yaml
│   ├── tract_synthseg_mrtrix3.yaml
│   ├── full_pipeline.yaml
│   ├── full_pipeline_mrtrix3.yaml                 # sets tractography.method: mrtrix3 (-w full_pipeline)
│   ├── full_pipeline_probtrackx2_after_mrtrix3.yaml
│   ├── _full_pipeline_short.yaml
│   └── hcp.example.yaml / standard.example.yaml   # committed templates
└── patients/                     # Level 4 per-subject overrides
    ├── patient.example.yaml
    ├── 114823.yaml
    ├── 124220.yaml
    ├── K2_Clinical.yaml
    ├── K2_HARDI.yaml
    └── LDP001.yaml

Loading configuration from Python

from thesis.core.config import ConfigManager

mgr = ConfigManager(config_dir="config")

# Level 1 only
cfg = mgr.load_config(config_name="default")

# Levels 1+3+4 (+5 via overrides)
cfg = mgr.load_config(
    config_name="default",
    patient_id="114823",
    protocol="hcp",
    overrides={"tractography": {"n_samples": 10000}},
)

print(cfg.hardware.threads)            # int
print(cfg.tractography.method)         # 'probtrackx2' | 'mrtrix3' | ...

The CLI does this for every thesis run invocation, plus it adds CLI-derived overrides for --hemisphere, --raw-data-dir, --output-dir, and runtime GPU validation.

Inspecting the merged config

thesis show-config default       # print the merged level-1 config as YAML
thesis list-configs              # list every YAML file ConfigManager can see
thesis list-configs --subdir patients

Patient-specific overrides

Create one file per subject in config/patients/, named with the patient ID:

# config/patients/114823.yaml
patient_id: "114823"
protocol: hcp                  # Optional — overrides WorkflowEntry.default_protocol

tractography:
  n_samples: 10000             # Only override what differs from the protocol
  mem_gb_gpu: 12.0

hcp:
  t1_image: "T1w/T1w_acpc_dc_restore_1.25.nii.gz"

CLI overrides

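These CLI flags override the merged file-based configuration (CLI wins). Most map directly to a config key; -c/--config and --protocol select files instead, and --log-level only affects the logger: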

| CLI flag | Config key | Notes |
|---|---|---|
| -c / --config | (selects the Level-1 base file) | Names the root-level config/<name>.yaml base; default: default. See Merge order Level 1. |
| --protocol | (selects the Level-3 overlay) | Names the config/protocols/<name>.yaml overlay; falls back to the workflow’s default_protocol when omitted. See Merge order Level 3. |
| --hemisphere | tractography.hemisphere | left / right / both / both-separately |
| --raw-data-dir | paths.inputs_dir | |
| -j / --max-workers | nipype.plugin_args.n_procs | Implies --parallel |
| -v / --verbose | output.verbosity = "verbose" | Also bumps log level to DEBUG |
| -q / --quiet | output.verbosity = "quiet" | |
| --summary | output.summary | off / compact / full |
| --no-progress | output.progress = "off" | |
| --log-level | (logger only; not a config field) | DEBUG / INFO / WARNING / ERROR |
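One way to picture how a flag becomes a config override is to turn each dotted config key into a nested dict, which can then be deep-merged as the final layer. The sketch below is an assumption about the general technique and not the CLI's actual code: FLAG_TO_KEY, set_dotted, and build_overrides are hypothetical names, and only a few of the flag-to-key pairs listed above are included.

```python
from typing import Any

# A subset of the flag-to-key pairs listed above (argparse-style flag names).
FLAG_TO_KEY = {
    "hemisphere": "tractography.hemisphere",
    "raw_data_dir": "paths.inputs_dir",
    "max_workers": "nipype.plugin_args.n_procs",
    "summary": "output.summary",
}


def set_dotted(target: dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set target['a']['b']['c'] = value for dotted_key 'a.b.c'."""
    *parents, leaf = dotted_key.split(".")
    node = target
    for name in parents:
        node = node.setdefault(name, {})  # create intermediate dicts on demand
    node[leaf] = value


def build_overrides(cli_args: dict[str, Any]) -> dict[str, Any]:
    """Build a nested override dict from flags that were actually provided."""
    overrides: dict[str, Any] = {}
    for flag, value in cli_args.items():
        if value is not None and flag in FLAG_TO_KEY:
            set_dotted(overrides, FLAG_TO_KEY[flag], value)
    return overrides


overrides = build_overrides({"hemisphere": "left", "max_workers": 4, "summary": None})
print(overrides)
# {'tractography': {'hemisphere': 'left'}, 'nipype': {'plugin_args': {'n_procs': 4}}}
```

Skipping flags whose value is None keeps an omitted flag from clobbering a value set in a YAML layer.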

Which sections does my workflow use?

Each workflow reads only a subset of the sections below. Use this as a jump-off point to the pages that matter for your run (every workflow also reads paths, hardware, nipype, and output, which are omitted here for brevity):

| Workflow | Scope | Config sections consumed |
|---|---|---|
| hcp | patient | tractography, hcp, transforms, qc |
| mrtrix3 | patient | tractography, hcp, qc |
| tract_synthseg | patient | tractography, hcp, synthseg, transforms, qc |
| full_pipeline | patient | preprocessing, preprocess, registration, tractography, transforms, tract_similarity, atlas, qc |
| preprocess | patient | preprocessing, preprocess, registration, segmentation, synthseg |
| synthseg | patient | synthseg |
| registration | patient | registration, hcp |
| transform | patient | transforms, registration |
| atlas_to_patient | patient | transforms, registration |
| tract_similarity | patient | tract_similarity, atlas, qc |
| qc | patient | qc, tractography, atlas |
| minimal | patient | (common sections only) |
| atlas | cohort | atlas, atlas_qc |
| learned_atlas | cohort | learned_atlas, atlas |
| tract_similarity_cohort | cohort | tract_similarity, qc |
| tract_similarity_hcp_loo | cohort | tract_similarity, atlas |
| tract_similarity_sweep | cohort | tract_similarity, tract_similarity_sweep |

All 17 registered workflows are listed. The five cohort-scope workflows ignore -p/--patient-id and --all; they operate on the configured output directory.

Per-section reference

Every top-level section of PipelineConfig has a dedicated reference page. Each lists every field with name, type, default, validator constraints, and a one-line description.

Best practices

  1. Keep default.yaml minimal. Put workflow-specific knobs in protocol files; put subject quirks in patient files.

  2. Don’t hardcode paths in code. Every path comes through the config.

  3. Use the example files as templates — they document the optional fields that aren’t in the bare defaults.

  4. Validate early. Section-level extra="forbid" rejects typos at load time; trust the model and don’t bypass it.

  5. Comment non-obvious values in YAML — Pydantic descriptions show up in thesis show-config, but comments stay visible in the YAML itself.