Configuration Guide
The thesis framework is driven entirely by YAML configuration validated with Pydantic v2. Each section model sets extra="forbid", so an unknown key inside a known section raises a ConfigurationError. The root PipelineConfig uses extra="allow" so workflow-registered namespaces are recognised, but any unknown top-level key that isn’t a registered namespace is still rejected. This page covers the merge order, path expansion rules, code-level access, and links to a complete per-section reference of every key.
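The two-tier rule (strict sections, permissive root limited to registered namespaces) can be illustrated with a plain-Python sketch. This is not the Pydantic implementation in validators.py: the SECTION_KEYS and REGISTERED_NAMESPACES tables, the my_workflow namespace, the check_keys helper, and the local ConfigurationError class are all hypothetical stand-ins chosen for the example.

```python
# Plain-Python illustration of the two-tier key check; NOT the real Pydantic models.
# SECTION_KEYS and REGISTERED_NAMESPACES are hypothetical sample data.
SECTION_KEYS = {
    "hardware": {"threads"},
    "tractography": {"method", "n_samples"},
}
REGISTERED_NAMESPACES = {"my_workflow"}


class ConfigurationError(Exception):
    """Local stand-in for the framework's exception of the same name."""


def check_keys(config: dict) -> None:
    """Raise ConfigurationError for unknown section keys or unknown top-level keys."""
    for section, values in config.items():
        if section in SECTION_KEYS:
            # Known section: behaves like extra="forbid".
            unknown = set(values) - SECTION_KEYS[section]
            if unknown:
                raise ConfigurationError(f"{section}: unknown keys {sorted(unknown)}")
        elif section not in REGISTERED_NAMESPACES:
            # The root tolerates extras only when they are registered namespaces.
            raise ConfigurationError(f"unknown top-level key {section!r}")


# Accepted: a known key plus a registered namespace with arbitrary content.
check_keys({"hardware": {"threads": 4}, "my_workflow": {"anything": 1}})
```

In this sketch a typo such as hardware.thread or an unregistered top-level key such as typo_section would raise ConfigurationError.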
The authoritative schema lives in src/thesis/core/config/validators.py. If a key isn’t documented in the pages below, look there — the docs aim to mirror the Pydantic models exactly.
Merge order
Configuration values come from five sources, deep-merged in order (later sources override earlier ones, nested keys preserved):
1. config/<config_name>.yaml: the base config selected by -c/--config (default: default). Common bases: default, hcp, mrtrix3, tract_synthseg, preprocess. This file is the entire base layer: passing -c mrtrix3 loads config/mrtrix3.yaml as the base and replaces default.yaml rather than merging on top of it. These are the flat, root-level config/*.yaml files.
2. config/hardware.yaml: machine-specific resource overrides (threads, memory, GPU). Gitignored; copy from hardware.example.yaml.
3. config/protocols/{protocol}.yaml: workflow/protocol-specific parameters (e.g. hcp.yaml, mrtrix3.yaml, full_pipeline.yaml). Selected via --protocol or the workflow’s WorkflowEntry.default_protocol. This is a separate set of files from the Level-1 bases: config/hcp.yaml (a root-level -c base) and config/protocols/hcp.yaml (a --protocol overlay) are two different files that both exist in the repo. A flat base typically pulls in its matching overlay by setting a protocol: key.
4. config/patients/{patient_id}.yaml: per-subject overrides (also a place to pin patient_id and protocol).
5. CLI flags: runtime overrides (--hemisphere, --raw-data-dir, etc.; see the CLI overrides table).
The merging is performed by ConfigManager.load_config() (see src/thesis/core/config/manager.py). Invalid merged configs raise ConfigurationError with the underlying Pydantic validation error attached.
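As a rough illustration of what "deep-merged in order" means, the sketch below folds a list of layers with a recursive merge. It is not the code in manager.py: deep_merge is a hypothetical helper, the layer values are made up, and replacing (rather than concatenating) lists and scalars is an assumption of this sketch.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them so sibling keys are preserved.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the later layer replaces the earlier.
            merged[key] = deepcopy(value)
    return merged


# Sample layers in merge order (values are illustrative only).
layers = [
    {"tractography": {"method": "probtrackx2", "n_samples": 5000}},  # Level 1 base
    {"hardware": {"threads": 8}},                                     # Level 2 hardware
    {"tractography": {"method": "mrtrix3"}},                          # Level 3 protocol
    {"tractography": {"n_samples": 10000}},                           # Level 4 patient
]

config: dict = {}
for layer in layers:
    config = deep_merge(config, layer)

print(config)
```

Here the protocol layer changes only tractography.method and the patient layer changes only tractography.n_samples, so both overrides survive alongside the hardware section.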
Path syntax
Every path field in PathConfig, NipypeConfig, and SynthSegConfig is processed by thesis.core.utils.to_path() (see src/thesis/core/utils/utils.py). The function applies, in order:
1. ~ expansion via os.path.expanduser (e.g. ~/thesis_data → /home/$USER/thesis_data).
2. $VAR and ${VAR} expansion via os.path.expandvars (e.g. $DATA_ROOT/processed).
3. Conversion to pathlib.Path. Relative paths stay relative to the current working directory unless resolved by resolve_path() against a base.
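The three steps can be sketched with the standard library alone. The functions below are hypothetical stand-ins, not the real to_path() and resolve_path(); in particular, the behaviour of resolve_against for relative paths is an assumption based on the description above, and DATA_ROOT is a sample variable set only for the demo.

```python
import os
from pathlib import Path


def expand_path(raw: str) -> Path:
    """Hypothetical stand-in for to_path(): ~ first, then $VAR, then Path."""
    expanded = os.path.expanduser(raw)       # step 1: ~ expansion
    expanded = os.path.expandvars(expanded)  # step 2: $VAR and ${VAR} expansion
    return Path(expanded)                    # step 3: conversion to pathlib.Path


def resolve_against(path: Path, base: Path) -> Path:
    """Hypothetical stand-in for resolve_path(): anchor relative paths at `base`."""
    return path if path.is_absolute() else base / path


os.environ["DATA_ROOT"] = "/data"  # sample value for the demo

print(expand_path("$DATA_ROOT/processed"))
print(expand_path("${DATA_ROOT}/raw"))
print(resolve_against(expand_path("outputs/run1"), Path("/projects/thesis")))
```

Note that os.path.expandvars leaves undefined variables untouched, so a misspelled variable name ends up in the path literally rather than raising an error.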
Example structure
The root-level config/*.yaml files are the Level-1 bases reached with -c/--config;
config/protocols/*.yaml are the Level-3 overlays reached with --protocol (or a
workflow’s default_protocol). The same name can appear in both places (e.g.
config/hcp.yaml and config/protocols/hcp.yaml) — they are distinct files at
different merge levels. Run thesis list-configs for the live list; the tree below is
a snapshot for orientation.
config/
├── default.yaml # Level 1 base (-c default) — global defaults
├── hardware.yaml # Level 2 (gitignored; copy from hardware.example.yaml)
├── cloud.yaml # Level 1 base — cloud/Docker preset
├── hcp.yaml # Level 1 base (-c hcp)
├── mrtrix3.yaml # Level 1 base (-c mrtrix3)
├── mrtrix3_atlas.yaml # Level 1 base (-c mrtrix3_atlas)
├── preprocess.yaml # Level 1 base (-c preprocess)
├── tract_similarity.yaml # Level 1 base (-c tract_similarity)
├── tract_similarity_sweep.yaml # Level 1 base (-c tract_similarity_sweep)
├── tract_synthseg.yaml # Level 1 base (-c tract_synthseg)
├── tract_synthseg_mrtrix3.yaml # Level 1 base (-c tract_synthseg_mrtrix3)
├── *.example.yaml # committed templates (default/hardware/cloud/mrtrix3_atlas/...)
├── protocols/ # Level 3 overlays (--protocol / default_protocol)
│ ├── hcp.yaml
│ ├── mrtrix3.yaml
│ ├── tract_synthseg.yaml
│ ├── tract_synthseg_mrtrix3.yaml
│ ├── full_pipeline.yaml
│ ├── full_pipeline_mrtrix3.yaml # sets tractography.method: mrtrix3 (-w full_pipeline)
│ ├── full_pipeline_probtrackx2_after_mrtrix3.yaml
│ ├── _full_pipeline_short.yaml
│ └── hcp.example.yaml / standard.example.yaml # committed templates
└── patients/ # Level 4 per-subject overrides
    ├── patient.example.yaml
    ├── 114823.yaml
    ├── 124220.yaml
    ├── K2_Clinical.yaml
    ├── K2_HARDI.yaml
    └── LDP001.yaml
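To make the mapping from selections to files concrete, the sketch below lists the file that each of Levels 1 to 4 would contribute for a given run. layer_files is a hypothetical helper written for this page, not part of ConfigManager; it takes the protocol as an explicit argument and does not model default_protocol, a protocol: key in the base, or missing files.

```python
from pathlib import Path
from typing import Optional


def layer_files(
    config_dir: Path,
    config_name: str = "default",
    protocol: Optional[str] = None,
    patient_id: Optional[str] = None,
) -> list[Path]:
    """List the YAML files for Levels 1-4 in merge order (hypothetical helper)."""
    candidates = [
        config_dir / f"{config_name}.yaml",  # Level 1: root-level base (-c)
        config_dir / "hardware.yaml",        # Level 2: machine-specific overrides
    ]
    if protocol:
        # Level 3: overlay under protocols/, distinct from the root-level base.
        candidates.append(config_dir / "protocols" / f"{protocol}.yaml")
    if patient_id:
        # Level 4: per-subject overrides.
        candidates.append(config_dir / "patients" / f"{patient_id}.yaml")
    return candidates


for path in layer_files(Path("config"), "hcp", protocol="hcp", patient_id="114823"):
    print(path)
```

The example deliberately uses hcp for both the base and the protocol to show that the two names resolve to different files at different merge levels.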
Loading configuration from Python
from thesis.core.config import ConfigManager

mgr = ConfigManager(config_dir="config")

# Level 1 only
cfg = mgr.load_config(config_name="default")

# Levels 1+3+4 (+5 via overrides)
cfg = mgr.load_config(
    config_name="default",
    patient_id="114823",
    protocol="hcp",
    overrides={"tractography": {"n_samples": 10000}},
)

print(cfg.hardware.threads)     # int
print(cfg.tractography.method)  # 'probtrackx2' | 'mrtrix3' | ...
The CLI performs the same load for every thesis run invocation. In addition, it applies CLI-derived overrides for --hemisphere, --raw-data-dir, and --output-dir, and runs GPU validation at runtime.
Inspecting the merged config
thesis show-config default # print the merged level-1 config as YAML
thesis list-configs # list every YAML file ConfigManager can see
thesis list-configs --subdir patients
Patient-specific overrides
Create one file per subject in config/patients/, named with the patient ID:
# config/patients/114823.yaml
patient_id: "114823"
protocol: hcp  # Optional; overrides WorkflowEntry.default_protocol
tractography:
  n_samples: 10000  # Only override what differs from the protocol
  mem_gb_gpu: 12.0
hcp:
  t1_image: "T1w/T1w_acpc_dc_restore_1.25.nii.gz"
CLI overrides
These CLI flags map directly to config keys (CLI wins):
| CLI flag | Config key | Notes |
|---|---|---|
| | (selects the Level-1 base file) | Names the root-level |
| | (selects the Level-3 overlay) | Names the |
| | | |
| | | |
| | | Implies |
| | | Also bumps log level to DEBUG |
| | | |
| | | |
| | | |
| | (logger only, not a config field) | DEBUG/INFO/WARNING/ERROR |
Which sections does my workflow use?
Each workflow reads only a subset of the sections below. Use this as a jump-off
point to the pages that matter for your run (every workflow also reads paths,
hardware, nipype, and output, which are omitted here for brevity):
| Workflow | Scope | Config sections consumed |
|---|---|---|
| | patient | |
| | patient | |
| | patient | |
| | patient | |
| | patient | |
| | patient | |
| | patient | |
| | patient | |
| | patient | |
| | patient | |
| | patient | |
| | patient | (common sections only) |
| | cohort | |
| | cohort | |
| | cohort | |
| | cohort | |
| | cohort | |
All 17 registered workflows are listed. The five cohort-scope workflows ignore -p/--patient-id and --all; they operate on the configured output directory.
Per-section reference
Every top-level section of PipelineConfig has a dedicated reference page. Each lists every field with name, type, default, validator constraints, and a one-line description.
- paths: filesystem layout
- hardware: compute resources
- s3: HCP S3 data download
- preprocessing and preprocess: preprocessing knobs
- registration: image registration
- segmentation: generic segmentation toggles
- synthseg: standalone SynthSeg execution
- tractography: tractography backends and parameters
- hcp: HCP-preprocessed inputs
- transforms: pre-computed ANTs transforms
- validation: warped-ROI validation
- qc: QC visualisation outputs
- atlas: cohort atlas generation
- learned_atlas: learned deformable tract-density template
- atlas_qc: cohort atlas QC
- tract_similarity: per-patient/cohort tract similarity
- tract_similarity_sweep: cohort threshold grid search
- nipype: Nipype execution settings
- output: CLI output behaviour
Best practices
- Keep default.yaml minimal. Put workflow-specific knobs in protocol files; put subject quirks in patient files.
- Don’t hardcode paths in code. Every path comes through the config.
- Use the example files as templates: they document the optional fields that aren’t in the bare defaults.
- Validate early. Section-level extra="forbid" rejects typos at load time; trust the model and don’t bypass it.
- Comment non-obvious values in YAML. Pydantic descriptions show up in thesis show-config, but comments stay visible in the YAML itself.