`tract_similarity`: per-patient/cohort tract similarity

Schema: `TractSimilarityConfig` in `src/thesis/core/config/validators.py`. This block drives the `tract_similarity` (per-patient) and `tract_similarity_cohort` (cohort) workflows. Both compare native-space tractography density against a warped cohort mean atlas.

Top-level fields

Field	Type	Default	Constraints	Description
`probtrackx_relpath`	`str`	`"tractography/probtrackx2"`	—	Relative path to the tractography output directory. Auto-detects single-run vs hemisphere-split (`left/` + `right/`) layout and sums the waytotal-normalised volumes across hemispheres. Override to `tractography/mrtrix3` for the MRtrix3 backend (the filename `fdt_paths.nii.gz` matches both).
`fdt_name`	`str`	`"fdt_paths.nii.gz"`	—	Filename of the density volume within each run directory.
`waytotal_name`	`str`	`"waytotal"`	—	Filename of the waytotal text file within each run directory. MRtrix3 writes this as `mu × Σ SIFT2 weights` or the raw streamline count.
`atlas_relpath`	`str`	`"atlas_in_patient_space/atlas_mean*.nii.gz"`	—	Relative path or glob pattern for the warped cohort mean atlas in patient space (typically produced by `atlas_to_patient`).
`subject_threshold`	`SideThresholdConfig`	`mode=fraction, value=0.05`	—	Binarisation threshold for the subject’s tractography volume. See below.
`atlas_threshold`	`SideThresholdConfig`	`mode=fraction, value=0.05`	—	Binarisation threshold for the warped atlas volume.
`n_bins`	`int`	`64`	`8 ≤ n ≤ 1024`	Histogram bin count for normalised mutual information.
`output_subdir`	`str`	`"tract_similarity"`	—	Patient-level output subdirectory for `metrics.json`.
`cohort_output_subdir`	`str`	`"cohort/tract_similarity"`	—	Cohort-level output subdirectory for aggregated metrics.
`hcp_loo`	`HcpLooConfig`	`minimum_subjects=3, write_volumes=True`	—	Per-HCP-subject leave-one-out comparison against the cohort atlas. See below.
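
The layout detection and hemisphere summation described for `probtrackx_relpath` can be sketched roughly as follows. This is an illustration, not the package's loader: the function names `find_run_dirs`, `read_waytotal`, and `normalised_density` are hypothetical, volumes are plain lists of floats rather than NIfTI arrays, the waytotal file is assumed to hold a single number, and "waytotal-normalised" is assumed to mean dividing each density volume by its own run's waytotal before summing.

```python
from pathlib import Path


def find_run_dirs(tract_dir: Path, fdt_name: str = "fdt_paths.nii.gz") -> list[Path]:
    """Return the run directories to load (hypothetical helper).

    Hemisphere-split layout: tract_dir/left and tract_dir/right each hold a run.
    Single-run layout: tract_dir itself holds the density volume.
    """
    split = [tract_dir / "left", tract_dir / "right"]
    if all((d / fdt_name).is_file() for d in split):
        return split
    if (tract_dir / fdt_name).is_file():
        return [tract_dir]
    raise FileNotFoundError(f"no {fdt_name} found under {tract_dir}")


def read_waytotal(run_dir: Path, waytotal_name: str = "waytotal") -> float:
    """Read the waytotal text file (assumed here to contain one number)."""
    return float((run_dir / waytotal_name).read_text().strip())


def normalised_density(volumes: list[list[float]], waytotals: list[float]) -> list[float]:
    """Divide each run's density by its waytotal, then sum voxel-wise across runs."""
    total = [0.0] * len(volumes[0])
    for volume, waytotal in zip(volumes, waytotals):
        if waytotal <= 0:
            raise ValueError("waytotal must be positive")
        for i, value in enumerate(volume):
            total[i] += value / waytotal
    return total
```

With a single-run layout the same code path applies: `find_run_dirs` returns one directory and the sum has one term.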

`subject_threshold` / `atlas_threshold`: `SideThresholdConfig`

Field	Type	Default	Constraints	Description
`mode`	`"fraction" \| "absolute"`	`"fraction"`	—	`fraction` applies `value × max(volume)` as the cutoff. `absolute` uses `value` directly as a raw voxel-intensity cutoff.
`value`	`float`	`0.05`	`> 0`; in `(0, 1)` when `mode="fraction"`	Threshold value.
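
The two threshold modes can be illustrated with a small stand-in for `SideThresholdConfig`. The class and function names below (`SideThreshold`, `binarise`) are hypothetical and the real schema lives in `validators.py`; the strict `>` comparison against the cutoff is also an assumption, since the table does not say whether voxels exactly at the cutoff are kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SideThreshold:
    """Illustrative stand-in for SideThresholdConfig (not the real schema class)."""

    mode: str = "fraction"
    value: float = 0.05

    def __post_init__(self) -> None:
        # Mirror the documented constraints: value > 0, and in (0, 1) for fraction mode.
        if self.mode not in ("fraction", "absolute"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.value <= 0:
            raise ValueError("value must be > 0")
        if self.mode == "fraction" and self.value >= 1:
            raise ValueError("value must lie in (0, 1) when mode='fraction'")

    def cutoff(self, volume: list[float]) -> float:
        """fraction: value * max(volume); absolute: value used as-is."""
        if self.mode == "fraction":
            return self.value * max(volume)
        return self.value


def binarise(volume: list[float], threshold: SideThreshold) -> list[bool]:
    """Mark voxels strictly above the cutoff (strictness is an assumption)."""
    cut = threshold.cutoff(volume)
    return [v > cut for v in volume]
```

For a volume `[0.0, 1.0, 4.0, 10.0]`, the default `fraction` threshold of 0.05 gives a cutoff of 0.5 and keeps the last three voxels, while `SideThreshold("absolute", 2.0)` keeps only the last two.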

`hcp_loo`: `HcpLooConfig`

Drives the cohort-scope `tract_similarity_hcp_loo` workflow, which emits per-HCP-subject metrics (the full four-family suite) against an in-memory leave-one-out cohort mean atlas. The artefacts mirror the per-patient layout, so the cohort aggregator (`tract_similarity_cohort`) and downstream notebooks pick up HCP rows alongside patient rows automatically.

Field	Type	Default	Constraints	Description
`minimum_subjects`	`int`	`3`	`>= 2`	Minimum cohort size. Leave-one-out is mathematically valid with 2 subjects; the default floor of 3 ensures a meaningful cohort reference.
`write_volumes`	`bool`	`true`	—	If true, write `subject_normalized.nii.gz`, `atlas_normalized.nii.gz`, `subject_mask.nii.gz`, `atlas_mask.nii.gz` per subject alongside `metrics.json`. Set false to skip them on large cohorts.

Invoke with `thesis run -w tract_similarity_hcp_loo -c <profile>`. Because each HCP subject is excluded from its own reference atlas, use this workflow when you want unbiased HCP-vs-atlas Dice (and the other three metric families) that is directly comparable to the per-patient metrics produced by `tract_similarity`.
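
The leave-one-out reference can be sketched as follows: for each subject, the atlas is the voxel-wise mean of every other subject's volume. This is a minimal illustration under stated assumptions, not the workflow's code: `leave_one_out_means` is a hypothetical name, volumes are flat lists of floats already in a common space, and the reference is taken to be a plain unweighted mean.

```python
def leave_one_out_means(
    volumes: list[list[float]], minimum_subjects: int = 3
) -> list[list[float]]:
    """Return, for each subject, the mean of all *other* subjects' volumes."""
    if minimum_subjects < 2:
        raise ValueError("minimum_subjects must be >= 2")
    n = len(volumes)
    if n < minimum_subjects:
        raise ValueError(f"need at least {minimum_subjects} subjects, got {n}")
    # Sum once, then subtract each subject: O(n) instead of O(n^2) voxel passes.
    totals = [sum(voxel) for voxel in zip(*volumes)]
    return [
        [(t - v) / (n - 1) for t, v in zip(totals, volume)]
        for volume in volumes
    ]
```

Subtracting each subject from a shared total keeps everything in memory with a single pass over the cohort, which matches the "in-memory" description above but is only one way to implement it.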

Metric families

The per-patient `metrics.json` contains four metric families:

Family	Metrics
`overlap`	Dice, Jaccard, subject_voxels, atlas_voxels, intersection, union
`correlation`	Pearson, Spearman, cosine
`distance_mm`	Hausdorff-95, mean-surface, centroid_distance
`distribution`	NMI, symmetric KL, Bhattacharyya
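
As a rough illustration of the `overlap` family, the sketch below computes Dice and Jaccard from two binarised masks using the standard definitions. The function name `overlap_metrics`, the lowercase dictionary keys, and the convention of returning 0.0 for two empty masks are assumptions; the actual key names in `metrics.json` may differ.

```python
def overlap_metrics(subject_mask: list[bool], atlas_mask: list[bool]) -> dict[str, float]:
    """Dice and Jaccard overlap between two equally sized binary masks."""
    if len(subject_mask) != len(atlas_mask):
        raise ValueError("masks must have the same number of voxels")
    subject_voxels = sum(subject_mask)
    atlas_voxels = sum(atlas_mask)
    intersection = sum(s and a for s, a in zip(subject_mask, atlas_mask))
    union = subject_voxels + atlas_voxels - intersection
    denominator = subject_voxels + atlas_voxels
    return {
        # Dice = 2|A ∩ B| / (|A| + |B|); 0.0 for two empty masks (assumed convention).
        "dice": 2 * intersection / denominator if denominator else 0.0,
        # Jaccard = |A ∩ B| / |A ∪ B|.
        "jaccard": intersection / union if union else 0.0,
        "subject_voxels": subject_voxels,
        "atlas_voxels": atlas_voxels,
        "intersection": intersection,
        "union": union,
    }
```

For masks `[True, True, False, False]` and `[True, False, True, False]` this gives an intersection of 1, a union of 3, a Dice of 0.5, and a Jaccard of 1/3.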

The cohort workflow emits `summary.csv`, `per_patient.csv`, and `outliers.json`.
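
To show where `n_bins` enters the `distribution` family, here is a sketch of a histogram-based normalised mutual information. Several choices are assumptions, because this page does not specify them: the normalisation used is (H(X) + H(Y)) / H(X, Y), which ranges from 1 (independent) to 2 (identical), the bins are equal-width over each volume's own min-max range, and the function name is hypothetical. The package may use a different normalisation, such as one scaled to [0, 1].

```python
import math


def _bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a value to an equal-width bin over [lo, hi]; the top edge joins the last bin."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


def _entropy(counts, total: int) -> float:
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalised_mutual_information(x: list[float], y: list[float], n_bins: int = 64) -> float:
    """NMI = (H(X) + H(Y)) / H(X, Y) from a joint histogram (assumed normalisation)."""
    if not 8 <= n_bins <= 1024:
        raise ValueError("n_bins must satisfy 8 <= n <= 1024")
    if len(x) != len(y) or not x:
        raise ValueError("volumes must be non-empty and the same length")
    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    joint: dict[tuple[int, int], int] = {}
    hist_x = [0] * n_bins
    hist_y = [0] * n_bins
    for xv, yv in zip(x, y):
        i = _bin_index(xv, x_lo, x_hi, n_bins)
        j = _bin_index(yv, y_lo, y_hi, n_bins)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        hist_x[i] += 1
        hist_y[j] += 1
    n = len(x)
    h_xy = _entropy(joint.values(), n)
    if h_xy == 0.0:
        return 1.0  # both volumes constant; arbitrary convention for this sketch
    return (_entropy(hist_x, n) + _entropy(hist_y, n)) / h_xy
```

A larger `n_bins` resolves finer intensity differences but leaves fewer voxels per bin, so the estimate becomes noisier on small tracts; the documented default of 64 sits between those extremes.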

Example

tract_similarity:
  probtrackx_relpath: tractography/probtrackx2
  fdt_name: fdt_paths.nii.gz
  waytotal_name: waytotal
  atlas_relpath: "atlas_in_patient_space/atlas_mean*.nii.gz"

  subject_threshold:
    mode: fraction
    value: 0.05
  atlas_threshold:
    mode: fraction
    value: 0.05

  n_bins: 64
  output_subdir: tract_similarity
  cohort_output_subdir: cohort/tract_similarity

Notes

The workflow is registered twice: `tract_similarity` for per-patient runs and `tract_similarity_cohort` (`scope="cohort"`) for cohort-level aggregation. Both share this config block.
For grid-searching the thresholds, see `tract_similarity_sweep`.
`full_pipeline` auto-runs `tract_similarity` as its final stage; the CLI then surfaces the headline Dice, Pearson, Hausdorff-95, and NMI values in the run summary.