Path Declarations

Declarative path types consumed by the @workflow decorator API to describe a workflow’s required inputs, produced outputs, and optional artefacts. The adapter synthesised by @workflow resolves each declaration against the active ProcessingContext at workflow-build time and injects the result as a keyword argument. @requires declarations also drive automatic preflight verification.

Choosing a declaration

| Need | Use |
| --- | --- |
| A per-patient input file under input_dir | PatientFile |
| A per-patient input directory under input_dir | PatientDir |
| An output subdirectory under output_dir | OutputDir |
| A scratch file under working_dir | WorkingFile |
| A cohort-level output directory | CohortDir |
| A handoff file from an upstream workflow | PriorOutput |
| A shared asset under data_dir | DataFile / DataDir |
| A config-sourced file that may live anywhere (out-of-tree or absolute), e.g. a registration template or atlas | ExternalFile |
| A glob of related files in one directory | GlobMatch |
| A group of globs sharing a search directory | GlobGroup |
| A YAML list of dicts resolved to per-item paths | ConfigList |
| Iterate per-patient subdirs under a cohort root | CohortPatients |

Examples

Required T1 with config-driven override and a fallback search path.

PatientFile(
    default="T1w/T1w_acpc_dc_restore.nii.gz",
    config_paths=["my_workflow.t1_image", "hcp.t1_image"],
    fallback_dirs=["input_dir", "output_dir"],
)

Optional segmentation file (resolves to None and skips the existence check when the file is missing).

PatientFile(default="T1w/aparc+aseg.nii.gz", optional=True)

Output subdirectory (created automatically).

OutputDir("tractography/probtrackx2")

Discover BedpostX merged samples under the patient input tree.

GlobMatch(
    pattern="merged_th*samples.nii.gz",
    fallback_dirs=["input_dir", "output_dir"],
    recursive_fallback=True,
    min_matches=3,
)

Group of related BedpostX files sharing one search directory.

GlobGroup(
    items={
        "thsamples": "merged_th*samples.nii.gz",
        "phsamples": "merged_ph*samples.nii.gz",
        "fsamples":  "merged_f*samples.nii.gz",
    },
    fallback_dirs=["input_dir", "output_dir"],
)

Iterate cohort patient directories, keeping only patients that have the required output files.

CohortPatients(
    subdir="tractography/probtrackx2",
    file_patterns={
        "fdt_paths": "warped_streamlines/fdt_paths.nii.gz",
        "waytotal":  "warped_streamlines/waytotal",
    },
    min_patients=5,
)

Resolve a YAML transforms-job list with mixed file / dir / string fields.

ConfigList(
    config_path="transforms.jobs",
    file_fields=("input_file", "reference_image"),
    dir_fields=("output_dir",),
    str_fields=("direction", "interpolation"),
)

Module reference

Declarative path types for the workflow decorator API.

These types let workflow authors declare which patient inputs, outputs, and working files their workflow needs without manually unpacking config and context. The @workflow decorator’s synthesized adapter resolves each declaration against the current ProcessingContext at workflow-build time and injects the resolved pathlib.Path as a keyword argument to the decorated function.
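The resolve-then-inject idea described above can be pictured with a toy decorator. This is only a sketch: toy_workflow, the resolver callables, and the dict-shaped context are hypothetical stand-ins invented for illustration, and the real @workflow decorator may be shaped quite differently.

```python
from pathlib import Path
from typing import Any, Callable


def toy_workflow(**declarations: Callable[[Any, Any], Any]):
    """Hypothetical decorator: each keyword maps a parameter name to a resolver."""

    def decorate(func):
        def adapter(config, context):
            # Resolve every declaration against the context, then inject
            # the results into the decorated function as keyword arguments.
            resolved = {
                name: resolver(config, context)
                for name, resolver in declarations.items()
            }
            return func(**resolved)

        return adapter

    return decorate


# Toy usage: the "declaration" is a plain callable returning a Path.
@toy_workflow(t1=lambda cfg, ctx: Path(ctx["input_dir"]) / "T1w.nii.gz")
def run(t1):
    return t1.name
```

Calling run(config, context) here triggers resolution first, so the function body only ever sees ready-made values.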

See markdowns/plans/workflow-decorator/01-design-spec.md and 02-api-reference.md for the full design.

class thesis.core.path_declarations.PathDeclaration[source]#

Bases: object

Base marker class for path declarations.

Subclasses implement resolve() (Path resolution against a ProcessingContext) and existence_errors() (implicit preflight check used by the synthesized verifier).
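The two-method contract can be sketched as follows. ToyContext, ToyDeclaration, and MarkerFile are hypothetical stand-ins written for this example; they are not the classes from thesis.core.path_declarations, and the error-message wording is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, List


@dataclass
class ToyContext:
    """Hypothetical stand-in for ProcessingContext (only input_dir is modeled)."""

    input_dir: Path


class ToyDeclaration:
    """Hypothetical stand-in mirroring the resolve / existence_errors pair."""

    def resolve(self, config: Any, context: ToyContext) -> Any:
        raise NotImplementedError

    def existence_errors(self, config: Any, context: ToyContext, name: str) -> List[str]:
        return []  # nothing to check by default


@dataclass
class MarkerFile(ToyDeclaration):
    """Toy declaration: a fixed filename under context.input_dir."""

    filename: str

    def resolve(self, config, context):
        return context.input_dir / self.filename

    def existence_errors(self, config, context, name):
        path = self.resolve(config, context)
        if path.is_file():
            return []
        # One human-readable string per problem, prefixed with the parameter name.
        return [f"{name}: expected file not found: {path}"]
```

Keeping resolution and checking separate lets a verifier collect every problem as a string before any workflow body runs.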

resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Any

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

class thesis.core.path_declarations.PatientFile[source]#

Bases: PathDeclaration

Declare a per-patient input file under context.input_dir.

Parameters:
  • default (Optional[str]) – Filename to use when no config_path/config_paths value resolves to non-None. Relative to context.input_dir (or the first matching fallback_dirs entry). May contain a {patient_id} placeholder.

  • config_path (Optional[str]) – Single dotted attribute path on the PipelineConfig whose value, if non-None, overrides the default. Kept for backward compatibility with Phase A/A.2/B call sites — prefer config_paths for new code.

  • config_paths (Union[str, List[str], None]) – Single dotted path or list of dotted paths consulted in priority order. The first non-None value wins. When both config_path and config_paths are given, config_paths takes precedence; if it yields nothing, config_path is consulted next.

  • fallback_dirs (Optional[List[str]]) – Optional ordered list of base-directory names to search if the resolved filename does not exist at context.input_dir. Valid names: "input_dir", "output_dir", "working_dir", "data_dir", ".". When set, the names listed are the complete search order — include "input_dir" explicitly if you want it searched.

  • optional (bool) – When True, missing paths resolve to None and the implicit existence check is skipped.

  • _kind (str)

Raises:

ConfigurationError – When default, config_path, config_paths are all unset and optional=False — there is no way to derive a path. Also raised when fallback_dirs contains an unknown directory name.
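The precedence rules for config_paths, config_path, default, and fallback_dirs can be sketched with plain functions. Everything here is illustrative: lookup, pick_filename, and locate are hypothetical helpers, the config is modeled as a nested dict, and returning the first candidate when nothing exists on disk is an assumption rather than documented behavior.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


def lookup(config: Any, dotted: str) -> Any:
    """Follow a dotted path through nested dicts or objects; None if a hop is missing."""
    node = config
    for part in dotted.split("."):
        node = node.get(part) if isinstance(node, dict) else getattr(node, part, None)
        if node is None:
            return None
    return node


def pick_filename(
    config: Any,
    default: Optional[str],
    config_path: Optional[str] = None,
    config_paths: Union[str, List[str], None] = None,
) -> Optional[str]:
    """Consult config_paths in order, then config_path, then fall back to default."""
    if isinstance(config_paths, str):
        config_paths = [config_paths]
    candidates = list(config_paths or [])
    if config_path:
        candidates.append(config_path)
    for dotted in candidates:
        value = lookup(config, dotted)
        if value is not None:
            return str(value)  # first non-None value wins
    return default


def locate(filename: str, bases: Dict[str, Path], search_order: List[str], patient_id: str) -> Path:
    """Return the first existing candidate in search_order (assumed: else the first candidate)."""
    relative = filename.replace("{patient_id}", patient_id)
    candidates = [bases[name] / relative for name in search_order]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return candidates[0]
```

Note that search_order is the complete list of bases, which matches the rule that "input_dir" must be listed explicitly in fallback_dirs if it should be searched.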

default: str | None = None#
config_path: str | None = None#
config_paths: str | List[str] | None = None#
fallback_dirs: List[str] | None = None#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Optional[Path]

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(default=None, config_path=None, config_paths=None, fallback_dirs=None, optional=False, _kind='patient')#
Parameters:
Return type:

None

class thesis.core.path_declarations.PatientDir[source]#

Bases: PathDeclaration

Declare a per-patient input directory under context.input_dir.

Mirrors PatientFile (including config_paths and fallback_dirs) but enforces that the resolved path is a directory. Supports {patient_id} template substitution.

Parameters:
default: str | None = None#
config_path: str | None = None#
config_paths: str | List[str] | None = None#
fallback_dirs: List[str] | None = None#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Optional[Path]

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(default=None, config_path=None, config_paths=None, fallback_dirs=None, optional=False, _kind='patient')#
Parameters:
Return type:

None

class thesis.core.path_declarations.OutputDir[source]#

Bases: PathDeclaration

Declare an output subdirectory under context.output_dir.

Resolution creates the directory (mkdir(parents=True, exist_ok=True)). No implicit existence check is generated.
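A minimal sketch of what create-on-resolve looks like, using only pathlib. The helper name resolve_output_dir is hypothetical; only the mkdir(parents=True, exist_ok=True) call is taken from the description above.

```python
import tempfile
from pathlib import Path


def resolve_output_dir(output_dir: Path, subdir: str) -> Path:
    """Join subdir onto output_dir and create it; safe to call repeatedly."""
    target = output_dir / subdir
    # parents=True builds intermediate directories; exist_ok=True makes re-runs a no-op.
    target.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    first = resolve_output_dir(Path(tmp), "tractography/probtrackx2")
    second = resolve_output_dir(Path(tmp), "tractography/probtrackx2")  # no error on re-run
    created = first.is_dir() and first == second
```

Because the directory is created during resolution, a workflow body can write into it straight away, which is also why no existence check is needed.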

Parameters:
subdir: str = ''#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Path

__init__(subdir='', _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.WorkingFile[source]#

Bases: PathDeclaration

Declare a temporary file under context.working_dir.

Valid in both scope="patient" and scope="cohort". No implicit existence check is generated.

Parameters:
name: str = ''#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Path

__init__(name='', _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.CohortDir[source]#

Bases: PathDeclaration

Declare a cohort-level output directory.

Resolves to context.output_dir / subdir (the cohort dispatch path in cli.py sets output_dir to the cohort root for cohort-scope workflows). The directory is created on resolution.

Parameters:
subdir: str = ''#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Path

__init__(subdir='', _kind='cohort')#
Parameters:
Return type:

None

class thesis.core.path_declarations.PriorOutput[source]#

Bases: PathDeclaration

Declare an existing file (or set of files) inside context.output_dir produced by an upstream workflow.

Used by downstream workflows (qc, tract_similarity) that consume the HCP / preprocess / atlas results stored under the patient’s output tree.

Two resolution modes:

  • Single file — set filename. Resolution returns a single pathlib.Path (context.output_dir / subdir / filename). Supports {patient_id} template substitution.

  • Glob discovery — set glob_pattern. Resolution returns a sorted list[Path] of matches under context.output_dir / subdir (or recursively if recursive is True). Returns [] when no match and optional=True.
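The two modes can be sketched as one function with a branch. resolve_prior_output is a hypothetical helper, ValueError stands in for the library's ConfigurationError, and the fallback_dirs and optional handling are left out for brevity.

```python
from pathlib import Path
from typing import List, Optional, Union


def resolve_prior_output(
    output_dir: Path,
    subdir: Optional[str] = None,
    filename: Optional[str] = None,
    glob_pattern: Optional[str] = None,
    recursive: bool = False,
    patient_id: str = "",
) -> Union[Path, List[Path]]:
    """Single-file mode returns one Path; glob mode returns a sorted list."""
    base = output_dir / subdir if subdir else output_dir
    if filename is not None:
        return base / filename.replace("{patient_id}", patient_id)
    if glob_pattern is None:
        raise ValueError("set filename or glob_pattern")  # stand-in error type
    finder = base.rglob if recursive else base.glob
    return sorted(finder(glob_pattern))
```

The return type depends on the mode, so a workflow body should know which mode its declaration uses before indexing or iterating the result.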

Parameters:
  • filename (Optional[str]) – Single-file mode filename, relative to context.output_dir / subdir (or to context.output_dir when subdir is unset).

  • glob_pattern (Optional[str]) – Glob expression for the multi-file mode. Mutually exclusive with filename; exactly one of the two must be set.

  • subdir (Optional[str]) – Optional subdirectory under context.output_dir.

  • fallback_dirs (Optional[List[str]]) – Optional ordered list of base-directory names to search if the primary location (output_dir) does not satisfy the lookup. Valid names: see PatientFile.

  • recursive (bool) – When True and glob_pattern is set, use Path.rglob() instead of Path.glob().

  • optional (bool) – Skip the implicit existence check.

  • _kind (str)

Raises:

ConfigurationError – When neither filename nor glob_pattern is set, or when fallback_dirs contains an unknown name.

filename: str | None = None#
glob_pattern: str | None = None#
subdir: str | None = None#
fallback_dirs: List[str] | None = None#
recursive: bool = False#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Union[Path, List[Path]]

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(filename=None, glob_pattern=None, subdir=None, fallback_dirs=None, recursive=False, optional=False, _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.DataFile[source]#

Bases: PathDeclaration

Declare an input file under context.data_dir.

Used for cohort-shared assets (templates, atlases, reference images) that live in the project’s data directory rather than the per-patient input tree. Supports {patient_id} substitution.

Implicit existence check: the resolved path must exist and be a file (unless optional=True). Path-traversal protection is enforced: the resolved path must remain under context.data_dir.
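One common way to implement a "must remain under the base directory" rule is to resolve both paths and compare ancestry. The sketch below shows that general technique; resolve_under is a hypothetical helper, ValueError is a stand-in for whatever error the library raises, and this is not necessarily how DataFile implements its guard.

```python
from pathlib import Path


def resolve_under(base: Path, relative: str) -> Path:
    """Join relative onto base and reject results that escape base."""
    root = base.resolve()
    # resolve() collapses ".." segments and follows symlinks before comparing.
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{relative!r} escapes {root}")
    return candidate
```

A value such as "../outside.nii.gz" is rejected because, once normalized, its parents no longer include the base directory.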

Parameters:
filename: str = ''#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Path

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(filename='', optional=False, _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.DataDir[source]#

Bases: PathDeclaration

Declare an input directory under context.data_dir.

Mirrors DataFile but enforces that the resolved path is a directory. Supports {patient_id} substitution and path-traversal safety.

Parameters:
dirname: str = ''#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Path

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(dirname='', optional=False, _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.ExternalFile[source]#

Bases: PathDeclaration

Declare an input file whose path comes from a config value and may live anywhere on disk (absolute, ~, $ENV, or relative to a base dir).

Unlike PatientFile / DataFile, this is not anchored to the per-patient input_dir (or data_dir) and applies no path-traversal guard. It is for cohort-shared, out-of-tree assets whose location is supplied by configuration — e.g. a registration template, a transform warp, or an atlas reference image — which are routinely absolute paths outside the patient tree (where PatientFile would raise).

Resolution proceeds in order:

  • The first non-empty value from config_paths / config_path is read.

  • {patient_id} is substituted.

  • ~ and $ENV are expanded.

  • Absolute paths are used as-is; relative paths resolve against the first available base_dirs entry (default data_dir, then cwd).

The implicit existence check requires a regular file unless optional is set.
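The resolution order for an out-of-tree path can be sketched with the standard library. resolve_external is a hypothetical helper that takes the already-read config value; treating "first available" as "first base under which the file exists, else the first base" is an assumption, as is the order of the ~ and $ENV expansions.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional


def resolve_external(
    value: Optional[str],
    bases: Dict[str, Path],
    base_dirs: Optional[List[str]] = None,
    patient_id: str = "",
) -> Optional[Path]:
    """Expand a config-supplied path; anchor it only when it is relative."""
    if not value:
        return None  # unset config value
    text = value.replace("{patient_id}", patient_id)
    text = os.path.expandvars(os.path.expanduser(text))  # ~ first, then $ENV
    path = Path(text)
    if path.is_absolute():
        return path  # used as-is, no traversal guard
    order = base_dirs or ["data_dir", "."]
    candidates = [bases[name] / path for name in order]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return candidates[0]
```

Unlike the guarded declarations, nothing here rejects a path outside the project tree, which is the point of this declaration type.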

Parameters:
  • config_path (Optional[str]) – Single dotted config path (e.g. "registration.fixed_image").

  • config_paths (Union[str, List[str], None]) – Single path or ordered list consulted before config_path.

  • base_dirs (Optional[List[str]]) – Ordered base-dir names for resolving relative values. Valid names: "input_dir", "output_dir", "working_dir", "data_dir", ".". Defaults to ["data_dir", "."].

  • optional (bool) – When True, an unset value resolves to None and the implicit existence check is skipped.

  • _kind (str)

Raises:

ConfigurationError – When neither config_path nor config_paths is set (and not optional), or base_dirs has an unknown name.

config_path: str | None = None#
config_paths: str | List[str] | None = None#
base_dirs: List[str] | None = None#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

Optional[Path]

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(config_path=None, config_paths=None, base_dirs=None, optional=False, _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.GlobMatch[source]#

Bases: PathDeclaration

Discover files matching a glob pattern, with directory-fallback search.

Resolution returns a sorted list[Path]. The implicit existence check enforces len(matches) >= min_matches unless optional is set.

Parameters:
  • pattern (str) – Glob expression (e.g. "merged_th*samples.nii.gz"). {patient_id} is substituted.

  • primary_dir (Optional[PathDeclaration]) – First directory to search. When None, the search starts from context.input_dir.

  • fallback_dirs (Optional[List[str]]) – Ordered list of base-directory names tried when the primary yields no matches. Valid names: "input_dir", "output_dir", "working_dir", "data_dir", ".".

  • recursive_fallback (bool) – When True, each candidate base is also searched via Path.rglob() before moving to the next entry.

  • min_matches (int) – Minimum match count for the implicit existence check (default 1).

  • optional (bool) – Skip the implicit existence check; resolution may return [].

  • _kind (str)

Raises:

ConfigurationError – When pattern is empty or fallback_dirs contains an unknown name.
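The search order and the min_matches check can be sketched as two small functions. glob_with_fallback and match_errors are hypothetical helpers; the bases are passed as already-resolved directories, and the error wording is an assumption.

```python
from pathlib import Path
from typing import List


def glob_with_fallback(
    pattern: str, bases: List[Path], recursive_fallback: bool = False
) -> List[Path]:
    """Try each base in order; optionally retry recursively before moving on."""
    for base in bases:
        matches = sorted(base.glob(pattern))
        if not matches and recursive_fallback:
            matches = sorted(base.rglob(pattern))
        if matches:
            return matches  # first base with any match wins
    return []


def match_errors(
    name: str, matches: List[Path], min_matches: int = 1, optional: bool = False
) -> List[str]:
    """Return preflight error strings; empty when the count is sufficient."""
    if optional or len(matches) >= min_matches:
        return []
    return [f"{name}: found {len(matches)} match(es), need at least {min_matches}"]
```

Splitting discovery from the count check mirrors the resolve / existence_errors pair: resolution may legitimately return a short list, and the preflight step decides whether that is an error.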

pattern: str = ''#
primary_dir: PathDeclaration | None = None#
fallback_dirs: List[str] | None = None#
recursive_fallback: bool = False#
min_matches: int = 1#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

List[Path]

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(pattern='', primary_dir=None, fallback_dirs=None, recursive_fallback=False, min_matches=1, optional=False, _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.GlobGroup[source]#

Bases: PathDeclaration

Resolve a group of related globs sharing the same search directory.

The key insight: all items in the group are resolved against the same base directory. If the first item has no matches in the primary directory, the entire group falls through to the next candidate. This mirrors the directory-discovery semantics of prepare_hcp_paths.

Parameters:
  • items (dict) – Mapping of result-attribute name to glob pattern. {patient_id} is substituted in each pattern.

  • primary_dir (Optional[PathDeclaration]) – First directory to search (default: context.input_dir).

  • fallback_dirs (Optional[List[str]]) – Ordered fallback names (see GlobMatch).

  • recursive_fallback (bool) – When True, rglob is tried before advancing to the next candidate base.

  • optional (bool) – When True, skip the implicit existence check.

  • _kind (str)

Returns:

A GlobGroupResult with one attribute per items key (each a list[Path]) plus _found_dir.

Raises:

ConfigurationError – When items is empty or fallback_dirs contains an unknown name.
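The "first item picks the directory" rule can be sketched like this. resolve_group is a hypothetical helper, types.SimpleNamespace stands in for GlobGroupResult, items is assumed to be non-empty, and recursive_fallback is omitted.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Dict, List


def resolve_group(items: Dict[str, str], bases: List[Path]) -> SimpleNamespace:
    """Pick the first base where the FIRST item's pattern matches, then glob every item there."""
    first_pattern = next(iter(items.values()))
    for base in bases:
        if any(base.glob(first_pattern)):
            # All items are resolved against this one directory, never mixed across bases.
            found = {key: sorted(base.glob(pattern)) for key, pattern in items.items()}
            return SimpleNamespace(_found_dir=base, **found)
    return SimpleNamespace(_found_dir=None, **{key: [] for key in items})
```

A directory that happens to contain matches for a later item is skipped if the first item has none there, which keeps related files from being drawn from different locations.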

items: dict#
primary_dir: PathDeclaration | None = None#
fallback_dirs: List[str] | None = None#
recursive_fallback: bool = False#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

GlobGroupResult

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(items=<factory>, primary_dir=None, fallback_dirs=None, recursive_fallback=False, optional=False, _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.GlobGroupResult[source]#

Bases: object

Result object returned by GlobGroup.resolve().

Exposes one attribute per declared item (list[Path]) plus _found_dir (Path | None). Iteration is not supported; access items by name.

Parameters:
__init__(items, found_dir)[source]#
Parameters:
Return type:

None

class thesis.core.path_declarations.ConfigList[source]#

Bases: PathDeclaration

Iterate a structured YAML list and resolve per-item paths.

Useful for jobs / batch specifications where the YAML carries a list of dicts and each dict contains filenames the workflow needs as pathlib.Path objects.

Parameters:
  • config_path (str) – Dotted path on the PipelineConfig to the list.

  • file_fields (Tuple[str, ...]) – YAML keys whose values should be resolved as files (relative paths anchored under context.input_dir or one of fallback_dirs; absolute paths are kept as-is). List values become list[Path].

  • dir_fields (Tuple[str, ...]) – YAML keys whose values should be resolved as directories. Same anchoring rules as file_fields.

  • str_fields (Tuple[str, ...]) – YAML keys kept as plain str (no resolution).

  • fallback_dirs (Optional[List[str]]) – Same semantics as PatientFile. Defaults to ["input_dir"] if unset.

  • optional (bool) – When True, a missing config path or an empty list is permitted; otherwise the implicit existence check fails.

  • _kind (str)

Returns:

list[ConfigListItem] — one per YAML list entry.

Raises:

ConfigurationError – When config_path is empty or fallback_dirs contains an unknown name.

config_path: str = ''#
file_fields: Tuple[str, ...] = ()#
dir_fields: Tuple[str, ...] = ()#
str_fields: Tuple[str, ...] = ()#
fallback_dirs: List[str] | None = None#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

List[ConfigListItem]

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(config_path='', file_fields=(), dir_fields=(), str_fields=(), fallback_dirs=None, optional=False, _kind='any')#
Parameters:
Return type:

None

class thesis.core.path_declarations.ConfigListItem[source]#

Bases: object

Single resolved item from a ConfigList.

Attributes are populated dynamically from the parent declaration’s file_fields / dir_fields / str_fields. Values are pathlib.Path (or list[Path]) for file / dir fields and str for string fields. Missing keys resolve to None.

Parameters:

fields (dict)

__init__(fields)[source]#
Parameters:

fields (dict)

Return type:

None

class thesis.core.path_declarations.CohortPatients[source]#

Bases: PathDeclaration

Iterate per-patient subdirectories under a cohort root.

Parameters:
  • root_dir (Optional[PathDeclaration]) – Cohort root (default: context.output_dir).

  • subdir (Optional[str]) – Optional per-patient subdirectory that must exist for the patient to be included.

  • exclude (Tuple[str, ...]) – Patient-id names to skip (case-sensitive).

  • min_patients (int) – Minimum patient count for the implicit existence check.

  • file_patterns (Optional[dict]) – Optional mapping of key to glob pattern. A patient is excluded if any pattern has no match. Each pattern resolves to the first (sorted) match under the patient’s directory and is exposed as patient.<key>.

  • optional (bool) – Skip the implicit existence check.

  • _kind (str)
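The filtering rules can be sketched as a single loop over the cohort root. iter_cohort is a hypothetical helper and SimpleNamespace stands in for CohortPatient. The sketch assumes patterns are globbed under patient_dir / subdir when subdir is set; the parameter descriptions do not state this explicitly, so treat it as a guess. The min_patients check is left out.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Dict, List, Optional, Sequence


def iter_cohort(
    root: Path,
    subdir: Optional[str] = None,
    exclude: Sequence[str] = ("cohort", "atlas", "_meta"),
    file_patterns: Optional[Dict[str, str]] = None,
) -> List[SimpleNamespace]:
    """Collect patients that have the subdir and a match for every pattern."""
    patients = []
    for patient_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if patient_dir.name in exclude:
            continue  # reserved, non-patient directory names
        search_dir = patient_dir / subdir if subdir else patient_dir
        if not search_dir.is_dir():
            continue
        fields = {"patient_id": patient_dir.name, "patient_dir": patient_dir}
        complete = True
        for key, pattern in (file_patterns or {}).items():
            matches = sorted(search_dir.glob(pattern))
            if not matches:
                complete = False  # one missing pattern drops the patient
                break
            fields[key] = matches[0]  # first sorted match
        if complete:
            patients.append(SimpleNamespace(**fields))
    return patients
```

A cohort-scope workflow would then compare the length of the returned list against min_patients in its preflight step.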

root_dir: PathDeclaration | None = None#
subdir: str | None = None#
exclude: Tuple[str, ...] = ('cohort', 'atlas', '_meta')#
min_patients: int = 1#
file_patterns: dict | None = None#
optional: bool = False#
resolve(config, context)[source]#

Resolve the declaration to a concrete value for the workflow body.

The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.

Parameters:
Return type:

List[CohortPatient]

existence_errors(config, context, name)[source]#

Return any preflight error strings (empty when nothing to check).

Parameters:
Return type:

list[str]

__init__(root_dir=None, subdir=None, exclude=('cohort', 'atlas', '_meta'), min_patients=1, file_patterns=None, optional=False, _kind='cohort')#
Parameters:
Return type:

None

class thesis.core.path_declarations.CohortPatient[source]#

Bases: object

Single per-patient entry returned by CohortPatients.

Variables:
  • patient_id – Subdirectory name treated as the patient identifier.

  • patient_dir – Absolute path to the patient’s root directory.

Additional attributes named after the keys of CohortPatients.file_patterns carry the first matching pathlib.Path for that pattern.

Parameters:

fields (dict)

__init__(fields)[source]#
Parameters:

fields (dict)

Return type:

None