Path Declarations
Declarative path types consumed by the @workflow decorator API to describe
a workflow’s required inputs, produced outputs, and optional artefacts. The
adapter synthesised by @workflow resolves each declaration against the
active ProcessingContext at workflow-build time
and injects the result as a keyword argument. @requires declarations also
drive automatic preflight verification.
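The resolve-and-inject step described above can be sketched in a few lines. This is only an illustration of the idea, not the real adapter: `workflow_sketch` and `InputFile` are hypothetical names, and the actual `@workflow` signature is not shown on this page.

```python
import functools
from pathlib import Path
from types import SimpleNamespace


class InputFile:
    """Hypothetical declaration: a file name relative to context.input_dir."""

    def __init__(self, name):
        self.name = name

    def resolve(self, config, context):
        return Path(context.input_dir) / self.name


def workflow_sketch(**declarations):
    """Toy decorator: resolve each declaration and pass it as a keyword argument."""

    def decorate(func):
        @functools.wraps(func)
        def adapter(config, context):
            resolved = {
                name: decl.resolve(config, context)
                for name, decl in declarations.items()
            }
            return func(**resolved)

        return adapter

    return decorate


@workflow_sketch(t1=InputFile("T1w/T1w_acpc_dc_restore.nii.gz"))
def build(t1):
    # The workflow body receives an already-resolved pathlib.Path.
    return t1


context = SimpleNamespace(input_dir="/data/sub-01")
result = build(config=None, context=context)
```

The body of `build` never touches the config or context directly, which is the point of the declarative style.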
Choosing a declaration
| Need | Use |
|---|---|
| A per-patient input file under context.input_dir | PatientFile |
| A per-patient input directory under context.input_dir | PatientDir |
| An output subdirectory under context.output_dir | OutputDir |
| A scratch file under context.working_dir | WorkingFile |
| A cohort-level output directory | CohortDir |
| A handoff file from an upstream workflow | PriorOutput |
| A shared asset under context.data_dir | DataFile (file) or DataDir (directory) |
| A config-sourced file that may live anywhere (out-of-tree / absolute), e.g. a registration template or atlas | ExternalFile |
| A glob of related files in one directory | GlobMatch |
| A group of globs sharing a search directory | GlobGroup |
| A YAML list of dicts → per-item resolved paths | ConfigList |
| Iterate per-patient subdirs under a cohort root | CohortPatients |
Examples
Required T1 with config-driven override and a fallback search path.
PatientFile(
    default="T1w/T1w_acpc_dc_restore.nii.gz",
    config_paths=["my_workflow.t1_image", "hcp.t1_image"],
    fallback_dirs=["input_dir", "output_dir"],
)
Optional segmentation file (the existence check is skipped when the file is missing).
PatientFile(default="T1w/aparc+aseg.nii.gz", optional=True)
Output subdirectory (created automatically).
OutputDir("tractography/probtrackx2")
Discover BedpostX merged samples under the patient input tree.
GlobMatch(
    pattern="merged_th*samples.nii.gz",
    fallback_dirs=["input_dir", "output_dir"],
    recursive_fallback=True,
    min_matches=3,
)
Group of related BedpostX files sharing one search directory.
GlobGroup(
    items={
        "thsamples": "merged_th*samples.nii.gz",
        "phsamples": "merged_ph*samples.nii.gz",
        "fsamples": "merged_f*samples.nii.gz",
    },
    fallback_dirs=["input_dir", "output_dir"],
)
Iterate cohort patient directories, filtering by required output files.
CohortPatients(
    subdir="tractography/probtrackx2",
    file_patterns={
        "fdt_paths": "warped_streamlines/fdt_paths.nii.gz",
        "waytotal": "warped_streamlines/waytotal",
    },
    min_patients=5,
)
Resolve a YAML transforms-job list with mixed file / dir / string fields.
ConfigList(
    config_path="transforms.jobs",
    file_fields=("input_file", "reference_image"),
    dir_fields=("output_dir",),
    str_fields=("direction", "interpolation"),
)
Module reference
Declarative path types for the workflow decorator API.
These types let workflow authors declare which patient inputs, outputs,
and working files their workflow needs without manually unpacking
config and context. The @workflow decorator’s synthesized
adapter resolves each declaration against the current
ProcessingContext at workflow-build time
and injects the resolved pathlib.Path as a keyword argument
to the decorated function.
See markdowns/plans/workflow-decorator/01-design-spec.md and
02-api-reference.md for the full design.
- class thesis.core.path_declarations.PathDeclaration
  Bases: object
  Base marker class for path declarations.
  Subclasses implement resolve() (Path resolution against a ProcessingContext) and existence_errors() (implicit preflight check used by the synthesized verifier).
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
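The two-method contract can be illustrated with stand-in classes. A minimal sketch, assuming nothing about the real implementation: `DeclarationSketch`, `RequiredFile`, and `preflight` are hypothetical names, and the error message format is invented for illustration.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


class DeclarationSketch:
    """Stand-in for the base class; not imported from the real package."""

    def resolve(self, config, context):
        raise NotImplementedError

    def existence_errors(self, config, context, name):
        return []  # nothing to check by default


class RequiredFile(DeclarationSketch):
    """Hypothetical subclass: a file that must exist under context.input_dir."""

    def __init__(self, relative):
        self.relative = relative

    def resolve(self, config, context):
        return Path(context.input_dir) / self.relative

    def existence_errors(self, config, context, name):
        path = self.resolve(config, context)
        if path.is_file():
            return []
        return [f"{name}: required file not found: {path}"]


def preflight(declarations, config, context):
    """Collect error strings from every declaration before any work starts."""
    errors = []
    for name, decl in declarations.items():
        errors.extend(decl.existence_errors(config, context, name))
    return errors


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "present.txt").write_text("ok")
    ctx = SimpleNamespace(input_dir=tmp)
    errors = preflight(
        {"a": RequiredFile("present.txt"), "b": RequiredFile("missing.txt")},
        config=None,
        context=ctx,
    )
```

Returning a list of strings rather than raising lets a verifier report every missing input in one pass.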
- class thesis.core.path_declarations.PatientFile
  Bases: PathDeclaration
  Declare a per-patient input file under context.input_dir.
  - Parameters:
    default (Optional[str]) – Filename to use when no config_path / config_paths value resolves to non-None. Relative to context.input_dir (or the first matching fallback_dirs entry). May contain a {patient_id} placeholder.
    config_path (Optional[str]) – Single dotted attribute path on the PipelineConfig whose value, if non-None, overrides the default. Kept for backward compatibility with Phase A/A.2/B call sites; prefer config_paths for new code.
    config_paths (Union[str, List[str], None]) – Single dotted path or list of dotted paths consulted in priority order. The first non-None value wins. When both config_path and config_paths are given, config_paths takes precedence; if it yields nothing, config_path is consulted next.
    fallback_dirs (Optional[List[str]]) – Optional ordered list of base-directory names to search if the resolved filename does not exist at context.input_dir. Valid names: "input_dir", "output_dir", "working_dir", "data_dir", ".". When set, the names listed are the complete search order, so include "input_dir" explicitly if you want it searched.
    optional (bool) – When True, missing paths resolve to None and the implicit existence check is skipped.
    _kind (str)
  - Raises:
    ConfigurationError – When default, config_path, and config_paths are all unset and optional=False, because there is no way to derive a path. Also raised when fallback_dirs contains an unknown directory name.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
  - __init__(default=None, config_path=None, config_paths=None, fallback_dirs=None, optional=False, _kind='patient')
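The priority order for config_paths and the search order for fallback_dirs can be sketched as follows. This is an approximation built from the parameter descriptions above; `lookup` and `resolve_patient_file` are hypothetical helpers, and returning the first candidate when nothing exists is an assumption.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def lookup(config, dotted):
    """Follow a dotted attribute path; return None if any step is missing."""
    value = config
    for part in dotted.split("."):
        value = getattr(value, part, None)
        if value is None:
            return None
    return value


def base_dir(context, name):
    """Map a base-directory name to a path; "." means the current directory."""
    return Path(".") if name == "." else Path(getattr(context, name))


def resolve_patient_file(config, context, default, config_paths=(), fallback_dirs=None):
    # The first non-None config value wins; otherwise the default is used.
    values = (lookup(config, path) for path in config_paths)
    filename = next((v for v in values if v is not None), default)
    # When fallback_dirs is set, it is the complete search order.
    names = fallback_dirs or ["input_dir"]
    candidates = [base_dir(context, n) / filename for n in names]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return candidates[0]  # assumption: report the first candidate when none exist


with tempfile.TemporaryDirectory() as tmp:
    in_dir, out_dir = Path(tmp, "in"), Path(tmp, "out")
    in_dir.mkdir()
    out_dir.mkdir()
    (out_dir / "custom_t1.nii.gz").touch()
    config = SimpleNamespace(
        my_workflow=SimpleNamespace(t1_image=None),
        hcp=SimpleNamespace(t1_image="custom_t1.nii.gz"),
    )
    context = SimpleNamespace(input_dir=in_dir, output_dir=out_dir)
    found = resolve_patient_file(
        config,
        context,
        default="T1w.nii.gz",
        config_paths=["my_workflow.t1_image", "hcp.t1_image"],
        fallback_dirs=["input_dir", "output_dir"],
    )
    found_name = found.name
    found_parent = found.parent.name
```

Here the first config path is unset, so the second supplies the filename, and the file is found in the second search directory.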
- class thesis.core.path_declarations.PatientDir
  Bases: PathDeclaration
  Declare a per-patient input directory under context.input_dir.
  Mirrors PatientFile (including config_paths and fallback_dirs) but enforces that the resolved path is a directory. Supports {patient_id} template substitution.
  - Parameters:
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
  - __init__(default=None, config_path=None, config_paths=None, fallback_dirs=None, optional=False, _kind='patient')
- class thesis.core.path_declarations.OutputDir
  Bases: PathDeclaration
  Declare an output subdirectory under context.output_dir.
  Resolution creates the directory (mkdir(parents=True, exist_ok=True)). No implicit existence check is generated.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
- class thesis.core.path_declarations.WorkingFile
  Bases: PathDeclaration
  Declare a temporary file under context.working_dir.
  Valid in both scope="patient" and scope="cohort". No implicit existence check is generated.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
- class thesis.core.path_declarations.CohortDir
  Bases: PathDeclaration
  Declare a cohort-level output directory.
  Resolves to context.output_dir / subdir (the cohort dispatch path in cli.py sets output_dir to the cohort root for cohort-scope workflows). The directory is created on resolution.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
- class thesis.core.path_declarations.PriorOutput
  Bases: PathDeclaration
  Declare an existing file (or set of files) inside context.output_dir produced by an upstream workflow.
  Used by downstream workflows (qc, tract_similarity) that consume the HCP / preprocess / atlas results stored under the patient’s output tree.
  Two resolution modes:
  - Single file: set filename. Resolution returns a single pathlib.Path (context.output_dir / subdir / filename). Supports {patient_id} template substitution.
  - Glob discovery: set glob_pattern. Resolution returns a sorted list[Path] of matches under context.output_dir / subdir (or recursively if recursive is True). Returns [] when no match and optional=True.
  - Parameters:
    filename (Optional[str]) – Single-file mode filename, relative to context.output_dir or subdir.
    glob_pattern (Optional[str]) – Glob expression for the multi-file mode. Mutually exclusive with filename (at least one must be set).
    subdir (Optional[str]) – Optional subdirectory under context.output_dir.
    fallback_dirs (Optional[List[str]]) – Optional ordered list of base-directory names to search if the primary location (output_dir) does not satisfy the lookup. Valid names: see PatientFile.
    recursive (bool) – When True and glob_pattern is set, use Path.rglob() instead of Path.glob().
    optional (bool) – Skip the implicit existence check.
    _kind (str)
  - Raises:
    ConfigurationError – When neither filename nor glob_pattern is set, or when fallback_dirs contains an unknown name.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
  - __init__(filename=None, glob_pattern=None, subdir=None, fallback_dirs=None, recursive=False, optional=False, _kind='any')
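The two resolution modes can be sketched with plain pathlib. This is an illustration only: `resolve_prior_output` is a hypothetical function, fallback_dirs is left out, and raising ValueError (rather than ConfigurationError) when the modes are combined or both unset is an assumption.

```python
import tempfile
from pathlib import Path


def resolve_prior_output(output_dir, subdir=None, filename=None,
                         glob_pattern=None, recursive=False):
    # Exactly one of the two modes must be selected.
    if (filename is None) == (glob_pattern is None):
        raise ValueError("set exactly one of filename or glob_pattern")
    base = Path(output_dir) / subdir if subdir else Path(output_dir)
    if filename is not None:
        return base / filename  # single-file mode: one Path
    matcher = base.rglob if recursive else base.glob
    return sorted(matcher(glob_pattern))  # glob mode: sorted list of Paths


with tempfile.TemporaryDirectory() as tmp:
    qc = Path(tmp, "qc")
    (qc / "nested").mkdir(parents=True)
    (qc / "a.nii.gz").touch()
    (qc / "nested" / "b.nii.gz").touch()

    single = resolve_prior_output(tmp, subdir="qc", filename="a.nii.gz")
    flat = resolve_prior_output(tmp, subdir="qc", glob_pattern="*.nii.gz")
    deep = resolve_prior_output(tmp, subdir="qc", glob_pattern="*.nii.gz",
                                recursive=True)
    single_name = single.name
    flat_names = [p.name for p in flat]
    deep_names = [p.name for p in deep]
```

The non-recursive glob sees only the top-level file, while the recursive variant also picks up the nested one.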
- class thesis.core.path_declarations.DataFile
  Bases: PathDeclaration
  Declare an input file under context.data_dir.
  Used for cohort-shared assets (templates, atlases, reference images) that live in the project’s data directory rather than the per-patient input tree. Supports {patient_id} substitution.
  Implicit existence check: the resolved path must exist and be a file (unless optional=True). Path-traversal is enforced: the resolved path must remain under context.data_dir.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
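A path-traversal guard of the kind described for DataFile is commonly written by resolving both paths and checking containment. A minimal sketch under that assumption; `resolve_under` is a hypothetical helper and the real check may differ in detail.

```python
import tempfile
from pathlib import Path


def resolve_under(base_dir, relative):
    """Return base_dir / relative, refusing results that leave base_dir."""
    base = Path(base_dir).resolve()
    candidate = (base / relative).resolve()
    try:
        candidate.relative_to(base)  # raises ValueError when outside base
    except ValueError:
        raise ValueError(f"{relative!r} escapes {base}") from None
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    inside = resolve_under(tmp, "templates/mni.nii.gz")
    inside_name = inside.name
    try:
        resolve_under(tmp, "../outside.txt")
        escaped = True
    except ValueError:
        escaped = False
```

Resolving before comparing matters because a purely textual prefix check would accept paths containing `..` segments.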
- class thesis.core.path_declarations.DataDir
  Bases: PathDeclaration
  Declare an input directory under context.data_dir.
  Mirrors DataFile but enforces that the resolved path is a directory. Supports {patient_id} substitution and path-traversal safety.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
- class thesis.core.path_declarations.ExternalFile
  Bases: PathDeclaration
  Declare an input file whose path comes from a config value and may live anywhere on disk (absolute, ~, $ENV, or relative to a base dir).
  Unlike PatientFile / DataFile, this is not anchored to the per-patient input_dir (or data_dir) and applies no path-traversal guard. It is for cohort-shared, out-of-tree assets whose location is supplied by configuration (e.g. a registration template, a transform warp, or an atlas reference image), which are routinely absolute paths outside the patient tree (where PatientFile would raise).
  Resolution: the first non-empty value from config_paths / config_path is read; {patient_id} is substituted; ~ and $ENV are expanded; absolute paths are used as-is; relative paths resolve against the first available base_dirs entry (default data_dir then cwd). Implicit existence check (must be a regular file) unless optional.
  - Parameters:
    config_path (Optional[str]) – Single dotted config path (e.g. "registration.fixed_image").
    config_paths (Union[str, List[str], None]) – Single path or ordered list consulted before config_path.
    base_dirs (Optional[List[str]]) – Ordered base-dir names for resolving relative values. Valid names: "input_dir", "output_dir", "working_dir", "data_dir", ".". Defaults to ["data_dir", "."].
    optional (bool) – When True, an unset value resolves to None and the implicit existence check is skipped.
    _kind (str)
  - Raises:
    ConfigurationError – When neither config_path nor config_paths is set (and not optional), or base_dirs has an unknown name.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
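The expansion order described for ExternalFile (placeholder, then `$ENV` and `~`, then absolute-or-relative) can be approximated with the standard library. A sketch only: `resolve_external` is a hypothetical function, `SKETCH_ASSET_ROOT` is an invented variable name, and treating a `None` base as "unavailable" is an assumption.

```python
import os
import tempfile
from pathlib import Path


def resolve_external(value, patient_id, base_dirs):
    # Substitute the placeholder first, then expand $ENV and ~.
    text = value.replace("{patient_id}", patient_id)
    text = os.path.expanduser(os.path.expandvars(text))
    path = Path(text)
    if path.is_absolute():
        return path  # absolute paths are used as-is
    for base in base_dirs:
        if base is not None:  # assumption: None marks an unavailable base
            return Path(base) / path
    return path


os.environ["SKETCH_ASSET_ROOT"] = tempfile.gettempdir()
absolute = resolve_external(
    "$SKETCH_ASSET_ROOT/{patient_id}/t1.nii.gz", "sub-01", ["data"]
)
relative = resolve_external("templates/mni.nii.gz", "sub-01", [None, "data"])
```

`str.replace` is used instead of `str.format` so that values written as `${VAR}` do not collide with format-field syntax.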
- class thesis.core.path_declarations.GlobMatch
  Bases: PathDeclaration
  Discover files matching a glob pattern, with directory-fallback search.
  Resolution returns a sorted list[Path]. The implicit existence check enforces len(matches) >= min_matches unless optional is set.
  - Parameters:
    pattern (str) – Glob expression (e.g. "merged_th*samples.nii.gz"). {patient_id} is substituted.
    primary_dir (Optional[PathDeclaration]) – First directory to search. When None, the search starts from context.input_dir.
    fallback_dirs (Optional[List[str]]) – Ordered list of base-directory names tried when the primary yields no matches. Valid names: "input_dir", "output_dir", "working_dir", "data_dir", ".".
    recursive_fallback (bool) – When True, each candidate base is also searched via Path.rglob() before moving to the next entry.
    min_matches (int) – Minimum match count for the implicit existence check (default 1).
    optional (bool) – Skip the implicit existence check; resolution may return [].
    _kind (str)
  - Raises:
    ConfigurationError – When pattern is empty or fallback_dirs contains an unknown name.
  - primary_dir: PathDeclaration | None = None
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
  - __init__(pattern='', primary_dir=None, fallback_dirs=None, recursive_fallback=False, min_matches=1, optional=False, _kind='any')
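The fallback search and the min_matches check can be sketched like this. The helpers `glob_with_fallback` and `min_match_errors` are hypothetical, the error text is invented, and treating every base (including the primary one) the same way for the recursive step is an assumption.

```python
import tempfile
from pathlib import Path


def glob_with_fallback(pattern, bases, recursive_fallback=False):
    """Return sorted matches from the first base that yields any."""
    for base in map(Path, bases):
        matches = sorted(base.glob(pattern))
        if not matches and recursive_fallback:
            # Try a recursive search before moving to the next base.
            matches = sorted(base.rglob(pattern))
        if matches:
            return matches
    return []


def min_match_errors(name, matches, min_matches=1):
    """Preflight-style check: an empty list means the requirement is met."""
    if len(matches) >= min_matches:
        return []
    return [f"{name}: expected at least {min_matches} match(es), found {len(matches)}"]


with tempfile.TemporaryDirectory() as tmp:
    in_dir, out_dir = Path(tmp, "in"), Path(tmp, "out")
    nested = out_dir / "bedpostx"
    in_dir.mkdir()
    nested.mkdir(parents=True)
    for i in (1, 2, 3):
        (nested / f"merged_th{i}samples.nii.gz").touch()

    matches = glob_with_fallback(
        "merged_th*samples.nii.gz", [in_dir, out_dir], recursive_fallback=True
    )
    match_names = [m.name for m in matches]
    ok = min_match_errors("thsamples", matches, min_matches=3)
    short = min_match_errors("thsamples", matches, min_matches=4)
```

The input directory is empty, so the search falls through to the output directory, where only the recursive step finds the nested files.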
- class thesis.core.path_declarations.GlobGroup
  Bases: PathDeclaration
  Resolve a group of related globs sharing the same search directory.
  The key insight: all items in the group are resolved against the same base directory. If the first item has no matches in the primary directory, the entire group falls through to the next candidate. This mirrors the directory-discovery semantics of prepare_hcp_paths.
  - Parameters:
    items (dict) – Mapping of result-attribute name to glob pattern. {patient_id} is substituted in each pattern.
    primary_dir (Optional[PathDeclaration]) – First directory to search (default: context.input_dir).
    fallback_dirs (Optional[List[str]]) – Ordered fallback names (see GlobMatch).
    recursive_fallback (bool) – When True, rglob is tried before advancing to the next candidate base.
    optional (bool) – When True, skip the implicit existence check.
    _kind (str)
  - Returns:
    A GlobGroupResult with one attribute per items key (each a list[Path]) plus _found_dir.
  - Raises:
    ConfigurationError – When items is empty or fallback_dirs contains an unknown name.
  - primary_dir: PathDeclaration | None = None
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
- class thesis.core.path_declarations.GlobGroupResult
  Bases: object
  Result object returned by GlobGroup.resolve().
  Exposes one attribute per declared item (list[Path]) plus _found_dir (Path | None). Iteration is not supported; access items by name.
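The "first item decides the directory" rule can be sketched as follows. `resolve_glob_group` is a hypothetical function, and a `SimpleNamespace` stands in for GlobGroupResult; recursive fallback is omitted for brevity.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def resolve_glob_group(items, bases):
    """Resolve every pattern against the first base where the first pattern matches."""
    first_pattern = next(iter(items.values()))
    for base in map(Path, bases):
        if next(base.glob(first_pattern), None) is not None:
            found = {key: sorted(base.glob(pat)) for key, pat in items.items()}
            return SimpleNamespace(_found_dir=base, **found)
    # No base satisfied the first pattern: every item is empty.
    return SimpleNamespace(_found_dir=None, **{key: [] for key in items})


with tempfile.TemporaryDirectory() as tmp:
    in_dir, out_dir = Path(tmp, "in"), Path(tmp, "out")
    in_dir.mkdir()
    out_dir.mkdir()
    # input_dir holds only a phsamples file, so the first pattern fails there.
    (in_dir / "merged_ph1samples.nii.gz").touch()
    for stem in ("th1", "ph1", "f1"):
        (out_dir / f"merged_{stem}samples.nii.gz").touch()

    group = resolve_glob_group(
        {
            "thsamples": "merged_th*samples.nii.gz",
            "phsamples": "merged_ph*samples.nii.gz",
            "fsamples": "merged_f*samples.nii.gz",
        },
        [in_dir, out_dir],
    )
    found_dir_name = group._found_dir.name
    ph_parent = group.phsamples[0].parent.name
```

Although a phsamples file exists in the input directory, it is ignored: the whole group is taken from the directory that satisfied the first item, which keeps related files from being mixed across locations.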
- class thesis.core.path_declarations.ConfigList
  Bases: PathDeclaration
  Iterate a structured YAML list and resolve per-item paths.
  Useful for jobs / batch specifications where the YAML carries a list of dicts and each dict contains filenames the workflow needs as pathlib.Path objects.
  - Parameters:
    config_path (str) – Dotted path on the PipelineConfig to the list.
    file_fields (Tuple[str, ...]) – YAML keys whose values should be resolved as files (relative paths anchored under context.input_dir or one of fallback_dirs; absolute paths are kept as-is). List values become list[Path].
    dir_fields (Tuple[str, ...]) – YAML keys whose values should be resolved as directories. Same anchoring rules as file_fields.
    str_fields (Tuple[str, ...]) – YAML keys kept as plain str (no resolution).
    fallback_dirs (Optional[List[str]]) – Same semantics as PatientFile. Defaults to ["input_dir"] if unset.
    optional (bool) – When True, missing config path or empty list is permitted; otherwise the implicit existence check fails.
    _kind (str)
  - Returns:
    list[ConfigListItem], one per YAML list entry.
  - Raises:
    ConfigurationError – When config_path is empty or fallback_dirs contains an unknown name.
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
  - __init__(config_path='', file_fields=(), dir_fields=(), str_fields=(), fallback_dirs=None, optional=False, _kind='any')
- class thesis.core.path_declarations.ConfigListItem
  Bases: object
  Single resolved item from a ConfigList.
  Attributes are populated dynamically from the parent declaration’s file_fields / dir_fields / str_fields. Values are pathlib.Path (or list[Path]) for file / dir fields and str for string fields. Missing keys resolve to None.
  - Parameters:
    fields (dict)
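The per-item resolution can be sketched over an already-loaded list of dicts. `resolve_config_list` is a hypothetical function, a `SimpleNamespace` stands in for ConfigListItem, and anchoring against a single base directory (instead of the full fallback_dirs search) is a simplification.

```python
from pathlib import Path
from types import SimpleNamespace


def resolve_config_list(entries, base_dir, file_fields=(), dir_fields=(), str_fields=()):
    def anchor(value):
        # Absolute paths are kept as-is; relative ones are anchored under base_dir.
        path = Path(value)
        return path if path.is_absolute() else Path(base_dir) / path

    items = []
    for entry in entries:
        fields = {}
        for key in (*file_fields, *dir_fields):
            value = entry.get(key)
            if value is None:
                fields[key] = None  # missing keys resolve to None
            elif isinstance(value, list):
                fields[key] = [anchor(v) for v in value]
            else:
                fields[key] = anchor(value)
        for key in str_fields:
            value = entry.get(key)
            fields[key] = None if value is None else str(value)
        items.append(SimpleNamespace(**fields))
    return items


jobs = [
    {
        "input_file": "dwi/data.nii.gz",
        "reference_image": ["a.nii.gz", "b.nii.gz"],
        "direction": "forward",
    }
]
items = resolve_config_list(
    jobs,
    base_dir="/data/sub-01",
    file_fields=("input_file", "reference_image"),
    dir_fields=("output_dir",),
    str_fields=("direction", "interpolation"),
)
```

The workflow body can then read `items[0].input_file` directly instead of indexing into raw YAML dicts.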
- class thesis.core.path_declarations.CohortPatients
  Bases: PathDeclaration
  Iterate per-patient subdirectories under a cohort root.
  - Parameters:
    root_dir (Optional[PathDeclaration]) – Cohort root (default: context.output_dir).
    subdir (Optional[str]) – Optional per-patient subdirectory that must exist for the patient to be included.
    exclude (Tuple[str, ...]) – Patient-id names to skip (case-sensitive).
    min_patients (int) – Minimum patient count for the implicit existence check.
    file_patterns (Optional[dict]) – Optional key -> glob map. Patients missing any pattern are excluded. Each pattern resolves to the first (sorted) match under the patient’s directory and is exposed as patient.<key>.
    optional (bool) – Skip the implicit existence check.
    _kind (str)
  - root_dir: PathDeclaration | None = None
  - resolve(config, context)
    Resolve the declaration to a concrete value for the workflow body.
    The return type varies by subclass: single-Path primitives return pathlib.Path; glob primitives return list[Path] or a namespace object; structured primitives return a list of dataclass instances.
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
    - Return type:
  - existence_errors(config, context, name)
    Return any preflight error strings (empty when nothing to check).
    - Parameters:
      config (PipelineConfig)
      context (ProcessingContext)
      name (str)
    - Return type:
  - __init__(root_dir=None, subdir=None, exclude=('cohort', 'atlas', '_meta'), min_patients=1, file_patterns=None, optional=False, _kind='cohort')
- class thesis.core.path_declarations.CohortPatient
  Bases: object
  Single per-patient entry returned by CohortPatients.
  - Variables:
    patient_id – Subdirectory name treated as the patient identifier.
    patient_dir – Absolute path to the patient’s root directory.
  Additional attributes named after the keys of CohortPatients.file_patterns carry the first matching pathlib.Path for that pattern.
  - Parameters:
    fields (dict)
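The cohort iteration and filtering can be sketched as below. `iter_cohort_patients` is a hypothetical function and a `SimpleNamespace` stands in for CohortPatient; matching file_patterns under the per-patient subdir (rather than the patient root) is an assumption, since the page does not state which is used.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def iter_cohort_patients(root, subdir=None, file_patterns=None,
                         exclude=("cohort", "atlas", "_meta")):
    patients = []
    for patient_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if patient_dir.name in exclude:
            continue
        search_dir = patient_dir / subdir if subdir else patient_dir
        if not search_dir.is_dir():
            continue  # the required subdirectory is missing
        fields = {}
        for key, pattern in (file_patterns or {}).items():
            matches = sorted(search_dir.glob(pattern))
            if not matches:
                break  # a pattern is unmatched: drop this patient
            fields[key] = matches[0]  # first sorted match
        else:
            patients.append(SimpleNamespace(
                patient_id=patient_dir.name, patient_dir=patient_dir, **fields))
    return patients


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {
        "sub-01": ["fdt_paths.nii.gz", "waytotal"],
        "sub-02": ["fdt_paths.nii.gz"],  # waytotal missing
        "atlas": [],                     # excluded by name
    }
    for pid, names in layout.items():
        target = root / pid / "tractography"
        target.mkdir(parents=True)
        for name in names:
            (target / name).touch()

    patients = iter_cohort_patients(
        root,
        subdir="tractography",
        file_patterns={"fdt_paths": "fdt_paths*.nii.gz", "waytotal": "waytotal*"},
    )
    patient_ids = [p.patient_id for p in patients]
    first_match = patients[0].fdt_paths.name
```

A min_patients check would then compare `len(patients)` against the threshold in the same way a preflight error list is built.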