s3: HCP S3 data download
Schema: `S3Config` in `src/thesis/core/config/validators.py`.
Optional top-level block (`s3` is `None` by default) that configures downloads from the HCP open-access bucket. In the default setup it lives in `config/cloud.yaml`.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| `enabled` | | | - | Enable automatic S3 data download for HCP workflows. |
| `bucket` | | | - | S3 bucket name. |
| `region` | | | - | AWS region where the bucket lives. |
| `prefix` | | | - | Bucket key prefix (folder path within the bucket). |
| `cache_policy` | | | one of | Cache behaviour when files already exist locally. |
| `max_retries` | | | | Maximum retry attempts for failed downloads. |
| `retry_backoff` | | | | Exponential backoff multiplier between retries. |
| `required_patterns` | | see below | - | Glob patterns that must be downloaded successfully; a failure aborts the run. |
| `optional_patterns` | | see below | - | Glob patterns to download if available; missing files are tolerated. |
Default patterns

```yaml
required_patterns:
  - T1w/Diffusion/data.nii*
  - T1w/Diffusion/bvals
  - T1w/Diffusion/bvecs
  - T1w/Diffusion/nodif_brain_mask.nii*
  - T1w/Diffusion.bedpostX/merged_*samples.nii.gz
  - T1w/T1w_acpc_dc_restore_1.25.nii.gz
optional_patterns:
  - T1w/T1w_acpc_dc_restore_brain.nii.gz
  - T1w/brainmask_fs.nii.gz
```
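To illustrate the difference between required and optional patterns, here is a minimal sketch using Python's standard `fnmatch` module. It is not the project's implementation: the function name `check_patterns` is hypothetical, and it assumes the patterns are matched against file paths relative to a subject directory.

```python
from fnmatch import fnmatchcase

REQUIRED_PATTERNS = [
    "T1w/Diffusion/data.nii*",
    "T1w/Diffusion/bvals",
    "T1w/Diffusion/bvecs",
    "T1w/Diffusion/nodif_brain_mask.nii*",
    "T1w/Diffusion.bedpostX/merged_*samples.nii.gz",
    "T1w/T1w_acpc_dc_restore_1.25.nii.gz",
]
OPTIONAL_PATTERNS = [
    "T1w/T1w_acpc_dc_restore_brain.nii.gz",
    "T1w/brainmask_fs.nii.gz",
]


def check_patterns(available, required, optional):
    """Hypothetical helper: return (files_to_download, unmatched_required_patterns).

    `available` is a list of paths relative to the subject directory (assumption).
    """
    to_download = []
    missing_required = []
    for pattern in required:
        hits = [path for path in available if fnmatchcase(path, pattern)]
        if hits:
            to_download.extend(hits)
        else:
            # A required pattern with no match would abort the run.
            missing_required.append(pattern)
    for pattern in optional:
        # Optional patterns contribute files when present and are ignored otherwise.
        to_download.extend(path for path in available if fnmatchcase(path, pattern))
    return sorted(set(to_download)), missing_required
```

A caller could abort when the second return value is non-empty and proceed with the first otherwise.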
Example

```yaml
s3:
  enabled: true
  bucket: hcp-openaccess
  region: us-east-1
  prefix: HCP_1200
  cache_policy: skip_if_exists
  max_retries: 5
  retry_backoff: 2.5
```
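One plausible reading of `max_retries` and `retry_backoff` is sketched below. The helper names, the 1-second `base_delay`, and the choice to retry only on `OSError` are assumptions made for illustration; the actual downloader may differ.

```python
import time


def backoff_delays(max_retries, retry_backoff, base_delay=1.0):
    """Delay before each retry: base_delay * retry_backoff ** attempt (assumed formula)."""
    return [base_delay * retry_backoff ** attempt for attempt in range(max_retries)]


def download_with_retries(download, max_retries, retry_backoff,
                          base_delay=1.0, sleep=time.sleep):
    """Call `download()` once, then retry up to `max_retries` times on OSError."""
    delays = backoff_delays(max_retries, retry_backoff, base_delay)
    for attempt in range(max_retries + 1):
        try:
            return download()
        except OSError:
            if attempt == max_retries:
                raise  # retries exhausted: propagate the last error
            sleep(delays[attempt])


# With the example values above (and the assumed 1-second base delay):
print(backoff_delays(5, 2.5))  # [1.0, 2.5, 6.25, 15.625, 39.0625]
```

Under these assumptions, the example configuration would wait roughly 64 seconds in total before giving up on a file.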
Notes

- `cache_policy: always` re-downloads every file. This is useful for forcing a refresh after a bucket update but expensive otherwise.
- `check_size` compares the remote object size to the local file size and re-downloads only on mismatch; it does not validate contents.
- AWS credentials are picked up via the standard boto3 resolution chain (env vars, profile, IAM role). The bucket is publicly readable, so no credentials are usually required.
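The cache policies can be summarised as a small decision function. This is a sketch under assumptions, not the project's code: `should_download` is a hypothetical name, and the behaviour of `skip_if_exists` (skip whenever a local file is present) is inferred from its name rather than stated above.

```python
from pathlib import Path

KNOWN_POLICIES = ("always", "skip_if_exists", "check_size")


def should_download(policy, local_path, remote_size):
    """Hypothetical helper: decide whether a remote object needs to be fetched."""
    if policy not in KNOWN_POLICIES:
        raise ValueError(f"unknown cache_policy: {policy!r}")
    local = Path(local_path)
    if policy == "always" or not local.exists():
        # Always refresh on request; a missing file must be fetched under any policy.
        return True
    if policy == "skip_if_exists":
        # Assumed semantics: any existing local file counts as cached.
        return False
    # check_size: compare byte counts only; file contents are not validated.
    return local.stat().st_size != remote_size
```

Because `check_size` compares sizes only, a corrupted local file with the same byte count as the remote object would still be treated as cached.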