s3: HCP S3 data download

Schema: S3Config in src/thesis/core/config/validators.py.

Optional top-level block (s3: None by default) that configures downloads from the HCP open-access S3 bucket. In the default setup it lives in config/cloud.yaml.

| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| enabled | bool | False | | Enable automatic S3 data download for HCP workflows. |
| bucket | str | "hcp-openaccess" | | S3 bucket name. |
| region | str | "us-east-1" | | AWS region where the bucket lives. |
| prefix | str | "HCP_1200" | | Bucket key prefix (folder path within the bucket). |
| cache_policy | str | "skip_if_exists" | one of skip_if_exists / check_size / always | Cache behaviour when files already exist locally. |
| max_retries | int | 3 | ≥ 0 | Maximum retry attempts for failed downloads. |
| retry_backoff | float | 2.0 | ≥ 1.0 | Exponential backoff multiplier between retries. |
| required_patterns | List[str] | see Default patterns | | Glob patterns that must be downloaded successfully; a failure aborts the run. |
| optional_patterns | List[str] | see Default patterns | | Glob patterns downloaded if available; missing files are tolerated. |
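To show how the field table could translate into validation logic, here is a minimal standard-library sketch. It is not the project's S3Config (which lives in validators.py and may use a different validation library); the class name S3ConfigSketch and the helper load_s3_block are hypothetical, and the lower bounds on max_retries and retry_backoff are assumed to be inclusive.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed cache_policy values, as listed in the field table.
CACHE_POLICIES = ("skip_if_exists", "check_size", "always")

# Default patterns copied from the "Default patterns" section.
DEFAULT_REQUIRED = [
    "T1w/Diffusion/data.nii*",
    "T1w/Diffusion/bvals",
    "T1w/Diffusion/bvecs",
    "T1w/Diffusion/nodif_brain_mask.nii*",
    "T1w/Diffusion.bedpostX/merged_*samples.nii.gz",
    "T1w/T1w_acpc_dc_restore_1.25.nii.gz",
]
DEFAULT_OPTIONAL = [
    "T1w/T1w_acpc_dc_restore_brain.nii.gz",
    "T1w/brainmask_fs.nii.gz",
]


@dataclass
class S3ConfigSketch:
    """Hypothetical stdlib mirror of the documented fields (not the real S3Config)."""

    enabled: bool = False
    bucket: str = "hcp-openaccess"
    region: str = "us-east-1"
    prefix: str = "HCP_1200"
    cache_policy: str = "skip_if_exists"
    max_retries: int = 3
    retry_backoff: float = 2.0
    required_patterns: List[str] = field(default_factory=lambda: list(DEFAULT_REQUIRED))
    optional_patterns: List[str] = field(default_factory=lambda: list(DEFAULT_OPTIONAL))

    def __post_init__(self) -> None:
        # Enforce the Constraints column (bounds assumed inclusive).
        if self.cache_policy not in CACHE_POLICIES:
            raise ValueError(f"cache_policy must be one of {CACHE_POLICIES}, got {self.cache_policy!r}")
        if self.max_retries < 0:
            raise ValueError("max_retries must be >= 0")
        if self.retry_backoff < 1.0:
            raise ValueError("retry_backoff must be >= 1.0")


def load_s3_block(raw: Optional[dict]) -> Optional[S3ConfigSketch]:
    """Return None when the s3 block is absent, mirroring the `s3: None` default.

    Unknown keys raise TypeError from the dataclass constructor.
    """
    if raw is None:
        return None
    return S3ConfigSketch(**raw)
```

Any field omitted from the YAML falls back to the documented default, so a block containing only `enabled: true` yields the hcp-openaccess / us-east-1 / HCP_1200 settings.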

Default patterns

```yaml
required_patterns:
  - T1w/Diffusion/data.nii*
  - T1w/Diffusion/bvals
  - T1w/Diffusion/bvecs
  - T1w/Diffusion/nodif_brain_mask.nii*
  - T1w/Diffusion.bedpostX/merged_*samples.nii.gz
  - T1w/T1w_acpc_dc_restore_1.25.nii.gz

optional_patterns:
  - T1w/T1w_acpc_dc_restore_brain.nii.gz
  - T1w/brainmask_fs.nii.gz
```
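The following sketch illustrates one way required and optional patterns could be resolved against a listing of bucket keys. It assumes keys are laid out as `<prefix>/<subject>/<relative path>` (for example HCP_1200/<subject>/T1w/Diffusion/bvals) and uses Python's fnmatch, whose `*` also matches `/`, unlike a shell glob; the real downloader may match differently. select_keys is a hypothetical helper, not part of the documented API.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def select_keys(
    keys: Iterable[str],
    prefix: str,
    subject: str,
    required: List[str],
    optional: List[str],
) -> Dict[str, List[str]]:
    """Map each pattern to the matching keys under <prefix>/<subject>/ (hypothetical helper)."""
    base = f"{prefix}/{subject}/"
    # Keep only this subject's keys and match patterns against the relative path.
    relative = [k[len(base):] for k in keys if k.startswith(base)]
    selected: Dict[str, List[str]] = {}
    missing: List[str] = []
    for pattern in list(required) + list(optional):
        hits = [base + rel for rel in relative if fnmatchcase(rel, pattern)]
        if hits:
            selected[pattern] = hits
        elif pattern in required:
            # A required pattern with no match aborts the run.
            missing.append(pattern)
        # Optional patterns with no match are silently skipped.
    if missing:
        raise FileNotFoundError(f"required patterns matched no keys: {missing}")
    return selected
```

Note that a single pattern can select several files: merged_*samples.nii.gz picks up every bedpostX sample volume at once.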

Example

```yaml
s3:
  enabled: true
  bucket: hcp-openaccess
  region: us-east-1
  prefix: HCP_1200
  cache_policy: skip_if_exists
  max_retries: 5
  retry_backoff: 2.5
```
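The exact backoff formula is not spelled out here. Assuming the delay before retry n is base_delay * retry_backoff ** (n - 1), with a hypothetical base_delay of one second, and assuming max_retries counts retries after the first attempt, the example values above (max_retries: 5, retry_backoff: 2.5) would wait 1, 2.5, 6.25, 15.625 and 39.0625 seconds, about 64 seconds in total. The helpers retry_delays and with_retries are illustrative only, and treating OSError as the retryable error class is also an assumption.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_delays(max_retries: int, retry_backoff: float, base_delay: float = 1.0) -> List[float]:
    """Delay in seconds before each retry: base_delay * retry_backoff ** (n - 1)."""
    return [base_delay * retry_backoff ** (n - 1) for n in range(1, max_retries + 1)]


def with_retries(
    fn: Callable[[], T],
    max_retries: int,
    retry_backoff: float,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times (max_retries + 1 attempts in total)."""
    for delay in retry_delays(max_retries, retry_backoff, base_delay):
        try:
            return fn()
        except OSError:
            # Assumed retryable (network/I/O error): wait, then try again.
            sleep(delay)
    # Final attempt: any exception now propagates to the caller.
    return fn()
```

With max_retries: 0 the function makes exactly one attempt, which is why 0 is a sensible lower bound for that field.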

Notes

  • cache_policy: always re-downloads every file, even when a local copy exists. This is useful for forcing a refresh after a bucket update but expensive for routine runs.

  • cache_policy: check_size compares the remote object size with the local file size and re-downloads only on a mismatch. It does not validate file contents, so a corrupted file of the correct size is kept.

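  • AWS credentials are resolved through the standard boto3 credential chain (environment variables, shared credentials/config profile, IAM role). Although the data are open access, the hcp-openaccess bucket generally requires AWS keys issued through ConnectomeDB after accepting the HCP Open Access Data Use Terms, so anonymous requests are typically rejected.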
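The three cache policies described in these notes can be summarised as a single decision function. This is a hypothetical sketch (needs_download is not a documented function); remote_size stands for the object size reported by S3, for instance the ContentLength field of a boto3 head_object response, and the sketch assumes a missing local file is always downloaded regardless of policy.

```python
import os


def needs_download(policy: str, local_path: str, remote_size: int) -> bool:
    """Decide whether to fetch one file under the given cache_policy (hypothetical)."""
    if policy == "always":
        return True  # unconditional refresh
    if not os.path.exists(local_path):
        return True  # nothing cached yet
    if policy == "skip_if_exists":
        return False  # any existing local file is trusted
    if policy == "check_size":
        # Size-only comparison: contents are not checked.
        return os.path.getsize(local_path) != remote_size
    raise ValueError(f"unknown cache_policy: {policy!r}")
```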