Large scale validation

Set up large scale validations using the helper tool duqduq.

This page documents the subcommands available in this CLI program.

To get started:

duqduq --help

For information on how to configure your UQ runs via duqtools.yaml, check out the usage page.

To start with large scale validation, two files are needed:

data.csv contains the template data
duqtools.template.yaml is the duqtools config template

Then, run the programs in the intended sequence:

duqduq setup
duqduq create
duqduq submit
duqduq status
duqduq merge

Each of these commands mimics its duqtools equivalent. For example, duqduq create is the large scale equivalent of duqtools create.

Input data

data.csv contains a list of IMAS handles. duqduq setup loops over the entries in this file and, for each entry, creates a new directory (named after the index column) in the current directory, containing the input for duqtools.

data.csv

,user,db,shot,run
run_1,user,jet,12345,0002
run_2,user,jet,98760,0002
run_3,user,jet,2222,0002
run_4,user,jet,3333,0002
run_5,user,jet,4444,0001

Each column will be exposed through the handle dataclass in the config template below.
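
To make the mapping from data.csv rows to handles concrete, here is a minimal sketch in plain Python. The `Handle` dataclass and the `read_handles` function are hypothetical stand-ins written for this example; they are not the duqtools implementation, which may differ.

```python
import csv
import io
from dataclasses import dataclass


# Hypothetical stand-in for the handle object exposed to the template;
# the real duqtools class may differ.
@dataclass
class Handle:
    user: str
    db: str
    shot: int
    run: int


DATA_CSV = """\
,user,db,shot,run
run_1,user,jet,12345,0002
run_2,user,jet,98760,0002
"""


def read_handles(text: str) -> dict[str, Handle]:
    """Map each index value (the unnamed first column) to a Handle."""
    handles = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = row.pop("")  # the unnamed first column holds the index
        handles[name] = Handle(
            row["user"], row["db"], int(row["shot"]), int(row["run"])
        )
    return handles


handles = read_handles(DATA_CSV)
for name, handle in handles.items():
    # duqduq setup names the generated directory after the index value
    print(name, handle)
```

Each key corresponds to one directory that duqduq setup creates, and each field corresponds to one `handle.<column>` placeholder in the template.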

Config template

duqtools.template.yaml is a template for the duqtools create config. It contains a few placeholders for variable data (see the documentation for setup).

The template uses jinja2 as a templating language.

duqtools.template.yaml

tag: {{ run.name }}
create:
  runs_dir: /pfs/work/username/jetto_runs/duqduq/{{ run.name }}
  template: /pfs/work/username/jetto/runs/path/to/template/
  template_data:
    user: {{ handle.user }}
    db: {{ handle.db }}
    shot: {{ handle.shot }}
    run: {{ handle.run }}
  operations:
    - variable: major_radius
      operator: copyto
      value: {{ variables.major_radius | round(4) }}
    - variable: b_field
      operator: copyto
      value: {{ variables.b_field | round(4) }}
    - variable: t_start
      operator: copyto
      value: {{ variables.t_start | round(4) }}
    - variable: t_end
      operator: copyto
      value: {{ (variables.t_start + 0.01) | round(4) }}
  sampler:
    method: latin-hypercube
    n_samples: 3
  dimensions:
    - variable: zeff
      operator: add
      values: [0.01, 0.02, 0.03]
    - variable: t_e
      operator: multiply
      values: [0.8, 1.0, 1.2]
system:
  name: jetto
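
The two time-related operations in the template above are simple arithmetic on `variables.t_start`. The sketch below mirrors them in plain Python, assuming that the jinja2 `round(4)` filter behaves like Python's built-in `round(value, 4)` (its default method). The input value is made up for illustration.

```python
def rendered_times(t_start: float) -> dict[str, float]:
    """Mirror the t_start and t_end expressions from the template."""
    return {
        "t_start": round(t_start, 4),
        # t_end is defined relative to t_start, 0.01 later
        "t_end": round(t_start + 0.01, 4),
    }


times = rendered_times(47.123456)  # hypothetical value of variables.t_start
print(times)
```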

Split base and UQ directories

With duqtools you can generate a base run (no sampling), and use the results of the base run as the template for subsequent UQ runs.

There are different ways to achieve this. Below is a variation of the config above that does it with a single template. It uses the run.output attribute and jinja2 statements to control where the jetto template is read from.

duqtools.template.yaml

tag: {{ run.name }}
create:
  runs_dir: /pfs/work/username/jetto_runs/duqduq/{{ run.name }}
  {% if run.output == 'base' -%}
  template: /pfs/work/username/jetto/runs/path/to/template
  {% else -%}
  template: /pfs/work/username/jetto/runs/duqduq/{{ run.name }}/base
  {% endif -%}
  template_data:
    ...
  operations:
    ...
  sampler:
    ...
  dimensions:
    ...
system:
  name: jetto
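
The branching in the template above can be restated as a small Python function. This is only an illustration of the if/else logic; `template_path` is a hypothetical helper, and the paths are the placeholder paths from the template.

```python
def template_path(run_name: str, output: str) -> str:
    """Return the jetto template location for a given output stage."""
    root = "/pfs/work/username/jetto/runs"
    if output == "base":
        # Base runs start from the original jetto template.
        return f"{root}/path/to/template"
    # All other stages start from the finished base run of the same entry.
    return f"{root}/duqduq/{run_name}/base"


print(template_path("run_1", "base"))
print(template_path("run_1", "uq"))
```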

Create and submit base runs

The first step is to set up, create, and run the base runs. --no-sampling means that duqtools performs the runs with only the operations applied; anything under dimensions is skipped. -p is a glob pattern that restricts the command to configs in matching subdirectories.

duqduq setup --output base
duqduq create --no-sampling -p 'base/**'
duqduq submit -p 'base/**'
duqduq status -p 'base/**'
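
To illustrate what a pattern such as 'base/**' selects, the sketch below builds a throwaway directory tree and globs it with pathlib. It assumes that each run directory holds a duqtools.yaml and that setup places entries under the output subdirectory (base/run_1, and so on). How duqduq resolves the pattern internally is not shown here and may differ.

```python
import tempfile
from pathlib import Path


def matching_configs(root: Path, pattern: str) -> list[str]:
    """Return config files below `root` whose directory matches `pattern`."""
    found = root.glob(f"{pattern}/duqtools.yaml")
    return sorted(p.relative_to(root).as_posix() for p in found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Assumed layout: <output subdirectory>/<index value>/duqtools.yaml
    for sub in ("base/run_1", "base/run_2", "uq/run_1"):
        (root / sub).mkdir(parents=True)
        (root / sub / "duqtools.yaml").touch()
    base_configs = matching_configs(root, "base/**")
    uq_configs = matching_configs(root, "uq/**")

print(base_configs)
print(uq_configs)
```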

Create and submit UQ runs

Set up and perform the full UQ run.

duqduq setup --output uq
duqduq create -p 'uq/**'
duqduq submit -p 'uq/**'
duqduq status -p 'uq/**'
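
The base and UQ stages use the same four commands and differ only in the output subdirectory and the --no-sampling flag. The sketch below assembles and prints the command lines for both stages without executing anything; `stage_commands` is a hypothetical helper, and only flags documented on this page are used.

```python
import shlex


def stage_commands(stage: str, sampling: bool = True) -> list[list[str]]:
    """Build the setup/create/submit/status command lines for one stage."""
    pattern = f"{stage}/**"
    create = ["duqduq", "create", "-p", pattern]
    if not sampling:
        # Base runs apply the operations only and skip the dimensions.
        create.insert(2, "--no-sampling")
    return [
        ["duqduq", "setup", "--output", stage],
        create,
        ["duqduq", "submit", "-p", pattern],
        ["duqduq", "status", "-p", pattern],
    ]


for stage, sampling in (("base", False), ("uq", True)):
    for argv in stage_commands(stage, sampling):
        print(shlex.join(argv))
```

Because the UQ runs use the base run results as their template, the base jobs presumably need to finish before the UQ stage is created.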

duqduq

For more information, check out the documentation:

https://duqtools.readthedocs.io/large_scale_validation

Usage:

duqduq [OPTIONS] COMMAND [ARGS]...

Options:

Name	Type	Description	Default
`--help`	boolean	Show this message and exit.	`False`

Subcommands

create: Create data sets for large scale validation.
merge: Merge data sets with error propagation.
setup: Set up large scale validation.
status: Check status of large scale validation runs.
submit: Submit large scale validation runs.

duqduq create

Create data sets for large scale validation.

Example to only match config files in subdirectories matching jet*: duqduq create --pattern 'jet*/**'

Usage:

duqduq create [OPTIONS]

Options:

Name	Type	Description	Default
`--force`	boolean	Overwrite existing run directories and IDS data.	`False`
`-p`, `--pattern`	text	Only create data for configs in subdirectories matching this glob pattern.	None
`-i`, `--input`	text	Only create data for configs where `template_data` matches a handle in this data.csv.	None
`--no-sampling`	boolean	Create base runs (ignores `dimensions`/`sampler`).	`False`
`--dry-run`	boolean	Execute without any side-effects.	`False`
`--yes`	boolean	Answer yes to questions automatically.	`False`
`--debug`	boolean	Enable debug print statements.	`False`
`--logfile`, `-l`	text	Where to send the logfile; the special values stderr/stdout send it to the respective stream.	`duqtools.log`
`--help`	boolean	Show this message and exit.	`False`

duqduq merge

Merge data sets with error propagation.

By default, duqduq merge attempts to merge all known variables. Use --variable to select which variables to merge.
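
As a rough picture of what merging with error propagation can mean, the sketch below combines several sampled profiles of one variable into a pointwise mean and sample standard deviation. This is an assumption about the general idea, not a description of the duqduq merge implementation. `merge_profiles` is a hypothetical helper and the numbers are made up.

```python
from statistics import mean, stdev


def merge_profiles(
    profiles: list[list[float]],
) -> tuple[list[float], list[float]]:
    """Combine equally long profiles into a pointwise mean and sample std."""
    columns = list(zip(*profiles))  # one tuple of samples per grid point
    return [mean(c) for c in columns], [stdev(c) for c in columns]


# Three hypothetical runs of the same variable on a two-point grid.
t_e_runs = [[0.8, 1.6], [1.0, 2.0], [1.2, 2.4]]
merged_mean, merged_std = merge_profiles(t_e_runs)
print(merged_mean, merged_std)
```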

Usage:

duqduq merge [OPTIONS]

Options:

Name	Type	Description	Default
`--force`	boolean	Overwrite existing data	`False`
`-v`, `--variable`	text	Name of the variables.	None
`--dry-run`	boolean	Execute without any side-effects.	`False`
`--yes`	boolean	Answer yes to questions automatically.	`False`
`--debug`	boolean	Enable debug print statements.	`False`
`--logfile`, `-l`	text	Where to send the logfile; the special values stderr/stdout send it to the respective stream.	`duqtools.log`
`--help`	boolean	Show this message and exit.	`False`

duqduq setup

Set up large scale validation.

Usage:

duqduq setup [OPTIONS]

Options:

Name	Type	Description	Default
`-i`, `--input`	path	Input file, i.e. `data.csv` or `runs.yaml`	`data.csv`
`-t`, `--template`	path	Template duqtools.yaml	`duqtools.template.yaml`
`--force`	boolean	Overwrite existing run config directories	`False`
`-o`, `--output`	text	Output subdirectory	None
`--dry-run`	boolean	Execute without any side-effects.	`False`
`--yes`	boolean	Answer yes to questions automatically.	`False`
`--debug`	boolean	Enable debug print statements.	`False`
`--logfile`, `-l`	text	Where to send the logfile; the special values stderr/stdout send it to the respective stream.	`duqtools.log`
`--help`	boolean	Show this message and exit.	`False`

duqduq status

Check status of large scale validation runs.

Usage:

duqduq status [OPTIONS]

Options:

Name	Type	Description	Default
`--detailed`	boolean	Detailed info on progress	`False`
`--progress`	boolean	Fancy progress bar	`False`
`-p`, `--pattern`	text	Show status only for runs in subdirectories matching this glob pattern.	None
`--debug`	boolean	Enable debug print statements.	`False`
`--logfile`, `-l`	text	Where to send the logfile; the special values stderr/stdout send it to the respective stream.	`duqtools.log`
`--help`	boolean	Show this message and exit.	`False`

duqduq submit

Submit large scale validation runs.

Usage:

duqduq submit [OPTIONS]

Options:

Name	Type	Description	Default
`--force`	boolean	Re-submit running or completed jobs.	`False`
`--schedule`	boolean	Schedule and submit jobs automatically.	`False`
`-j`, `--max_jobs`	integer	Maximum number of jobs running simultaneously.	`10`
`-s`, `--status`	text	Only submit jobs with this status.	None
`-p`, `--pattern`	text	Only submit jobs for runs in subdirectories matching this glob pattern.	None
`-i`, `--input`	text	Only submit jobs for configs where `template_data` matches a handle in this data.csv.	None
`-a`, `--array`	boolean	Submit jobs as array.	`False`
`--array-script`	boolean	Create script to submit jobs as array. Like --array, but does not submit.	`False`
`--limit`	integer	Limits total number of jobs to submit.	None
`--max_array_size`	integer	Maximum array size for slurm (usually 1001, default = 100).	`100`
`--dry-run`	boolean	Execute without any side-effects.	`False`
`--yes`	boolean	Answer yes to questions automatically.	`False`
`--debug`	boolean	Enable debug print statements.	`False`
`--logfile`, `-l`	text	Where to send the logfile; the special values stderr/stdout send it to the respective stream.	`duqtools.log`
`--help`	boolean	Show this message and exit.	`False`