Large scale validation

Set up large scale validations using the helper tool duqduq.

This page documents the subcommands available in this CLI program.

To get started:

duqduq --help

For information on how to configure your UQ runs via duqtools.yaml, check out the usage page.

To start with large scale validation, two files are needed:

  1. data.csv contains the template data
  2. duqtools.template.yaml is the duqtools config template

Then run the subcommands in this order:

  1. duqduq setup
  2. duqduq create
  3. duqduq submit
  4. duqduq status
  5. duqduq merge

Each of these commands mimics its duqtools equivalent. For example, duqduq create is the large scale equivalent of duqtools create.
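The sequence above can also be driven from a script. This is a minimal sketch, assuming duqduq is installed and on the PATH; the names PIPELINE, build_commands and run_pipeline are hypothetical helpers and not part of duqtools.

```python
import subprocess

# Subcommands in the order listed above.
PIPELINE = ["setup", "create", "submit", "status", "merge"]


def build_commands(steps=PIPELINE):
    """Return one argv list per duqduq subcommand, in order."""
    return [["duqduq", step] for step in steps]


def run_pipeline(steps=PIPELINE):
    """Run each step in turn; check=True aborts on the first failing step."""
    for argv in build_commands(steps):
        subprocess.run(argv, check=True)
```

Calling run_pipeline(["setup", "create"]) would run only the first two steps. In practice the submitted jobs presumably need to finish before merge is useful, so a script would likely poll duqduq status between submit and merge.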

Input data

data.csv contains a list of IMAS handles, one per row. More information on this file is available in the duqtools documentation. duqduq setup loops over the entries in this file and, for each one, creates a new directory (named after the index column) in the current directory with the input for duqtools.

data.csv
,user,db,shot,run
run_1,user,jet,12345,0002
run_2,user,jet,98760,0002
run_3,user,jet,2222,0002
run_4,user,jet,3333,0002
run_5,user,jet,4444,0001

Each column is exposed in the config template below as an attribute of the handle dataclass (for example, handle.shot).
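To make the mapping from rows to handles concrete, here is a small sketch that reads the same layout with the standard csv module. The Handle class and read_handles function are hypothetical stand-ins written for illustration; duqtools' own loader and handle class may differ, for example in how they type the shot and run columns.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Handle:
    """Hypothetical stand-in for the handle exposed to the template."""
    user: str
    db: str
    shot: int
    run: int


def read_handles(fileobj):
    """Map each index-column value (e.g. 'run_1') to a Handle."""
    reader = csv.DictReader(fileobj)
    index_col = reader.fieldnames[0]  # the unnamed first column
    return {
        row[index_col]: Handle(row["user"], row["db"], int(row["shot"]), int(row["run"]))
        for row in reader
    }


# Two rows copied from the data.csv example above.
sample = io.StringIO(
    ",user,db,shot,run\n"
    "run_1,user,jet,12345,0002\n"
    "run_2,user,jet,98760,0002\n"
)
handles = read_handles(sample)
print(handles["run_1"])
```

The dictionary keys (run_1, run_2) correspond to the directory names that duqduq setup creates.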

Config template

duqtools.template.yaml is a template for the duqtools create config. It contains a few placeholders for variable data (see the documentation for setup).

The template uses jinja2 as a templating language.

duqtools.template.yaml
tag: {{ run.name }}
create:
  runs_dir: /pfs/work/username/jetto_runs/duqduq/{{ run.name }}
  template: /pfs/work/username/jetto/runs/path/to/template/
  template_data:
    user: {{ handle.user }}
    db: {{ handle.db }}
    shot: {{ handle.shot }}
    run: {{ handle.run }}
  operations:
    - variable: major_radius
      operator: copyto
      value: {{ variables.major_radius | round(4) }}
    - variable: b_field
      operator: copyto
      value: {{ variables.b_field | round(4) }}
    - variable: t_start
      operator: copyto
      value: {{ variables.t_start | round(4) }}
    - variable: t_end
      operator: copyto
      value: {{ (variables.t_start + 0.01) | round(4) }}
  sampler:
    method: latin-hypercube
    n_samples: 3
  dimensions:
    - variable: zeff
      operator: add
      values: [0.01, 0.02, 0.03]
    - variable: t_e
      operator: multiply
      values: [0.8, 1.0, 1.2]
system:
  name: jetto
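The sampler section above asks for 3 Latin hypercube samples from two dimensions with 3 values each, instead of the full 3 x 3 grid of 9 runs. The sketch below illustrates the idea for this simple case, under the assumption that every dimension lists as many values as there are samples. It is not duqtools' sampler, and latin_hypercube is a hypothetical helper.

```python
import random

# Values copied from the `dimensions` section of the template above.
DIMENSIONS = {
    "zeff": [0.01, 0.02, 0.03],
    "t_e": [0.8, 1.0, 1.2],
}


def latin_hypercube(dimensions, seed=None):
    """Return samples in which each value of each dimension is used exactly once.

    Simplification: every dimension must list the same number of values,
    and that number is also the number of samples.
    """
    lengths = {len(values) for values in dimensions.values()}
    if len(lengths) != 1:
        raise ValueError("all dimensions must have the same number of values")
    n_samples = lengths.pop()
    rng = random.Random(seed)
    # Shuffle each dimension independently, then pair the columns row by row.
    columns = {name: rng.sample(values, k=n_samples) for name, values in dimensions.items()}
    return [{name: column[i] for name, column in columns.items()} for i in range(n_samples)]


samples = latin_hypercube(DIMENSIONS, seed=0)
for sample in samples:
    print(sample)
```

Each printed sample pairs one zeff offset with one t_e factor, and no value is repeated within a dimension.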

Split base and UQ directories

With duqtools you can generate a base run (no sampling) and use its results as the template for subsequent UQ runs.

There are several ways to do this. Below is a variation of the config above that does it with a single template. It uses the run.output attribute and jinja2 statements to control where the jetto template is read from.

duqtools.template.yaml
tag: {{ run.name }}
create:
  runs_dir: /pfs/work/username/jetto_runs/duqduq/{{ run.name }}
  {% if run.output == 'base' -%}
  template: /pfs/work/username/jetto/runs/path/to/template
  {% else -%}
  template: /pfs/work/username/jetto/runs/duqduq/{{ run.name }}/base
  {% endif -%}
  template_data:
    ...
  operations:
    ...
  sampler:
    ...
  dimensions:
    ...
system:
  name: jetto
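The jinja2 if/else in this template boils down to a two-way choice of template path. The plain Python mirror below may make the logic easier to follow; template_path is a hypothetical function, and it assumes run.output carries the value passed to duqduq setup --output.

```python
def template_path(run_name, output):
    """Mirror the template's branch: base runs read the original jetto
    template, all other runs read the finished base run of the same name."""
    if output == "base":
        return "/pfs/work/username/jetto/runs/path/to/template"
    return f"/pfs/work/username/jetto/runs/duqduq/{run_name}/base"


print(template_path("run_1", "base"))
print(template_path("run_1", "uq"))
```

Because the UQ branch points at the output of the base run, the base runs have to be created and completed first.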

Create and submit base runs

The first step is to set up, create, and run the base runs. --no-sampling means that duqtools performs the runs with only the operations; anything under dimensions is skipped. -p (--pattern) is a glob filter that restricts each command to the configs in matching subdirectories.

duqduq setup --output base
duqduq create --no-sampling -p 'base/**'
duqduq submit -p 'base/**'
duqduq status -p 'base/**'
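To see which directories a pattern such as 'base/**' selects, the sketch below applies the same glob to a throwaway directory tree with pathlib. The layout (base/run_1, uq/run_1, each holding a duqtools.yaml) and the matching_config_dirs helper are assumptions made for illustration; duqduq's own matching may differ in detail.

```python
import tempfile
from pathlib import Path


def matching_config_dirs(root, pattern, config_name="duqtools.yaml"):
    """Return directories under root that match pattern and hold a config file."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_dir() and (path / config_name).exists()
    )


with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout after running setup with --output base and --output uq.
    for sub in ("base/run_1", "base/run_2", "uq/run_1"):
        drc = Path(tmp, sub)
        drc.mkdir(parents=True)
        (drc / "duqtools.yaml").touch()
    base_dirs = matching_config_dirs(tmp, "base/**")
    uq_dirs = matching_config_dirs(tmp, "uq/**")

print(base_dirs)
print(uq_dirs)
```

Quoting the pattern on the command line, as in the commands above, keeps the shell from expanding it before duqduq sees it.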

Create and submit UQ runs

Set up and perform the full UQ runs.

duqduq setup --output uq
duqduq create -p 'uq/**'
duqduq submit -p 'uq/**'
duqduq status -p 'uq/**'

duqduq

For more information, check out the documentation:

https://duqtools.readthedocs.io/large_scale_validation

Usage:

duqduq [OPTIONS] COMMAND [ARGS]...

Options:

Name Type Description Default
--help boolean Show this message and exit. False

Subcommands

  • create: Create data sets for large scale validation.
  • merge: Merge data sets with error propagation.
  • setup: Set up large scale validation.
  • status: Check status of large scale validation runs.
  • submit: Submit large scale validation runs.

duqduq create

Create data sets for large scale validation.

For example, to create data only for configs in subdirectories matching jet*: duqduq create --pattern 'jet*/**'

Usage:

duqduq create [OPTIONS]

Options:

Name Type Description Default
--force boolean Overwrite existing run directories and IDS data. False
-p, --pattern text Only create data for configs in subdirectories matching this glob pattern. None
-i, --input text Only create data for configs where template_data matches a handle in this data.csv. None
--no-sampling boolean Create base runs (ignores dimensions/sampler). False
--dry-run boolean Execute without any side-effects. False
--yes boolean Answer yes to questions automatically. False
--debug boolean Enable debug print statements. False
--logfile, -l text where to send the logfile, the special values stderr/stdout will send it there respectively. duqtools.log
--help boolean Show this message and exit. False

duqduq merge

Merge data sets with error propagation.

By default, duqduq merge attempts to merge all known variables. Use --variable to select which variables to merge.
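As an illustration of what merging with error propagation can mean, the sketch below combines several runs point by point into a mean and a sample standard deviation. This is only one common convention and an assumption here; this page does not state which statistics duqduq merge computes, and merge_profiles is a hypothetical helper.

```python
from statistics import mean, stdev


def merge_profiles(profiles):
    """Combine equally long profiles point by point into (means, errors)."""
    columns = list(zip(*profiles))
    means = [mean(column) for column in columns]
    errors = [stdev(column) for column in columns]  # needs at least two runs
    return means, errors


# Hypothetical values of one variable from three runs on the same two-point grid.
runs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
means, errors = merge_profiles(runs)
print(means, errors)
```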

Usage:

duqduq merge [OPTIONS]

Options:

Name Type Description Default
--force boolean Overwrite existing data False
-v, --variable text Name of the variables. None
--dry-run boolean Execute without any side-effects. False
--yes boolean Answer yes to questions automatically. False
--debug boolean Enable debug print statements. False
--logfile, -l text where to send the logfile, the special values stderr/stdout will send it there respectively. duqtools.log
--help boolean Show this message and exit. False

duqduq setup

Set up large scale validation.

Usage:

duqduq setup [OPTIONS]

Options:

Name Type Description Default
-i, --input path Input file, i.e. data.csv or runs.yaml data.csv
-t, --template path Template duqtools.yaml duqtools.template.yaml
--force boolean Overwrite existing run config directories False
-o, --output text Output subdirectory None
--dry-run boolean Execute without any side-effects. False
--yes boolean Answer yes to questions automatically. False
--debug boolean Enable debug print statements. False
--logfile, -l text where to send the logfile, the special values stderr/stdout will send it there respectively. duqtools.log
--help boolean Show this message and exit. False

duqduq status

Check status of large scale validation runs.

Usage:

duqduq status [OPTIONS]

Options:

Name Type Description Default
--detailed boolean Detailed info on progress False
--progress boolean Fancy progress bar False
-p, --pattern text Show status only for runs in subdirectories matching this glob pattern. None
--debug boolean Enable debug print statements. False
--logfile, -l text where to send the logfile, the special values stderr/stdout will send it there respectively. duqtools.log
--help boolean Show this message and exit. False

duqduq submit

Submit large scale validation runs.

Usage:

duqduq submit [OPTIONS]

Options:

Name Type Description Default
--force boolean Re-submit running or completed jobs. False
--schedule boolean Schedule and submit jobs automatically. False
-j, --max_jobs integer Maximum number of jobs running simultaneously. 10
-s, --status text Only submit jobs with this status. None
-p, --pattern text Only submit jobs for runs in subdirectories matching this glob pattern. None
-i, --input text Only submit jobs for configs where template_data matches a handle in this data.csv. None
-a, --array boolean Submit jobs as array. False
--array-script boolean Create script to submit jobs as array. Like --array, but does not submit. False
--limit integer Limits total number of jobs to submit. None
--max_array_size integer Maximum array size for slurm (usually 1001, default = 100). 100
--dry-run boolean Execute without any side-effects. False
--yes boolean Answer yes to questions automatically. False
--debug boolean Enable debug print statements. False
--logfile, -l text where to send the logfile, the special values stderr/stdout will send it there respectively. duqtools.log
--help boolean Show this message and exit. False
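The --limit and --max_array_size options together bound how many jobs are submitted and how large each job array may be. The sketch below shows one way such a split could work. It is an assumption about the batching, not duqduq's actual scheduling code, and plan_arrays is a hypothetical helper.

```python
def plan_arrays(n_jobs, max_array_size=100, limit=None):
    """Return the sizes of the job arrays needed to submit n_jobs.

    limit caps the total number of jobs; max_array_size caps each array.
    """
    if limit is not None:
        n_jobs = min(n_jobs, limit)
    full, rest = divmod(n_jobs, max_array_size)
    return [max_array_size] * full + ([rest] if rest else [])


print(plan_arrays(250))
print(plan_arrays(250, limit=120))
```

With the default max_array_size of 100, 250 jobs would need three arrays under this scheme, and a limit of 120 would reduce that to two.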