
Duqtools usage

A typical UQ run starts from a config file (duqtools.yaml), a template IMAS data set, and optionally a template simulation run (e.g. from jetto, see the systems page for more info).

  • duqtools setup (optional): For template-based run creation, see here for more information.
  • duqtools create: Creates the data files according to the rules specified in the config file (duqtools.yaml). This is the main program in duqtools and what most of this page is about.
  • duqtools submit: Submits the runs to the runner. This is a helper tool to submit all runs to the scheduling system.
  • duqtools status: Display the status of the jobs in progress.
  • duqtools merge: Aggregates data sets into a single IMAS data file with error propagation.

duqtools.yaml

Duqtools run settings are configured using a yaml configuration file in the project directory. By default it is named duqtools.yaml. You can specify another path for it using the -c/--config option (see duqtools help or the cli).

As a minimum, this configuration file must define the root workspace and the system to use (see below). All other settings are (in principle) optional.

Example config file

To help initialize a starting config to modify, you can run duqtools init.

Below is an example config file generated by duqtools init.

duqtools.yaml
# Check out the documentation for more info:
# https://duqtools.readthedocs.io/en/latest/config/
tag: duqtools
create:
  runs_dir: ./runs_dir # change to output directory
  template_data: # change to your template data
    user: yourusername
    db: jet
    shot: 12345
    run: 1
  dimensions:
  - variable: t_e
    operator: multiply
    values: [0.9, 1.0, 1.1]
    scale_to_error: false
  - variable: zeff
    operator: multiply
    values: [0.9, 1.0, 1.1]
    scale_to_error: false
  sampler:
    method: latin-hypercube
    n_samples: 3
system:
  name: nosystem

Top level parameters

These are the top level keywords in the config. See the specific sections for more information.

tag
Create a tag for the runs to identify them in slurm or data.csv
create
Configuration for the create subcommand. See model for more info.
extra_variables
Specify extra variables for this run.
system
Options specific to the system used
quiet
If true, do not output to stdout, except for mandatory prompts.

The create config

The create section of the duqtools config is where you will be spending most of your time. It defines the location of the data, the run directory, the operations to apply, and the matrix sampling for UQ.

When you run duqtools create this section gets read and the steps executed.

Check out the command-line interface for more info on how to use this command.

create parameters

The options of the create subcommand are stored in the create key in the config.

runs_dir
Relative location from the workspace, which specifies the folder where to store all the created runs. This defaults to workspace/duqtools_experiment_x, where x is the first integer for which the directory does not yet exist.
template
Template directory to modify. Duqtools copies and updates the settings required for the specified system from this directory. This can be a directory with a finished run, or one just stored by JAMS (but not yet started). By default, duqtools extracts the input IMAS database entry from the settings file (e.g. jetto.in) to find the data to modify for the UQ runs. Defaults to None.
template_data
Specify the location of the template data to modify. This overrides the location of the data specified in the settings file in the template directory.
operations

These operations are always applied to the data. All operations specified here are added to any operations sampled from the dimensions. They can be used to, for example, set the start time for an experiment or update some physical parameters. This parameter is optional.

sampler
For efficient UQ, it may not be necessary to sample the entire matrix or hypercube. By default, the Cartesian product is taken (method: cartesian-product). For more efficient sampling of the space, the following method choices are available: latin-hypercube, sobol, and halton, where n_samples gives the number of samples to extract.
dimensions
The dimensions field specifies the dimensions of the matrix to sample from. Each dimension is a compound set of operations to apply. From this, a matrix of all possible combinations is generated; essentially, it generates the Cartesian product of all operations. By specifying a different sampler, a subset of this hypercube can be sampled efficiently. This parameter is optional.

For example:

duqtools.yaml
create:
  runs_dir: /pfs/work/username/jetto/runs/run_1
  template: /pfs/work/username/jetto/runs/duqtools_template
  operations:
    - variable: t_start
      operator: copyto
      value: 2.875
    - variable: t_end
      operator: copyto
      value: 2.885
  dimensions:
    - variable: t_e
      operator: multiply
      values: [0.9, 1.0, 1.1]
      scale_to_error: false
    - variable: zeff
      operator: multiply
      values: [0.9, 1.0, 1.1]
      scale_to_error: false
  sampler:
    method: latin-hypercube
    n_samples: 3

Output directory

You can modify the duqtools output directory via runs_dir:

duqtools.yaml
create:
  runs_dir: my_experiment

Specify the template data

Duqtools distinguishes between:

  1. template_data (mandatory): the source data that is copied and then modified by duqtools.
  2. template (optional): the location where the config data for your run are stored. The contents of this directory are first copied to the target location.

By default, for the jetto system, the template IMAS data to modify are extracted from the path specified in the template field.

duqtools.yaml
create:
  template: /pfs/work/username/jetto/runs/duqtools_template

In some cases, it may be useful to re-use the same set of model settings, but with different input data. If the template_data field is specified, these data will be used instead. To do so, specify template_data with the fields below:

relative_location
Relative location to the imasdb, if a local imasdb is used.
user
Username.
db
IMAS db/machine name.
shot
IMAS Shot number.
run
IMAS Run number.

For example:

duqtools.yaml
template: /pfs/work/username/jetto/runs/duqtools_template
template_data:
  user: username
  db: jet
  shot: 91234
  run: 5

Samplers

Depending on the number of dimensions, a hypercube is constructed from which duqtools will select a number of entries. For a setup with 3 dimensions of sizes \(i\), \(j\), and \(k\), a hypercube of \(i\times j\times k\) elements will be constructed, where each element is one of the possible combinations.

By default the entire hypercube is sampled:

duqtools.yaml
sampler:
  method: cartesian-product

For smarter sampling, use one of the other methods: latin-hypercube, sobol, or halton. n_samples gives the number of samples to extract. For example:

duqtools.yaml
sampler:
  method: latin-hypercube
  n_samples: 5
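The idea behind Latin hypercube sampling over a discrete grid can be sketched in plain numpy. This is a simplified illustration, not duqtools' actual implementation (`latin_hypercube_indices` is a hypothetical helper name): each dimension is cut into n_samples strata, and every stratum is hit exactly once.

```python
import numpy as np

def latin_hypercube_indices(sizes, n_samples, seed=None):
    """Pick n_samples index tuples from a grid with the given sizes,
    stratifying each dimension: n_samples strata, each hit once."""
    rng = np.random.default_rng(seed)
    columns = []
    for size in sizes:
        # jitter within each stratum of [0, 1), then shuffle the strata
        strata = (np.arange(n_samples) + rng.random(n_samples)) / n_samples
        rng.shuffle(strata)
        columns.append(np.floor(strata * size).astype(int))
    return list(zip(*(col.tolist() for col in columns)))

# sampler: latin-hypercube with n_samples: 3 over a 3 x 3 grid
samples = latin_hypercube_indices([3, 3], n_samples=3, seed=42)
```

Because n_samples equals the grid size here, each dimension's indices form a permutation of 0, 1, 2, so every value of each variable appears exactly once across the selected runs.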

Dimensions

These instructions operate on the template model. Note that these are compound operations, so they are expanded to fill the matrix with possible entries for data modifications (depending on the sampling method).

Arithmetic operations

Apply a set of arithmetic operations to the IDS.

Takes the IDS data and subtracts, adds, multiplies, etc., with each of the given values.

values
Values to use with operator on field to create sampling space.
operator
Which operator to apply to the data in combination with any of the given values below. This can be any of the basic numpy arithmetic operations. Available choices: add, multiply, divide, power, subtract, floor_divide, mod, none and remainder. These directly map to the equivalent numpy functions, i.e. add -> np.add.
scale_to_error
If True, multiply value(s) by the error (sigma). With asymmetric errors (i.e. both lower/upper error nodes are available), scale to the lower error node for values < 0, and to the upper error node for values > 0.
clip_min
If set, clip (limit) the data at this value (lower bound). Uses np.clip.
clip_max
If set, clip (limit) the data at this value (upper bound). Uses np.clip.
linear_ramp
Linearly ramp the operation using the start and stop value given. The first value (start) corresponds to multiplier at the beginning of the data range, the second value (stop) to the multiplier at the end. The ramp is linearly interpolated between the start and stop values. The linear ramp acts as a multiplier of the specified value. For example, for operator: add: new_data = data + np.linspace(start, stop, len(data)) * value
input_variables
Input variables that should be present for an operator: custom operation. The values of these input variables can be used in the custom_code field.
custom_code
Custom Python code to apply for the custom operator. This will be evaluated as if it were inline Python code. Two variables are accessible: data corresponds to the variable data, and value corresponds to the passed value. The extra input_variables are available in a dict named var = { variable1: value, variable2: value }. For example, an implementation of operator: multiply would be custom_code: 'value * data'. To multiply by some input variable named key1, use custom_code: "var['key1'] * value". The resulting data must have the same shape.
variable
IDS variable for the data to modify. The time slice can be denoted with '*', this will match all time slices in the IDS. Alternatively, you can specify the time slice directly, i.e. profiles_1d/0/t_i_ave to only match and update the 0-th time slice.

For example:

duqtools.yaml
variable: zeff
operator: add
values: [0.01, 0.02, 0.03]

will generate 3 entries, zeff += 0.01, zeff += 0.02, and zeff += 0.03.

duqtools.yaml
variable: t_i_ave
operator: multiply
values: [1.1, 1.2, 1.3]

will generate another 3 entries, t_i_ave *= 1.1, t_i_ave *= 1.2, and t_i_ave *= 1.3.

With these 2 entries, the parameter hypercube would consist of 9 entries total (3 for zeff times 3 for t_i_ave). With the default sampler (method: cartesian-product), this means 9 new data files will be written.
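The Cartesian expansion itself is easy to picture with `itertools.product` (an illustration of the counting, not duqtools internals):

```python
from itertools import product

# the two dimensions from the example above
zeff_ops = [('zeff', 'add', v) for v in (0.01, 0.02, 0.03)]
t_i_ops = [('t_i_ave', 'multiply', v) for v in (1.1, 1.2, 1.3)]

# each hypercube entry is one run: one operation per dimension
hypercube = list(product(zeff_ops, t_i_ops))
```

With 3 values per dimension this yields 3 × 3 = 9 entries, one per generated data file.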

Note

The python equivalent is essentially np.<operator>(ids, value, out=ids) for each of the given values.
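That pattern can be sketched directly; `apply_operation` below is a hypothetical helper, not a duqtools function:

```python
import numpy as np

def apply_operation(data, operator, value):
    """Resolve the operator name to its numpy function and apply it
    in place, mirroring np.<operator>(ids, value, out=ids)."""
    func = getattr(np, operator)  # e.g. 'add' -> np.add
    return func(data, value, out=data)

zeff = np.full(5, 2.0)            # made-up flat zeff profile
apply_operation(zeff, 'add', 0.01)
```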

Note

If you want to operate on all time slices, you can use path: profiles_1d/*/t_i_ave. The * wildcard tells duqtools to apply the operation to all available time slices.

Clipping profiles

Values can be clipped to a lower or upper bound by specifying clip_min or clip_max. This can be helpful to guard against unphysical values. The example below will clip the profile for Zeff at 1 (lower bound):

variable: zeff
operator: multiply
values: [0.8, 0.9, 1.0, 1.1, 1.2]
clip_min: 1
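The clipping maps onto `np.clip` as noted in the parameter description; a small sketch with made-up profile values:

```python
import numpy as np

# made-up zeff profile; multiplying by 0.9 pushes part of it below 1
zeff = np.array([1.2, 1.1, 1.0, 0.95, 0.9]) * 0.9

# clip_min: 1 corresponds to the a_min bound of np.clip
clipped = np.clip(zeff, a_min=1.0, a_max=None)
```

Every point that fell below 1 after the multiplication is set back to exactly 1, guarding against the unphysical Zeff < 1.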

Linear ramps

Before applying the operator, the given value can be ramped along the horizontal axis (rho) by specifying the linear_ramp keyword.

The two values represent the start and stop value of a linear ramp. In the example below, for each value in values, the data at \(\rho = 0\) are multiplied by 1 * value and the data at \(\rho = 1\) by 2 * value. All values in between are multiplied based on a linear interpolation between those two values.

variable: t_e
operator: multiply
values: [0.8, 1.0, 1.2]
linear_ramp: [1, 2]
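Following the formula given in the linear_ramp parameter description (stated there for operator: add), a sketch with hypothetical data:

```python
import numpy as np

data = np.full(5, 100.0)             # made-up flat t_e profile
value = 0.2                          # one entry from values
ramp = np.linspace(1, 2, len(data))  # linear_ramp: [1, 2]

# the documented formula for operator: add
new_data = data + ramp * value
```

The added amount grows linearly along the profile, from 1 * value at the first grid point to 2 * value at the last.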

Custom functions

If the standard operators are not suitable for your use-case, you can define your own functions using the custom operator.

This can be any custom Python code. Two variables are accessible. data corresponds to the variable data, and value to one of the specified values in the values field. The only restriction is that the output of the code must have the same dimensions as the input.

The example shows an implementation of operator: multiply with lower and upper bounds using a custom function.

variable: t_e
operator: custom
values: [0.8, 1.0, 1.2]
custom_code: 'np.clip(data * value, a_min=0, a_max=100)'
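One way to picture how such a snippet is evaluated is a plain `eval` with `data` and `value` in scope. This is a sketch only; duqtools' actual evaluation mechanism may differ:

```python
import numpy as np

def run_custom(custom_code, data, value):
    # evaluate the snippet with 'data' and 'value' in scope
    return eval(custom_code, {'np': np}, {'data': data, 'value': value})

t_e = np.array([50.0, 120.0, 80.0])   # made-up t_e values
result = run_custom('np.clip(data * value, a_min=0, a_max=100)', t_e, 1.2)
```

The multiplied profile is capped at 100, so the result has the same shape as the input, as required.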

Operations

Operations are similar to dimensions, with a small difference. Operations are always applied to the data and not sampled. Therefore they take a single value instead of an array.

For example, let's say you want to modify the start and end times of your jetto run:

create:
  operations:
    - variable: t_start
      operator: copyto
      value: 1.00
    - variable: t_end
      operator: copyto
      value: 2.00

More about variables

To specify additional variables, you can use the extra_variables lookup file (See below). The examples will use the name attribute to look up the location of the data. For example, variable: zeff will refer to the entry with name: zeff.

For more info about variables, see here.

Value ranges

Although it is possible to list the values explicitly in an operation, it is sometimes easier to specify a range.

There are two ways to specify ranges in duqtools.

By number of samples

Generates evenly spaced numbers over a specified interval.

See the implementation of numpy.linspace for more details.

start
Start value of the sequence.
stop
End value of the sequence.
num
Number of samples to generate.

This example generates a range from 0.7 to 1.3 with 10 steps:

duqtools.yaml
variable: t_i_ave
operator: multiply
values:
  start: 0.7
  stop: 1.3
  num: 10
By stepsize

Generate evenly spaced numbers within a given interval.

See the implementation of numpy.arange for more details.

start
Start of the interval. Includes this value.
stop
End of the interval. Excludes this value.
step
Spacing between values.

This example generates a range from 0.7 to 1.3 with steps of 0.1:

duqtools.yaml
variable: t_i_ave
operator: multiply
values:
  start: 0.7
  stop: 1.3
  step: 0.1
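Both range styles map onto their numpy counterparts; note the usual floating-point caveat with `arange`, whose nominal endpoint exclusion is unreliable for float steps:

```python
import numpy as np

# num-based range (numpy.linspace): the endpoint is included
by_num = np.linspace(0.7, 1.3, num=10)

# step-based range (numpy.arange): the endpoint is nominally excluded,
# but float rounding can pull it back in; prefer linspace for floats
by_step = np.arange(0.7, 1.3, step=0.1)
```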

Sampling between error bounds

By the data model convention, only the upper error node (_error_upper) should be filled in the case of symmetric error bars. If the lower error node (_error_lower) is also filled, duqtools will scale to the upper error for values larger than 0, and to the lower error for values smaller than 0.

The following example takes t_e, and generates a range from \(-2\sigma\) to \(+2\sigma\) with defined steps:

duqtools.yaml
variable: t_e
operator: add
values: [-2, -1, 0, 1, 2]
scale_to_error: True

The following example takes t_i_ave, and generates a range from \(-3\sigma\) to \(+3\sigma\) with 10 evenly spaced steps:

duqtools.yaml
variable: t_i_ave
operator: add
values:
  start: -3
  stop: 3
  num: 10
scale_to_error: True
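The documented scaling rule can be sketched as follows (hypothetical helper and made-up profile/error values, not duqtools code):

```python
import numpy as np

def scale_to_error(data, values, err_upper, err_lower=None):
    """Sketch of the documented rule: values > 0 scale to the upper
    error node, values < 0 to the lower one (upper if symmetric)."""
    if err_lower is None:          # symmetric: only _error_upper filled
        err_lower = err_upper
    return [data + v * (err_upper if v > 0 else err_lower)
            for v in values]

t_e = np.array([100.0, 200.0])     # made-up profile
sigma = np.array([10.0, 20.0])     # made-up symmetric error (_error_upper)
samples = scale_to_error(t_e, [-2, -1, 0, 1, 2], err_upper=sigma)
```

With operator: add, the sampled profiles sweep symmetrically from \(-2\sigma\) to \(+2\sigma\) around the original data.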

Note

When you specify a sigma range, make sure you use add as the operator. While the other operators are also supported, they do not make much sense in this context.

Coupling Variables

It is possible to couple the sampling of two variables: simply add them as a single list entry in the configuration file:

duqtools.yaml
- - variable: t_start
    operator: copyto
    values: [0.1, 0.2, 0.3]
  - variable: t_end
    operator: copyto
    values: [1.1, 1.2, 1.3]
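Conceptually, a coupled dimension advances its variables together (a zip) rather than crossing them (a product); with the values above:

```python
from itertools import product

t_start_values = [0.1, 0.2, 0.3]
t_end_values = [1.1, 1.2, 1.3]

# coupled dimension: the values advance together -> 3 entries
coupled = list(zip(t_start_values, t_end_values))

# two independent dimensions would give the full product -> 9 entries
independent = list(product(t_start_values, t_end_values))
```

Coupling keeps t_start and t_end in lockstep (0.1 with 1.1, 0.2 with 1.2, and so on) instead of generating all nine combinations.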

The system config

Currently there are multiple systems available. They are distinguished by specifying the system field.


Default (no system)

This system is intended for workflows that need to apply some operations or sampling to the data without any simulation system.

With this system, you won't have to specify create.template. Only create.template_data is required.

duqtools.yaml
system:
  name: 'nosystem'  # or `name: None`

Source code in duqtools/systems/base_system.py

def __init__(self, cfg: Config):
    self.cfg = cfg
    self.options = cfg.system

Extra variables

Duqtools comes with a list of default variables. You can update or add your own variables via the extra_variables key in the duqtools.yaml file.

IDS variables

Variable for describing data within an IMAS database.

The variable can be given a name, which will be used in the rest of the config to reference the variable. It will also be used for column labels and in plots.

The dimensions for each variable must be specified. This ensures that the data will be self-consistent. For example, for 1D data you can use [x], and for 2D data, [x, y].

The IDS path may contain indices. You can point to a single index, by simply giving the complete path (i.e. profiles_1d/0/t_i_ave for the 0th time slice). To retrieve all time slices, you can use profiles_1d/*/t_i_ave.

ids
Root IDS name.
path
Path to the data within the IDS. The fields are separated by forward slashes (/).
type
Discriminator for the variable type.
name
Name of the variable. This will be used to reference this variable.
dims
Give the dimensions of the data, i.e. [x] for 1D, or [x, y] for 2D data.

Example:

duqtools.yaml
extra_variables:
- name: rho_tor_norm
  ids: core_profiles
  path: profiles_1d/*/grid/rho_tor_norm
  dims: [time, x]
  type: IDS-variable
- name: t_i_ave
  ids: core_profiles
  path: profiles_1d/*/t_i_ave
  dims: [time, x]
  type: IDS-variable

Using other variables as input

It is possible to specify other variables as input for your operation. This can be used to calculate the value of a variable with a custom operation that includes these variables. These variables are available in the custom_code in a SimpleNamespace as var.<variable name>.

The example below sets all t_i_ave to the value calculated by dividing t_i_ave_0 by rho_tor_norm_0:

extra_variables:
- name: rho_tor_norm_0
  ids: core_profiles
  path: profiles_1d/0/grid/rho_tor_norm
  dims: [x]
  type: IDS-variable
- name: t_i_ave_0
  ids: core_profiles
  path: profiles_1d/0/t_i_ave
  dims: [x]
  type: IDS-variable
create:
  dimensions:
    - variable: t_i_ave
      operator: custom
      values: [1.0]
      input_variables:
        - "t_i_ave_0"
        - "rho_tor_norm_0"
      custom_code: 'var.t_i_ave_0/var.rho_tor_norm_0'
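The effect of that custom_code can be reproduced with a plain SimpleNamespace and made-up input data (an illustration, not duqtools internals):

```python
import numpy as np
from types import SimpleNamespace

# input_variables as duqtools might expose them (made-up values)
var = SimpleNamespace(
    t_i_ave_0=np.array([2000.0, 1500.0, 1000.0]),
    rho_tor_norm_0=np.array([0.1, 0.5, 1.0]),
)
value = 1.0  # the single entry from values: [1.0]

# custom_code: 'var.t_i_ave_0/var.rho_tor_norm_0'
new_t_i_ave = var.t_i_ave_0 / var.rho_tor_norm_0
```

Both inputs are single time slices (no * in their paths), so the division is a plain elementwise operation and the result has the same shape as the data it replaces.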

Note

  • If a variable that has been operated on earlier is specified as input, it will likely have the new value, not the original one.
  • input_variables must not have multiple dimensions (so for IDS variables, no * operator is allowed in the path).