Duqtools usage
A typical UQ run starts from a config file (`duqtools.yaml`), a template IMAS data set, and optionally a template simulation run (e.g. from jetto; see the systems page for more info).

- `duqtools setup` (optional): For template-based run creation, see here for more information.
- `duqtools create`: Creates the data files according to the rules specified in the `duqtools.yaml` config file. This is the main program in duqtools and what most of this page is about.
- `duqtools submit`: Submits the runs to the runner. This is a helper tool to submit all runs to the scheduling system.
- `duqtools status`: Displays the status of the jobs in progress.
- `duqtools merge`: Aggregates data sets into a single IMAS data file with error propagation.
duqtools.yaml
Duqtools run settings are configured using a yaml configuration file in the project directory. By default it is named `duqtools.yaml`. You can specify another path for it using the `-c`/`--config` option (see `duqtools help` or the cli).
As a minimum, this configuration file must define the root workspace and the system to use (see below). All other settings are (in principle) optional.
Example config file
To help initialize a starting config to modify, you can run `duqtools init`.

Below is an example config file generated by `duqtools init`:
```yaml
# Check out the documentation for more info:
# https://duqtools.readthedocs.io/en/latest/config/
tag: duqtools
create:
  runs_dir: ./runs_dir  # change to output directory
  template_data:  # change to your template data
    user: yourusername
    db: jet
    shot: 12345
    run: 1
  dimensions:
    - variable: t_e
      operator: multiply
      values: [0.9, 1.0, 1.1]
      scale_to_error: false
    - variable: zeff
      operator: multiply
      values: [0.9, 1.0, 1.1]
      scale_to_error: false
  sampler:
    method: latin-hypercube
    n_samples: 3
system:
  name: nosystem
```
Top level parameters
These are the top level keywords in the config. See the specific sections for more information.
- `tag`: Create a tag for the runs to identify them in slurm or `data.csv`.
- `create`: Configuration for the create subcommand. See model for more info.
- `extra_variables`: Specify extra variables for this run.
- `system`: Options specific to the system used.
- `quiet`: If true, do not output to stdout, except for mandatory prompts.
The `create` config

The `create` section of the duqtools config is where you will be spending most of your time. It defines the location of the data, the run directory, the operations to apply, and the matrix sampling for UQ.

When you run `duqtools create`, this section is read and the steps are executed.
Check out the command-line interface for more info on how to use this command.
`create` parameters

The options of the `create` subcommand are stored in the `create` key in the config.

- `runs_dir`: Relative location from the workspace, which specifies the folder where to store all the created runs. This defaults to `workspace/duqtools_experiment_x`, where `x` is a not yet existing integer.
- `template`: Template directory to modify. Duqtools copies and updates the settings required for the specified system from this directory. This can be a directory with a finished run, or one just stored by JAMS (but not yet started). By default, duqtools extracts the input IMAS database entry from the settings file (e.g. jetto.in) to find the data to modify for the UQ runs. Defaults to None.
- `template_data`: Specify the location of the template data to modify. This overrides the location of the data specified in the settings file in the template directory.
- `operations`: These operations are always applied to the data. All operations specified here are added to any operations sampled from the dimensions. They can be used to, for example, set the start time for an experiment or update some physical parameters. This parameter is optional.
- `sampler`: For efficient UQ, it may not be necessary to sample the entire matrix or hypercube. By default, the cartesian product is taken (`method: cartesian-product`). For more efficient sampling of the space, the following `method` choices are available: `latin-hypercube`, `sobol`, `halton`. `n_samples` gives the number of samples to extract.
- `dimensions`: Specifies the dimensions of the matrix to sample from. Each dimension is a compound set of operations to apply. From this, a matrix of all possible combinations is generated. Essentially, it generates the Cartesian product of all operations. By specifying a different `sampler`, a subset of this hypercube can be efficiently sampled. This parameter is optional.
For example:
```yaml
create:
  runs_dir: /pfs/work/username/jetto/runs/run_1
  template: /pfs/work/username/jetto/runs/duqtools_template
  operations:
    - variable: t_start
      operator: copyto
      value: 2.875
    - variable: t_end
      operator: copyto
      value: 2.885
  dimensions:
    - variable: t_e
      operator: multiply
      values: [0.9, 1.0, 1.1]
      scale_to_error: false
    - variable: zeff
      operator: multiply
      values: [0.9, 1.0, 1.1]
      scale_to_error: false
  sampler:
    method: latin-hypercube
    n_samples: 3
```
Output directory
You can modify the duqtools output directory via `runs_dir`:

```yaml
create:
  runs_dir: my_experiment
```
Specify the template data
Duqtools distinguishes between:

- `template_data` (mandatory): This is the source data that is copied and then modified by duqtools.
- `template` (optional): This is the location where the config data for your run are stored. The contents of this directory are first copied to the target location.

By default, for the jetto system, the template IMAS data to modify is extracted from the path specified in the `template` field.

```yaml
create:
  template: /pfs/work/username/jetto/runs/duqtools_template
```

In some cases, it may be useful to re-use the same set of model settings, but with different input data. If the `template_data` field is specified, these data will be used instead. To do so, specify `template_data` with the fields below:
- `relative_location`: Set as the relative location to the imasdb location if a local imasdb is used.
- `user`: Username.
- `db`: IMAS db/machine name.
- `shot`: IMAS shot number.
- `run`: IMAS run number.
For example:

```yaml
template: /pfs/work/username/jetto/runs/duqtools_template
template_data:
  user: username
  db: jet
  shot: 91234
  run: 5
```
Samplers
Depending on the number of dimensions, a hypercube is constructed from which duqtools will select a number of entries. For a setup with 3 dimensions of size \(i\), \(j\), \(k\), a hypercube of \(i\times j\times k\) elements will be constructed, where each element is one of the combinations.
By default the entire hypercube is sampled:
```yaml
sampler:
  method: cartesian-product
```
For smarter sampling, use one of the other methods: `latin-hypercube`, `sobol`, or `halton`. `n_samples` gives the number of samples to extract. For example:
```yaml
sampler:
  method: latin-hypercube
  n_samples: 5
```
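To illustrate what a sampler does to the hypercube, the sketch below (plain Python, not duqtools internals; the value lists are made up) builds the full cartesian product of two dimensions, then draws a fixed-size subset as a stand-in for a space-filling sampler, which is what `n_samples` controls:

```python
import itertools
import random

# Two hypothetical dimensions, each with 3 values (as in the examples above).
t_e_factors = [0.9, 1.0, 1.1]
zeff_factors = [0.9, 1.0, 1.1]

# method: cartesian-product -> every combination is used (3 x 3 = 9 runs).
full_grid = list(itertools.product(t_e_factors, zeff_factors))

# A space-filling sampler (latin-hypercube, sobol, halton) instead selects
# n_samples points from the hypercube; a plain random choice stands in here.
random.seed(0)
subset = random.sample(full_grid, k=3)  # n_samples: 3

print(len(full_grid))  # 9
print(len(subset))     # 3
```

The point is only the shape of the problem: the full grid grows multiplicatively with each dimension, while `n_samples` keeps the number of runs fixed.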
Dimensions
These instructions operate on the template model. Note that these are compound operations, so they are expanded to fill the matrix with possible entries for data modifications (depending on the sampling method).
Arithmetic operations
Apply a set of arithmetic operations to an IDS. Takes the IDS data and subtracts, adds, multiplies, etc. with each of the given values.

- `values`: Values to use with the operator on the field to create the sampling space.
- `operator`: Which operator to apply to the data in combination with any of the given values. This can be any of the basic numpy arithmetic operations. Available choices: `add`, `multiply`, `divide`, `power`, `subtract`, `floor_divide`, `mod`, `none` and `remainder`. These directly map to the equivalent numpy functions, i.e. `add` -> `np.add`.
- `scale_to_error`: If true, multiply the value(s) by the error (sigma). With asymmetric errors (i.e. both lower/upper error nodes are available), scale to the lower error node for values < 0, and to the upper error node for values > 0.
- `clip_min`: If set, clip (limit) the data at this value (lower bound). Uses `np.clip`.
- `clip_max`: If set, clip (limit) the data at this value (upper bound). Uses `np.clip`.
- `linear_ramp`: Linearly ramp the operation using the given start and stop values. The first value (start) corresponds to the multiplier at the beginning of the data range, the second value (stop) to the multiplier at the end. The ramp is linearly interpolated between the start and stop values, and acts as a multiplier of the specified `value`. For example, for `operator: add`: `new_data = data + np.linspace(start, stop, len(data)) * value`.
- `input_variables`: Input variables that should be present for an `operator: custom` operation. The values of these input variables can be used in the `custom_code` field.
- `custom_code`: Custom Python code to apply for the `custom` operator. This will be evaluated as if it were inline Python code. Two variables are accessible: `data` corresponds to the variable data, and `value` corresponds to the passed value. The extra input variables are available in a dict named `var = {variable1: value, variable2: value}`. For example, an implementation of `operator: multiply` would be `custom_code: 'value * data'`, and multiplying by an input variable named `key1` would be `custom_code: "var['key1'] * value"`. The resulting data must have the same shape.
- `variable`: IDS variable for the data to modify. The time slice can be denoted with `*`, which will match all time slices in the IDS. Alternatively, you can specify the time slice directly, i.e. `profiles_1d/0/t_i_ave` to only match and update the 0-th time slice.
For example:
```yaml
variable: zeff
operator: add
values: [0.01, 0.02, 0.03]
```

will generate 3 entries: `zeff += 0.01`, `zeff += 0.02`, and `zeff += 0.03`.
```yaml
variable: t_i_ave
operator: multiply
values: [1.1, 1.2, 1.3]
```

will generate another 3 entries: `t_i_ave *= 1.1`, `t_i_ave *= 1.2`, and `t_i_ave *= 1.3`.
With these 2 entries, the parameter hypercube would consist of 9 entries total (3 for `zeff` times 3 for `t_i_ave`). With the default `sampler: cartesian-product`, this means 9 new data files will be written.
Note
The python equivalent is essentially `np.<operator>(ids, value, out=ids)` for each of the given values.
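To make that note concrete, this sketch (made-up data array, not duqtools code) resolves an operator name to the numpy function of the same name and applies it in place:

```python
import numpy as np

data = np.array([100.0, 200.0, 300.0])  # hypothetical IDS profile data

# 'multiply' with value 1.1, resolved to np.multiply by name
operator = "multiply"
value = 1.1
getattr(np, operator)(data, value, out=data)  # in-place, as in the note

print(data)  # data is now ~[110, 220, 330]
```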
Note
If you want to operate on all time slices, you can use `path: profiles_1d/*/t_i_ave`. The `*` wildcard tells duqtools to apply the operation to all available time slices.
Clipping profiles
Values can be clipped to a lower or upper bound by specifying `clip_min` or `clip_max`. This can be helpful to guard against unphysical values. The example below will clip the profile for Zeff at 1 (lower bound):

```yaml
variable: zeff
operator: multiply
values: [0.8, 0.9, 1.0, 1.1, 1.2]
clip_min: 1
```
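The effect of `clip_min` can be sketched in plain numpy (hypothetical Zeff profile, not duqtools internals):

```python
import numpy as np

zeff = np.array([1.05, 1.2, 1.5])  # hypothetical Zeff profile

# operator: multiply with value 0.8, then clip_min: 1
scaled = np.clip(zeff * 0.8, a_min=1, a_max=None)

print(scaled)  # values that fell below 1 are raised back to 1
```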
Linear ramps
Before applying the operator, the given value can be ramped along the horizontal axis (rho) by specifying the `linear_ramp` keyword.

The two values represent the start and stop values of a linear ramp. For each value in `values`, the data at \(\rho = 0\) are multiplied by `1 * value` and the data at \(\rho = 1\) by `2 * value`. All values in between are multiplied based on a linear interpolation between those two values.
```yaml
variable: t_e
operator: multiply
values: [0.8, 1.0, 1.2]
linear_ramp: [1, 2]
```
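Following the `new_data = data + np.linspace(start, stop, len(data)) * value` formula quoted for `operator: add`, the ramp can be sketched as below (made-up flat profile; the multiply variant is an assumption based on the prose above, with the ramped value acting as the multiplier):

```python
import numpy as np

data = np.full(5, 100.0)  # hypothetical flat profile
start, stop = 1, 2        # linear_ramp: [1, 2]
value = 1.2               # one entry from values: [0.8, 1.0, 1.2]

ramp = np.linspace(start, stop, len(data))

# operator: add -> documented as data + ramp * value
added = data + ramp * value

# operator: multiply -> assumed here to be data * (ramp * value)
multiplied = data * (ramp * value)

print(multiplied[0], multiplied[-1])  # ramps from 120.0 (rho=0) to 240.0 (rho=1)
```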
Custom functions
If the standard operators are not suitable for your use case, you can define your own functions using the `custom` operator.

This can be any custom Python code. Two variables are accessible: `data` corresponds to the variable data, and `value` to one of the specified values in the `values` field. The only restriction is that the output of the code must have the same dimensions as the input.
The example shows an implementation of `operator: multiply` with lower and upper bounds using a custom function.

```yaml
variable: t_e
operator: custom
values: [0.8, 1.0, 1.2]
custom_code: 'np.clip(data * value, a_min=0, a_max=100)'
```
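Since `custom_code` is evaluated as inline Python with `data` and `value` in scope, its behaviour can be emulated with `eval` (a simplified stand-in with made-up numbers, not the actual duqtools evaluator):

```python
import numpy as np

data = np.array([50.0, 90.0, 130.0])  # hypothetical t_e profile
value = 1.2

custom_code = 'np.clip(data * value, a_min=0, a_max=100)'
result = eval(custom_code, {'np': np, 'data': data, 'value': value})

print(result)  # products above 100 are capped at the upper bound
```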
Operations
Operations are similar to dimensions, with one small difference: operations are always applied to the data and never sampled. They therefore take a single value instead of an array.
For example, let's say you want to modify the start and end times of your jetto run:
```yaml
create:
  operations:
    - variable: t_start
      operator: copyto
      value: 1.00
    - variable: t_end
      operator: copyto
      value: 2.00
```
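The `copyto` operator simply overwrites the existing data with the given value, analogous to `np.copyto` (a sketch with a made-up scalar, not duqtools internals):

```python
import numpy as np

t_start = np.array(2.0)  # hypothetical existing value in the template

# operator: copyto, value: 1.00 -> the old value is replaced outright
np.copyto(t_start, 1.00)

print(t_start)  # 1.0
```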
More about variables
To specify additional variables, you can use the `extra_variables` lookup file (see below). The examples use the `name` attribute to look up the location of the data. For example, `variable: zeff` refers to the entry with `name: zeff`.
For more info about variables, see here.
Value ranges
Although it is possible to specify the values explicitly in an operator, sometimes it may be easier to specify a range.
There are two ways to specify ranges in duqtools.
By number of samples
Generates evenly spaced numbers over a specified interval. See the implementation of `numpy.linspace` for more details.

- `start`: Start value of the sequence.
- `stop`: End value of the sequence.
- `num`: Number of samples to generate.
This example generates a range from 0.7 to 1.3 with 10 steps:
```yaml
variable: t_i_ave
operator: multiply
values:
  start: 0.7
  stop: 1.3
  num: 10
```
By stepsize
Generates evenly spaced numbers within a given interval. See the implementation of `numpy.arange` for more details.

- `start`: Start of the interval. This value is included.
- `stop`: End of the interval. This value is excluded.
- `step`: Spacing between values.
This example generates a range from 0.7 to 1.3 with steps of 0.1:
```yaml
variable: t_i_ave
operator: multiply
values:
  start: 0.7
  stop: 1.3
  step: 0.1
```
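Both range styles map onto their numpy equivalents, so the difference is easy to check. Note the `np.arange` caveat: the stop value is excluded, and with float steps rounding can affect where the range ends.

```python
import numpy as np

# By number of samples -> np.linspace (stop is included)
by_num = np.linspace(0.7, 1.3, num=10)

# By stepsize -> np.arange (stop is excluded; float steps may round)
by_step = np.arange(0.7, 1.3, 0.1)

print(len(by_num), by_num[0], by_num[-1])  # 10 0.7 1.3
```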
Sampling between error bounds
Following the data model convention, only the upper error node (`_error_upper`) should be filled in case of symmetrical error bars. If the lower error node (`_error_lower`) is also filled, duqtools will scale to the upper error for values larger than 0, and to the lower error for values smaller than 0.
The following example takes `t_e`, and generates a range from \(-2\sigma\) to \(+2\sigma\) with the defined steps:

```yaml
variable: t_e
operator: add
values: [-2, -1, 0, 1, 2]
scale_to_error: True
```
The following example takes `t_i_ave`, and generates a range from \(-3\sigma\) to \(+3\sigma\) with 10 evenly spaced steps:

```yaml
variable: t_i_ave
operator: add
values:
  start: -3
  stop: 3
  num: 10
scale_to_error: True
```
Note
When you specify a sigma range, make sure you use `add` as the operator. The other operators are also supported, but they do not make much sense in this context.
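With `scale_to_error: True` and `operator: add`, each value acts as a number of sigmas. A sketch of that behaviour (made-up profile and error arrays; the asymmetric-error rule follows the description above):

```python
import numpy as np

data = np.array([100.0, 200.0])
error_upper = np.array([10.0, 20.0])  # _error_upper node
error_lower = np.array([5.0, 8.0])    # _error_lower node (asymmetric case)

def apply_sigma(value):
    # values > 0 scale to the upper error, values < 0 to the lower error
    sigma = error_upper if value > 0 else error_lower
    return data + value * sigma

print(apply_sigma(2))   # [120. 240.]
print(apply_sigma(-1))  # [ 95. 192.]
```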
Coupling Variables
It is possible to couple the sampling of two variables: simply add them as a single list entry to the configuration file:

```yaml
- - variable: t_start
    operator: copyto
    values: [0.1, 0.2, 0.3]
  - variable: t_end
    operator: copyto
    values: [1.1, 1.2, 1.3]
```
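Coupled dimensions vary together rather than spanning a grid; the nested-list entry is assumed here to pair values index-by-index, yielding 3 samples instead of 3 × 3. A plain-Python sketch of the difference:

```python
import itertools

t_start_values = [0.1, 0.2, 0.3]
t_end_values = [1.1, 1.2, 1.3]

# Two independent dimensions -> cartesian product, 9 combinations
independent = list(itertools.product(t_start_values, t_end_values))

# One coupled (nested-list) dimension -> values advance together, 3 combinations
coupled = list(zip(t_start_values, t_end_values))

print(len(independent), len(coupled))  # 9 3
```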
The system config
Multiple systems are available; they are selected via the `system` field. Currently there are two options:

- `None` or `nosystem` (default)
- `jetto` (see the jetto specific documentation)
Default (no system)
This system is intended for workflows that need to apply some operations or sampling to the data without any simulation system. With this system, you won't have to specify `create.template`; only `create.template_data` is required.

```yaml
system:
  name: 'nosystem'  # or `name: None`
```
Extra variables
Duqtools comes with a list of default variables. You can update or add your own variables via the `extra_variables` key in the `duqtools.yaml` file.
IDS variables
Variable for describing data within an IMAS database.

The variable can be given a name, which will be used in the rest of the config to reference the variable. It will also be used as a column label or on plots.
The dimensions for each variable must be specified. This ensures that the data will be self-consistent. For example, for 1D data you can use `[x]`, and for 2D data `[x, y]`.
The IDS path may contain indices. You can point to a single index by simply giving the complete path (i.e. `profiles_1d/0/t_i_ave` for the 0th time slice). To retrieve all time slices, you can use `profiles_1d/*/t_i_ave`.
- `ids`: Root IDS name.
- `path`: Path to the data within the IDS. The fields are separated by forward slashes (`/`).
- `type`: Discriminator for the variable type.
- `name`: Name of the variable. This will be used to reference this variable.
- `dims`: The dimensions of the data, i.e. `[x]` for 1D or `[x, y]` for 2D data.
Example:
```yaml
extra_variables:
  - name: rho_tor_norm
    ids: core_profiles
    path: profiles_1d/*/grid/rho_tor_norm
    dims: [time, x]
    type: IDS-variable
  - name: t_i_ave
    ids: core_profiles
    path: profiles_1d/*/t_i_ave
    dims: [time, x]
    type: IDS-variable
```
Using other variables as input
It is possible to specify other variables as input for your operation. This can be used to calculate the value of a variable with a `custom` operation that includes these variables. These variables are available in `custom_code` in a `SimpleNamespace` as `var.<variable name>`.
The example below sets all `t_i_ave` values to a value calculated by dividing `t_i_ave_0` by `rho_tor_norm_0`:
```yaml
extra_variables:
  - name: rho_tor_norm_0
    ids: core_profiles
    path: profiles_1d/0/grid/rho_tor_norm
    dims: [x]
    type: IDS-variable
  - name: t_i_ave_0
    ids: core_profiles
    path: profiles_1d/0/t_i_ave
    dims: [x]
    type: IDS-variable
create:
  dimensions:
    - variable: t_i_ave
      operator: custom
      values: [1.0]
      input_variables:
        - "t_i_ave_0"
        - "rho_tor_norm_0"
      custom_code: 'var.t_i_ave_0/var.rho_tor_norm_0'
```
Note
- If a variable that has been operated on earlier is specified as input, it will probably reflect the new value.
- `input_variables` must not have multiple dimensions (so for IDS paths, no `*` wildcard is allowed).