Duqduq demo: large scale validation
This notebook shows how to use duqtools to do large scale validation.
It goes over the steps required to do uncertainty quantification (UQ) for a sequence of data sets.
Where duqtools does UQ for a single data set, duqduq loops over multiple datasets to do UQ in sequence.
We define 2 directories:
- duqduq directory, this is where the duqtools and UQ configs reside. This is also the directory we work in with duqduq.
- run directory, this is a directory that slurm can access and where all the simulation files and data are stored.
from pathlib import Path
duqtools_dir = Path('/afs/eufus.eu/user/g/g2ssmee/duqduq_demo')
duqtools_dir_done = Path('/afs/eufus.eu/user/g/g2ssmee/duqduq_demo_done')
run_dir = Path('/afs/eufus.eu/user/g/g2ssmee/jetto_runs/duqduq_long')
import os
os.chdir(duqtools_dir)
duqduq help
The main interface for duqduq is via the CLI. You can run duqduq --help to get a list of available subcommands.
You will notice that the subcommands here mimic what is available in duqtools.
!duqduq --help
duqduq setup
The starting point for duqduq is 2 files:
- duqtools.template.yaml, this is the template config that duqduq setup will use to generate the duqtools.yaml
- data.csv, each entry in this csv file corresponds to an IMAS data set
Below is an example data.csv file. This is how you tell duqduq which data to do UQ for.
%cat data.csv
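To make the role of the index column concrete, here is a minimal sketch of how such a file could be read with the standard library. The column names (`user`, `db`, `shot`, `run`) and the values are hypothetical placeholders, not taken from the actual data.csv; the only thing assumed from the text is that the first column holds the index of each entry.

```python
import csv
import io

# Hypothetical contents: the column names and values are assumptions
# made for illustration, not the real data.csv.
EXAMPLE_CSV = """,user,db,shot,run
data_01,alice,jet,90350,2
data_02,alice,jet,90350,3
"""


def read_entries(text):
    """Map the first column (the index) to the remaining columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    fields = header[1:]  # skip the unnamed index column
    return {row[0]: dict(zip(fields, row[1:])) for row in reader if row}


entries = read_entries(EXAMPLE_CSV)
for name, handle in entries.items():
    # `name` plays the role of the index of each entry
    print(name, handle)
```

Each key of `entries` corresponds to one data set that duqduq would loop over.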
Below is an example duqtools.template.yaml.
The index of each entry in the data.csv file will be used as the run name (run.name).
The details for each entry in data.csv will be written to the template_data section.
Machine/dataset specific parameters, such as major radius or the start time are grabbed from the IDS.
For more information, see the documentation for large scale validation.
%cat duqtools.template.yaml
Running duqduq setup will generate a new directory for each dataset in data.csv. Each directory is in itself a valid duqtools directory.
!duqduq setup --yes --force
This is what the directory looks like after setup.
!tree .
It creates a duqtools config in each of the subdirectories. At this stage you could modify each duqtools.yaml if you wish. The config is no different than for a single UQ run. This means you could cd into data_01 and treat it as a single UQ run.
%cat data_01/duqtools.yaml
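If you want to check from Python which subdirectories were generated, a small helper along these lines may be useful. It is only a sketch: `find_duqtools_dirs` is a hypothetical name, and it assumes that every generated directory sits directly below the duqduq directory and contains a file called duqtools.yaml.

```python
from pathlib import Path


def find_duqtools_dirs(root):
    """Return the sorted subdirectories of `root` that contain a duqtools.yaml."""
    return sorted(p.parent for p in Path(root).glob('*/duqtools.yaml'))
```

For example, `[p.name for p in find_duqtools_dirs(duqtools_dir)]` would list the names of the generated directories.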
Create runs using duqduq create
This is the equivalent of duqtools create, but for a large number of runs.
It will take each of the duqtools configs generated and set up the jetto runs and IMAS data according to the specification.
Since this will take a long time, we will use the --dry-run option.
!duqduq create --force --dry-run
Submit to slurm using duqduq submit
Use duqduq submit to submit the jobs to slurm. This tool will find all jobs (.llcmd files in the subdirectories) and submit them to slurm.
Use the --array option to submit the jobs as a slurm array.
os.chdir(duqtools_dir_done)
!duqduq submit --array --max_jobs 10 --force --dry-run
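The job discovery described above can be approximated in a few lines of Python. This is a rough sketch, not how duqduq implements it: `find_job_files` is a hypothetical helper that walks a directory tree and collects every file whose name ends in `.llcmd`.

```python
import os
from pathlib import Path


def find_job_files(root):
    """Collect all files below `root` whose name ends in '.llcmd'."""
    jobs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.llcmd'):
                jobs.append(Path(dirpath) / name)
    return sorted(jobs)
```

Calling `len(find_job_files(run_dir))` gives a quick count of the jobs that are candidates for submission.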
duqduq status
Query the status using duqduq status. This essentially parses all the jetto.status files in the run directory.
!duqduq status
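As an illustration of what such a status query involves, the sketch below tallies the contents of all jetto.status files below a directory. It assumes that each file holds a short status string; the real file format and the logic inside duqduq may differ, and `summarize_status` is a hypothetical name.

```python
from collections import Counter
from pathlib import Path


def summarize_status(root):
    """Count how often each status string occurs in the jetto.status files below `root`."""
    counts = Counter()
    for status_file in Path(root).rglob('jetto.status'):
        # Assumption: the stripped file content is the status itself
        counts[status_file.read_text().strip()] += 1
    return counts
```

For example, `summarize_status(run_dir)` returns a `Counter` keyed by the status strings that were found.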
Overview of LSV output directory
The output of duqduq differs from a single run in that there is an additional directory layer with the name of the data entry. The logs directory contains the slurm logs.
os.chdir(run_dir)
!tree -L 1
Each directory is a run directory as you know it from a single UQ run.
!tree 'data_01' -L 1
Merge data using duqduq merge
os.chdir(duqtools_dir_done)
!duqduq merge --force --dry-run
Merged data are stored in a local imasdb for each data entry in the run directory.
os.chdir(run_dir)
!tree 'data_01/imasdb'
Data exploration with duqtools dash
The IMAS handles for each merged data set are stored in merge_data.csv. They can be visualized using the duqtools dashboard.
os.chdir(duqtools_dir_done)
!duqtools dash
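The handles in merge_data.csv can also be inspected directly from Python. The sketch below makes no assumption about the column names; it simply returns every row as a dictionary keyed by whatever header the file has. `load_rows` is a hypothetical helper.

```python
import csv


def load_rows(path):
    """Read a csv file and return its rows as a list of dictionaries."""
    with open(path, newline='') as f:
        return list(csv.DictReader(f))
```

For example, `load_rows('merge_data.csv')` run from the duqduq directory would give one dictionary per merged data set.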