This repository provides an example of a data assimilation system with PDAF based on a simple two-field 2D model. It
- demonstrates both online and offline coupling to PDAF.
- supports all ensemble-based methods in PDAF (ensemble Kalman filters, particle filters, hybrid Kalman-nonlinear filter)
- includes input data, run scripts, plotting utilities, and covariance-generation tools.
- serves as a more advanced companion to the PDAF tutorial examples.
To get the code, compile the online-coupled program and perform a first experiment:
- Use
git clone --recurse-submodules https://github.com/PDAF/2dmodel_pdaf.gitto get the tutorial together with the PDAF library - Set compile settings:
export PDAF_ARCH=linux_gfortran_openmpi(or a suitable (custom) configuration inpdaf/make.arch/) - Build PDAF via
make -C pdaf -j 4 - Set path to PDAF dirctory:
export PDAFDIR=$(realpath pdaf) cd online- Adapt settings in
online/compile_settings.txtto your system - Build the online coupled model via
make model_pdaf - Start an experiment via
mpirun -np 6 ./model_pdaf -dim_ens 6
- Adapt settings in
Note
PDAF needs libraries for BLAS, LAPACK and MPI; the program in online further needs a libraries for netCDF.
Important
Steps 1-5 are executed from the root directory of this repository; Steps i-iii are executed in the directory online/.
Tip
In the remainder of this README you find more details on the content of this repository, options for running experiments, and for understanding and plotting the output files. Also refer to the detailed description in this README if you run into any problems with this quick guide.
- Two state variables:
fieldAandfieldB - Domain: Two-dimensional rectangular in x- and y-direction
- Dynamics: shift in positive y-direction
- Parallelization: domain decomposition in x-direction using MPI
- Simpler related variants are available in the PDAF tutorials: https://pdaf.awi.de/trac/wiki/PdafTutorial
- Domain: 18x36 grid points
- Initial condition: sine wave
- Truth: initial condition rotated by 45 degrees
- Ensemble initialization options:
- Read ensemble files directly
- Replace the ensemble mean using the mean state of 20 ensemble files
- Use second-order exact sampling from a covariance matrix file
- Example ensemble construction:
fieldA: rotated sine waves with varying wavelengthfieldB: shifts anaglous to the model dynamics
- Structure: resemble larger applications such as FESOM or NEMO
- State vector handling: flexible handling of multiple fields in the state vector
- I/O: netCDF for model and observation data
- File operations: flexible approach for model- and observation-specific information
| Online | Offline | |
|---|---|---|
| DA code | extends the model build | separate from the model |
| Makefile | model compiles the PDAF user code but PDAF library is linked as external libary |
integrated with the PDAF library |
| Compilation definition | independent | shared/same with PDAF library |
Note
- The available input files allow running experiments with ensembles up to 20 states and up to 99 time steps.
- The tutorial model is very simple and without real dynamics. Even though the model uses a 2-dimensional domain, it is practically 1-dimensional (just rotated by 45 degrees). This model allows DA with very small ensembles of at least 3 states. This is far lower than we would use in real cases, but it is useful to perform some experiments on a notebook computer. For real models, one would rather use an ensemble of at least 20 states.
- The ensemble is generated in a systematic way (by rotating the field for the first model field and shifting for the second), which is artificial but helps to demonstrate some effects, e.g., of the localization.
- The model does not read restart files. Thus, one cannot perform a realistic sequence of offline analysis steps. Only a single offline analysis step is done for demonstration.
The directories containing the model and coupled model-PDAF codes are the following
- 2dmodel/ - contains the pure tutorial model without coupling to PDAF
- offline/ - contains the analysis program for offline-coupled assimilation with PDAF
- online/ - contains the model with online-coupling to PDAF
There is a Makefile in each directory to compile the program. Each directory contains a subdirectory src/, which contains the source files. For online/, the source code is separated into directories src/, holding the model code, and src_pdaf/ containing the coupling code for PDAF.
Additional directories are
- generate_covar/ - contains the code for a program generating a covariance matrix file from model outputs. This can be used to initialize an ensemble using second-order exact sampling (activated using '-type_ens_init 2' when running the DA cases).
- inputs_2fields/ - contains the input files, e.g., observation files and files to provide ensemble states. It also contains the script
pdaf_tutorial_online.py, which generates the input files. - plotting/ - contains plot scripts
- runscripts/ - contains a selection of run scripts to run a series of experiments with and without DA
- teamplate/ - contains template source files on which basis one can build the actual assimilation program in online or offline mode
- test_online/ - contains the script 'run.sh' which shows an example of how to run the online-coupled case in separate directories
Generally, the codes in the directory can be compiled outside of the PDAF directory. This is usually the case with real models, where the PDAF library is compiled separately and linked to the assimilative model in the online-coupled case.
Compiling the model itself only requires
make,- a Fortran compiler (e.g., gfortran),
- an MPI implementation (e.g., OpenMPI),
- and a netCDF library (
libnetf,libnetcdff).
To compile PDAF and couple it to the model (online and offline), additionally
- BLAS and LAPACK implementations (e.g., openblas) are required.
The model is independent of PDAF. All compiler-specific settings are in compile_settings.txt. It is configured for a standard Linux installation, and should be adapted for other cases.
To compile the program model use
make model
(Just executing make will show all compile targets.)
The compilation of model_pdaf in the directory online/ is configured analogously to that in 2dmodel/. The Makefile is extended to compile the DA user codes for PDAF and to link the PDAF library to the program. The user code source files are located in src_pdaf and are included in the Makefile.
For compiling:
- Compile the PDAF library separately.
- Compile the program using
make <target> PDAFDIR=<mydir>
where
<target>: the name of the program you would like to compile (model_pdaformodel)<mydir>: The path to the PDAF directory.
Possible names for <target> are shown when just running the command line without specifying <target>.
The compilation of the offline-coupled program is integrated into the PDAF compilation scheme. Thus, there is no file compile_settings.txt, but the compile specifications are taken from a file in the directory make.arch of PDAF. The Makefile can also compile the PDAF library.
Compile the program offline_pdaf with
make <target> PDAF_ARCH=<arch> PDAFDIR=<mydir>
where
<target>: the name of the program you would like to compile (offline_pdaf)<arch>: The name of themake.archfile without.has in other compilations with PDAF (see files inmake.arch/of PDAF)<mydir>: The path to the PDAF directory. Possible names for<target>are shown when just running the command line without specifying<target>.
This program is compiled analogously to the offline_coupled DA program. Thus, you can compile the program with
make generate_covar PDAF_ARCH=<arch> PDAFDIR=<mydir>
with <arch> and <mydir> as before.
The input files are in inputs_2fields. These are
ensA_XX.ncandensB_XX.nc: Ensemble files for both fields, whereXXis the ensemble index ranging between 01 and 20. Used to initialize the ensemble for DA or to generate a covariance matrix file withgenerate_covar.obsA.ncandobsB.nc: Observation fields for both fields containing all time steps. Read during DA process.trueA_stepYY.ncandtrueB_stepYY.nc: Files holding the true model state for both fields, whereYYis the time step. Used to compare with to access quality of state estimate (e.g., usingplot_diff.py)
Running the online or offline DA produces several output files. There are separate files for the first model field 'A' and the second model field 'B'. Below, we list only the files associated with field 'A'. For Field B, corresponding files are written.
stateA_step*_for.txt / stateA_step*_ana.txt: files holding the ensemble mean of field A of the model for the forecast (for) or analysis (ana).ensA_XX_step*_for.txt / ensA_XX_step*_ana.txt: files holding each single ensemble member state (with indexXX) for the forecast (for) or analysis (ana).varianceA_step*_for.txt / varianceA_step*_ana.txt: files holding the ensemble variance of field A of the model for the forecast (for) or analysis (ana).
The directory plotting/ contains Python plot scripts. They can be used like
plot_diff.py FILE1 FILE2 # Plot the difference of two files and display their RMS difference
plot_file.py FILE # Plot the field in one file
plot_obs.py OBSFILE STEP # Plot one observation file with OBSFILE being the observation file and STEP the time step to plot
The script plot_file.py can be used to plot files in /inputs_2fields, the model's output files, or those from a DA experiment. plot_diff.py is commonly used to plot the difference between the output of a DA experiment and the corresponding true state.
The programs for the online and offline coupled DA use command-line options to specify the experiment settings. All command-line options start with a dash. Thus, the syntax is
-VARIABLE VALUE
The offline coupled DA performs one single analysis step. It uses a parallelization with domain decomposition. However, one can also run it with a single process.
To run the program, one can, e.g., use
./offline_pdaf -dim_ens <DIMENS> -filtertype <FTYPE> -step <STEP> [further options]
with
<DIMENS>: Ensemble size, e.g., between 3 and 20.<FTYPE>: Type of filter algorithm, e.g., 6 for ESTKF or 7 LESTKF.<STEP>: The time step for which observations are assimilated- [further options]: There are more options one can specify. We list some possibilities later.
Notes:
- Depending on the MPI implementation you use, you might need to write 'mpirun -np 1' at the beginning of the command. With OpenMPI, one usually does not need this.
- The program is implemented to always read the same input files. Thus, it does not support full offline DA cycling, in which one alternates between the DA step and the forecast. Consequently, in a more realistic case, one would change this so that one can specify at run time which files to read. Alternatively, one could add a script that copies restart files from the model to the default names.
Examples
Running the global filter ESTKF with ensemble size 10 at step 1:
./offline_pdaf -dim_ens 10 -filtertype 6 -step 1
Running the local filter LESTKF with ensemble size 6 at step 1 with localization cut-off radius 7.0
./offline_pdaf -dim_ens 6 -filtertype 7 -step 1 -cradius 7.0
Here, the additional option cradius is used.
Get an overview of available DA methods and the options of the selected DA method (<FTYPE>):
./offline_pdaf -filtertype <FTYPE> -subtype -1
If one omits the -filtertype option, the filter defaults to ESTKF.
The online coupled DA is implemented with PDAF's fully parallel mode. Thus, the number of processes has to be a multiple of the ensemble size.
mpirun -np <NUMPROCS> ./model_pdaf -dim_ens <DIMENS> -filtertype <FTYPE> [further options]
with
<DIMENS>: Ensemble size, e.g., between 3 and 20.<FTYPE>: Type of filter algorithm, e.g., 6 for ESTKF or 7 LESTKF.- [further options]: There are more options one can specify. We list some possibilities later.
Examples
Running the local filter LESTKF using 6 processes with ensemble size 6:
mpirun -np 6./model_pdaf -dim_ens 6 -filtertype 6
Running the local filter LESTKF with parallelization using 12 processes with ensemble size 6
mpirun -np 12 ./model_pdaf -dim_ens 6 -filtertype 6
In this case, 2 processes per model are used, splitting the model domain into 2 subdomains. The outputs of this run should be the same as before. However, in the upper part of the screen output, you will see information about the domain decomposition, and the dimension of the fields in the state vector will reflect the decomposition.
To plot the difference of the final state estimate of this experiment with the true model state, use
python ../plotting/plot_diff.py stateA_step20_ana.txt ../inputs_2fields/trueA_step20.txt
Run Scripts
The directory runscripts/ contains run scripts to perform a series of experiments with the online coupled model:
run_free.sh: Run a series of ensemble runs without assimilating observations. The type of ensemble initialization and the ensemble size are varied.run_global.sh: Run a series of DA experiments with the global ESTKF. The experiments vary the type of ensemble initialization, the inflation (forgetting factor), and the ensemble size.run_local.sh: Run a series of DA experiments with the local filter LESTKF. The experiments vary the type of ensemble initialization, the inflation (forgetting factor), the ensemble size, and the localization radius.
Warning
Note that the run scripts perform a large number of experiments; e.g., run_global.sh runs 378 experiments, and run_local.sh runs 4998. Both run with an ensemble size of up to 20, so that up to 20 processes are used. Running these experiments is possible without having 20 physical processor cores, e.g., on a notebook computer with a sufficiently powerful processor. If the execution slows down significantly, the CPU is likely not powerful enough. Further, while each output file is small, the overall output of these experiments is several gigabytes.
Here we list a selection of possible options that can be set on the command line.
| Option | values | Description | Default |
|---|---|---|---|
-dim_ens |
> 0 | Ensemble size | |
-filtertype |
Type of DA method; see output from running with -subtype -1
|
6 | |
-subtype |
subtype of DA method; see output from running with -subtype -1
|
0 | |
-delt_obs |
> 0 | forecast length in time steps | 2 |
-forget |
> 0 | 'forgetting factor' controlling inflation: <1.0 inflates the ensemble spread; =1.0 neutral; >1.0 deflates the ensemble spread | 1.0 |
-cradius |
|
Localization cut-off radius | 5.0 |
-locweight |
0-4 | Type of localiztion weight function: =0 constant weight of 1; =1 exponentially decreasing with SRADIUS; =2 use 5th-order polynomial; =3 regulated localization of |
2 |
-type_ens_init |
1,2,3 | Type of ensemble initialization: =1 Read ensemble files; =2 Read ensemble files and replace ensemble mean by mean state of 20 ensemble files; =3 Initialize from covariance matrix file and use by mean state of 20 ensemble files (need to run generate_covar first) | 1 |
-twin_experiment |
[T|F] | Whether to run a twin experiment, i.e., use observation values generated using PDAF functionality to generate synthetic observations | F |
-step |
0-99 | For offline DA: Time step of observations assimilated | 1 |
-type_iau |
0,1 | Incremental analysis updating: 0= no IAU; 1= activate IAU | 0 |
-steps_iau |
> 0 | Incremental analysis updating: Number of time steps over which increment is distributed | 1 |
-write_state |
[T|F] | Whether to write ensemble mean state files | T |
-write_ens |
[T|F] | Whether to write all ensemble states into files | T |
-write_var |
[T|F] | Whether to write files for ensemble variance field | F |
-assim_A |
[T|F] | Whether to assimilate observation of fieldA | T |
-assim_B |
[T|F] | Whether to assimilate observation of fieldB | F |
-rms_obs_A |
> 0.0 | Observation error for observations of fieldA | 0.5 |
-rms_obs_B |
> 0.0 | Observation error for observations of fieldB | 0.25 |
Further options, including their explanation and possible values, can be seen in the file assimilation_pdaf_mod.F90.
The program for generating a covariance matrix file is located in the directory generate_covar/. The compilation of the program was described before.
The program can be run using
./generate_covar
It will read the ensemble files from inputs_2fields and perform a singular value decomposition on these files. Then it writes the file covar.nc, which contains the singular values, the corresponding singular values, and the ensemble mean state. The program is configured to use 20 ensemble files.
The directory generate_covar/ contains a README file with further information.
The program for the online coupled DA can generate files holding synthetic observations from a model forecast, and later use these files as inputs for twin experiments.
As there are no real observations for this model, the observation files in inputs_2dfields/ are already synthetic. To this end, this functionality only serves to demonstrate how it is used.
The example is implemented so that, when run in parallel, separate files for each sub-domain are written. Accordingly, when performing the actual twin experiment (see below), one must use the same number of processes for each model. The example further assumes that observations are generated for each time step.
The example is further implemented so that synthetic observations can only be generated for one observation type at a time. Thus, to generate files for each of the two observation types in the example, one has to run the program twice. In each case, one performs the run with ensemble size 1. For running with parallelization using 2 sub-domains:
mpirun -np 2 ./model_pdaf -dim_ens 1 -filtertype 100 -delt_obs 1 -write_state F -write_ens F
mpirun -np 2 ./model_pdaf -dim_ens 1 -filtertype 100 -delt_obs 1 -write_state F -write_ens F -assim_A F -assim_B T
In the second run, observation type A is deactivated, while observation type B is switched on. These runs will generate 4 files, i.e., one file for each observation type and sub-domain:
obsA_syn_0000.nc, obsA_syn_0001.nc, obsB_syn_0000.nc, obsB_syn_0000.nc
Here, the 4-digit number is the process rank that wrote the file. The observations are written as a list of values. Information about observation locations or errors is initialized by the respective observation module during a twin experiment.
Note
The functionality to generate a synthetic observation is only implemented in the example for the online coupled case.
Once the files containing the synthetic observations have been generated, they can be used in twin experiments. In this case, each observation module reads the regular observation files to initialize all values. Then the file containing the synthetic observations is read, overwriting the values of the actual observations.
The twin experiments can assimilate any combination of observation types, as long as the necessary files containing the synthetic observations were generated beforehand.
To run the ESTKF with ensemble size 4, assimilating observations of type A, and using 2 processes per model to be consistent with the generated synthetic observation files, one can execute:
mpirun -np 8 ./model_pdaf -dim_ens 4 -filtertype 6 -delt_obs 2 -twin_experiment T
To run the analogous experiment assimilating synthetic observations of both types, one can execute:
mpirun -np 8 ./model_pdaf -dim_ens 4 -filtertype 6 -delt_obs 2 -twin_experiment T -assim_B T
Note
One can also perform twin experiments with the offline-coupled case, as long as one has previously generated synthetic observation files with the online-coupled program.