
2dmodel_pdaf - advanced implementation of tutorial model with PDAF

Introduction

This repository provides an example of a data assimilation system with PDAF based on a simple two-field 2D model. It

  • demonstrates both online and offline coupling to PDAF.
  • supports all ensemble-based methods in PDAF (ensemble Kalman filters, particle filters, hybrid Kalman-nonlinear filters).
  • includes input data, run scripts, plotting utilities, and covariance-generation tools.
  • serves as a more advanced companion to the PDAF tutorial examples.

Quick start

To get the code, compile the online-coupled program and perform a first experiment:

  1. Use git clone --recurse-submodules https://github.com/PDAF/2dmodel_pdaf.git to get the tutorial together with the PDAF library
  2. Set the compile configuration: export PDAF_ARCH=linux_gfortran_openmpi (or a suitable custom configuration from pdaf/make.arch/)
  3. Build PDAF via make -C pdaf -j 4
  4. Set the path to the PDAF directory: export PDAFDIR=$(realpath pdaf)
  5. cd online
    1. Adapt settings in online/compile_settings.txt to your system
    2. Build the online coupled model via make model_pdaf
    3. Start an experiment via mpirun -np 6 ./model_pdaf -dim_ens 6

Note

PDAF needs libraries for BLAS, LAPACK, and MPI; the program in online additionally needs a netCDF library.

Important

Steps 1-5 are executed from the root directory of this repository; the sub-steps 1-3 of step 5 are executed in the directory online/.

Tip

In the remainder of this README, you will find more details on the contents of this repository, the options for running experiments, and guidance for understanding and plotting the output files. Also refer to this detailed description if you run into problems with the quick guide.

A data assimilation system with PDAF

Model overview

  • Two state variables: fieldA and fieldB
  • Domain: Two-dimensional rectangular grid in the x- and y-directions
  • Dynamics: shift in positive y-direction
  • Parallelization: domain decomposition in x-direction using MPI
  • Simpler related variants are available in the PDAF tutorials: https://pdaf.awi.de/trac/wiki/PdafTutorial

Reference setup

  • Domain: 18x36 grid points
  • Initial condition: sine wave
  • Truth: initial condition rotated by 45 degrees
  • Ensemble initialization options:
    • Read ensemble files directly
    • Replace the ensemble mean using the mean state of 20 ensemble files
    • Use second-order exact sampling from a covariance matrix file
  • Example ensemble construction:
    • fieldA: rotated sine waves with varying wavelength
    • fieldB: shifts analogous to the model dynamics
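As a rough illustration of this ensemble construction (not the actual generation script pdaf_tutorial_online.py; the wavelengths and shifts below are illustrative assumptions), one could sketch it with NumPy as:

```python
import numpy as np

nx, ny = 18, 36          # grid size as in the reference setup
dim_ens = 4              # illustrative ensemble size

x = np.arange(nx) / nx
y = np.arange(ny) / ny
X, Y = np.meshgrid(x, y, indexing="ij")

ens_A, ens_B = [], []
for k in range(dim_ens):
    # fieldA: diagonal (45-degree) sine wave with member-dependent wavelength
    wavelength = 1.0 + 0.1 * k
    ens_A.append(np.sin(2.0 * np.pi * (X + Y) / wavelength))
    # fieldB: base field shifted in y, analogous to the model dynamics
    base = np.sin(2.0 * np.pi * (X + Y))
    ens_B.append(np.roll(base, shift=k, axis=1))

ens_A = np.stack(ens_A)   # shape (dim_ens, nx, ny)
ens_B = np.stack(ens_B)
```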

Implementation highlights

  • Structure: resembles larger applications such as FESOM or NEMO
  • State vector handling: flexible handling of multiple fields in the state vector
  • I/O: netCDF for model and observation data
  • File operations: flexible approach for model- and observation-specific information

Coupling modes

                             Online                                       Offline
  DA code                    extends the model                            built separate from the model
  Makefile                   compiles the PDAF user code; the PDAF        integrated with the PDAF library
                             library is linked as an external library
  Compilation definitions    independent                                  shared with the PDAF library

Note

  • The available input files allow running experiments with ensembles up to 20 states and up to 99 time steps.
  • The tutorial model is very simple and without real dynamics. Even though the model uses a 2-dimensional domain, it is practically 1-dimensional (just rotated by 45 degrees). This model allows DA with very small ensembles of at least 3 states. This is far lower than we would use in real cases, but it is useful to perform some experiments on a notebook computer. For real models, one would rather use an ensemble of at least 20 states.
  • The ensemble is generated in a systematic way (by rotating the field for the first model field and shifting for the second), which is artificial but helps to demonstrate some effects, e.g., of the localization.
  • The model does not read restart files. Thus, one cannot perform a realistic sequence of offline analysis steps. Only a single offline analysis step is done for demonstration.

Structure of the example codes

The directories containing the model and coupled model-PDAF codes are the following

  • 2dmodel/ - contains the pure tutorial model without coupling to PDAF
  • offline/ - contains the analysis program for offline-coupled assimilation with PDAF
  • online/ - contains the model with online-coupling to PDAF

There is a Makefile in each directory to compile the program. Each directory contains a subdirectory src/, which contains the source files. For online/, the source code is separated into directories src/, holding the model code, and src_pdaf/ containing the coupling code for PDAF.

Additional directories are

  • generate_covar/ - contains the code for a program generating a covariance matrix file from model outputs. This can be used to initialize an ensemble using second-order exact sampling (activated using '-type_ens_init 2' when running the DA cases).
  • inputs_2fields/ - contains the input files, e.g., observation files and files to provide ensemble states. It also contains the script pdaf_tutorial_online.py, which generates the input files.
  • plotting/ - contains plot scripts
  • runscripts/ - contains a selection of run scripts to run a series of experiments with and without DA
  • template/ - contains template source files on the basis of which one can build the actual assimilation program in online or offline mode
  • test_online/ - contains the script 'run.sh' which shows an example of how to run the online-coupled case in separate directories

Compiling

Generally, the codes in this repository can be compiled outside of the PDAF directory. This mirrors the usual situation with real models, where the PDAF library is compiled separately and linked to the assimilative model in the online-coupled case.

Dependencies

Compiling the model itself only requires

  • make,
  • a Fortran compiler (e.g., gfortran),
  • an MPI implementation (e.g., OpenMPI),
  • and a netCDF library (libnetcdf, libnetcdff).

To compile PDAF and couple it to the model (online and offline), additionally

  • BLAS and LAPACK implementations (e.g., openblas) are required.

Compiling the model in 2dmodel/

The model is independent of PDAF. All compiler-specific settings are in compile_settings.txt. It is configured for a standard Linux installation, and should be adapted for other cases. To compile the program model use

make model

(Just executing make will show all compile targets.)

Compiling the online-coupled model in online/

The compilation of model_pdaf in the directory online/ is configured analogously to that in 2dmodel/. The Makefile is extended to compile the DA user codes for PDAF and to link the PDAF library to the program. The user code source files are located in src_pdaf and are included in the Makefile.

For compiling:

  1. Compile the PDAF library separately.
  2. Compile the program using
make <target> PDAFDIR=<mydir>

where

  • <target>: the name of the program you would like to compile (model_pdaf or model)
  • <mydir>: The path to the PDAF directory.

Possible names for <target> are shown when running make without specifying <target>.

Compiling the offline-coupled DA program in offline/

The compilation of the offline-coupled program is integrated into the PDAF compilation scheme. Thus, there is no file compile_settings.txt, but the compile specifications are taken from a file in the directory make.arch of PDAF. The Makefile can also compile the PDAF library.

Compile the program offline_pdaf with

make <target> PDAF_ARCH=<arch> PDAFDIR=<mydir>

where

  • <target>: the name of the program you would like to compile (offline_pdaf)
  • <arch>: The name of the make.arch file without .h as in other compilations with PDAF (see files in make.arch/ of PDAF)
  • <mydir>: The path to the PDAF directory.

Possible names for <target> are shown when running make without specifying <target>.

Compiling the program to generate the covariance matrix file in generate_covar/

This program is compiled analogously to the offline-coupled DA program. Thus, you can compile the program with

make generate_covar PDAF_ARCH=<arch> PDAFDIR=<mydir>

with <arch> and <mydir> as before.

Overview of files

Input files

The input files are in inputs_2fields. These are

  • ensA_XX.nc and ensB_XX.nc: Ensemble files for both fields, where XX is the ensemble index ranging between 01 and 20. Used to initialize the ensemble for DA or to generate a covariance matrix file with generate_covar.
  • obsA.nc and obsB.nc: Observation fields for both fields containing all time steps. Read during DA process.
  • trueA_stepYY.nc and trueB_stepYY.nc: Files holding the true model state for both fields, where YY is the time step. Used for comparison to assess the quality of the state estimate (e.g., using plot_diff.py)

Output files

Running the online or offline DA produces several output files. There are separate files for the first model field 'A' and the second model field 'B'. Below, we list only the files associated with field 'A'; for field 'B', corresponding files are written.

  • stateA_step*_for.txt / stateA_step*_ana.txt: files holding the ensemble mean of field A of the model for the forecast (for) or analysis (ana).
  • ensA_XX_step*_for.txt / ensA_XX_step*_ana.txt: files holding each single ensemble member state (with index XX) for the forecast (for) or analysis (ana).
  • varianceA_step*_for.txt / varianceA_step*_ana.txt: files holding the ensemble variance of field A of the model for the forecast (for) or analysis (ana).
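For further analysis beyond the provided plot scripts, these .txt output files can be loaded directly. A minimal sketch, assuming the plain whitespace-separated layout that np.loadtxt expects (the illustration below writes a synthetic field in that format rather than reading an actual experiment output):

```python
import numpy as np

def load_field(filename):
    """Read one output field from a whitespace-separated text file
    (the layout assumed here for the *.txt output files)."""
    return np.loadtxt(filename)

# Illustration with a synthetic 18x36 field written in the same format
field = np.arange(18 * 36, dtype=float).reshape(18, 36)
np.savetxt("exampleA.txt", field)
loaded = load_field("exampleA.txt")
```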

Plotting

The directory plotting/ contains Python plot scripts. They can be used like

plot_diff.py FILE1 FILE2      # Plot the difference of two files and display their RMS difference
plot_file.py FILE             # Plot the field in one file
plot_obs.py OBSFILE STEP      # Plot one observation file with OBSFILE being the observation file and STEP the time step to plot

The script plot_file.py can be used to plot files in /inputs_2fields, the model's output files, or those from a DA experiment. plot_diff.py is commonly used to plot the difference between the output of a DA experiment and the corresponding true state.
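The RMS difference displayed by plot_diff.py boils down to a simple computation; a minimal sketch (the actual script may differ in how it reads and lays out the fields):

```python
import numpy as np

def rms_difference(field1, field2):
    """Root-mean-square difference of two fields, as displayed
    by plot_diff.py (sketch of the core computation)."""
    diff = np.asarray(field1) - np.asarray(field2)
    return float(np.sqrt(np.mean(diff ** 2)))

# Example with two illustrative 18x36 fields
a = np.zeros((18, 36))
b = np.full((18, 36), 0.5)
print(rms_difference(a, b))   # prints 0.5
```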

Running experiments

The programs for the online and offline coupled DA use command-line options to specify the experiment settings. All command-line options start with a dash. Thus, the syntax is

-VARIABLE VALUE

Running the offline coupled DA

The offline coupled DA performs one single analysis step. It uses a parallelization with domain decomposition. However, one can also run it with a single process.

To run the program, one can, e.g., use

./offline_pdaf -dim_ens <DIMENS> -filtertype <FTYPE> -step <STEP> [further options]

with

  • <DIMENS>: Ensemble size, e.g., between 3 and 20.
  • <FTYPE>: Type of filter algorithm, e.g., 6 for ESTKF or 7 for LESTKF.
  • <STEP>: The time step for which observations are assimilated
  • [further options]: There are more options one can specify. We list some possibilities later.

Notes:

  • Depending on the MPI implementation you use, you might need to write 'mpirun -np 1' at the beginning of the command. With OpenMPI, one usually does not need this.
  • The program is implemented to always read the same input files. Thus, it does not support full offline DA cycling, in which one alternates between the DA step and the forecast. Consequently, in a more realistic case, one would change this so that one can specify at run time which files to read. Alternatively, one could add a script that copies restart files from the model to the default names.
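The staging script mentioned above could look like this minimal sketch; the file names restartA_XX.nc are a hypothetical stand-in for whatever a real model would write, while ensA_XX.nc follows the input naming used in this example:

```python
import shutil
from pathlib import Path

def stage_restarts(restart_dir, input_dir, dim_ens):
    """Copy model restart files to the fixed names the offline DA
    program reads (hypothetical naming convention for the restarts)."""
    for member in range(1, dim_ens + 1):
        src = Path(restart_dir) / f"restartA_{member:02d}.nc"
        dst = Path(input_dir) / f"ensA_{member:02d}.nc"
        shutil.copyfile(src, dst)
```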

Examples

Running the global filter ESTKF with ensemble size 10 at step 1:

./offline_pdaf -dim_ens 10 -filtertype 6 -step 1

Running the local filter LESTKF with ensemble size 6 at step 1 with localization cut-off radius 7.0

./offline_pdaf -dim_ens 6 -filtertype 7 -step 1 -cradius 7.0

Here, the additional option cradius is used.

Get an overview of available DA methods and the options of the selected DA method (<FTYPE>):

./offline_pdaf -filtertype <FTYPE> -subtype -1

If one omits the -filtertype option, the filter defaults to ESTKF.

Running the online coupled DA

The online coupled DA is implemented with PDAF's fully parallel mode. Thus, the number of processes has to be a multiple of the ensemble size.

mpirun -np <NUMPROCS> ./model_pdaf -dim_ens <DIMENS> -filtertype <FTYPE> [further options]

with

  • <DIMENS>: Ensemble size, e.g., between 3 and 20.
  • <FTYPE>: Type of filter algorithm, e.g., 6 for ESTKF or 7 for LESTKF.
  • [further options]: There are more options one can specify. We list some possibilities later.

Examples

Running the global filter ESTKF using 6 processes with ensemble size 6:

mpirun -np 6 ./model_pdaf -dim_ens 6 -filtertype 6

Running the global filter ESTKF with parallelization using 12 processes with ensemble size 6:

mpirun -np 12 ./model_pdaf -dim_ens 6 -filtertype 6

In this case, 2 processes per model are used, splitting the model domain into 2 subdomains. The outputs of this run should be the same as before. However, in the upper part of the screen output, you will see information about the domain decomposition, and the dimension of the fields in the state vector will reflect the decomposition.
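The relation between process count, ensemble size, and domain decomposition can be sketched as follows (a simplification of PDAF's actual communicator setup; in particular, the even split of the x-direction ignores remainders):

```python
def parallel_layout(n_procs, dim_ens, nx=18):
    """Split n_procs into dim_ens model tasks and decompose the
    x-direction (nx grid points) among the processes of each task."""
    if n_procs % dim_ens != 0:
        raise ValueError("number of processes must be a multiple of the ensemble size")
    procs_per_model = n_procs // dim_ens
    # simple even split of the x-direction (simplified)
    nx_local = nx // procs_per_model
    return procs_per_model, nx_local

print(parallel_layout(12, 6))   # prints (2, 9): 2 processes per model, 9 x-grid points each
```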

To plot the difference of the final state estimate of this experiment with the true model state, use

python ../plotting/plot_diff.py stateA_step20_ana.txt ../inputs_2fields/trueA_step20.txt

Run Scripts

The directory runscripts/ contains run scripts to perform a series of experiments with the online coupled model:

  • run_free.sh: Run a series of ensemble runs without assimilating observations. The type of ensemble initialization and the ensemble size are varied.
  • run_global.sh: Run a series of DA experiments with the global ESTKF. The experiments vary the type of ensemble initialization, the inflation (forgetting factor), and the ensemble size.
  • run_local.sh: Run a series of DA experiments with the local filter LESTKF. The experiments vary the type of ensemble initialization, the inflation (forgetting factor), the ensemble size, and the localization radius.

Warning

Note that the run scripts perform a large number of experiments; e.g., run_global.sh runs 378 experiments, and run_local.sh runs 4998. Both run with an ensemble size of up to 20, so that up to 20 processes are used. Running these experiments is possible without having 20 physical processor cores, e.g., on a notebook computer with a sufficiently powerful processor. If the execution slows down significantly, the CPU is likely not powerful enough. Further, while each output file is small, the overall output of these experiments is several gigabytes.

Run options

Here we list a selection of possible options that can be set on the command line.

  • -dim_ens: Ensemble size (> 0); no default
  • -filtertype: Type of DA method; see output from running with -subtype -1 (default: 6)
  • -subtype: Subtype of DA method; see output from running with -subtype -1 (default: 0)
  • -delt_obs: Forecast length in time steps (> 0; default: 2)
  • -forget: 'Forgetting factor' controlling inflation (> 0): <1.0 inflates the ensemble spread; =1.0 neutral; >1.0 deflates the ensemble spread (default: 1.0)
  • -cradius: Localization cut-off radius (>= 0.0; default: 5.0)
  • -locweight: Type of localization weight function (0-4): =0 constant weight of 1; =1 exponentially decreasing with SRADIUS; =2 5th-order polynomial; =3 regulated localization of R with mean error variance; =4 regulated localization of R with single-point error variance (default: 2)
  • -type_ens_init: Type of ensemble initialization (1, 2, 3): =1 read ensemble files; =2 read ensemble files and replace the ensemble mean by the mean state of 20 ensemble files; =3 initialize by second-order exact sampling from a covariance matrix file (requires running generate_covar first) (default: 1)
  • -twin_experiment: Whether to run a twin experiment, i.e., assimilate synthetic observations previously generated with PDAF's functionality ([T|F]; default: F)
  • -step: For offline DA: time step of the observations assimilated (0-99; default: 1)
  • -type_iau: Incremental analysis updating: =0 no IAU; =1 activate IAU (default: 0)
  • -steps_iau: Incremental analysis updating: number of time steps over which the increment is distributed (> 0; default: 1)
  • -write_state: Whether to write ensemble mean state files ([T|F]; default: T)
  • -write_ens: Whether to write all ensemble states into files ([T|F]; default: T)
  • -write_var: Whether to write files for the ensemble variance field ([T|F]; default: F)
  • -assim_A: Whether to assimilate observations of fieldA ([T|F]; default: T)
  • -assim_B: Whether to assimilate observations of fieldB ([T|F]; default: F)
  • -rms_obs_A: Observation error for observations of fieldA (> 0.0; default: 0.5)
  • -rms_obs_B: Observation error for observations of fieldB (> 0.0; default: 0.25)

Further options, including their explanation and possible values, can be seen in the file assimilation_pdaf_mod.F90.
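The effect of the forgetting factor can be illustrated with a minimal sketch: scaling the ensemble perturbations by 1/sqrt(forget) increases the ensemble variance by the factor 1/forget for forget < 1. This is a simplified illustration of the inflation effect, not PDAF's internal implementation:

```python
import numpy as np

def apply_forgetting_factor(ensemble, forget):
    """Inflate ensemble perturbations by 1/sqrt(forget); forget < 1
    increases the spread, forget = 1 leaves it unchanged."""
    mean = ensemble.mean(axis=0)
    return mean + (ensemble - mean) / np.sqrt(forget)

ens = np.array([[1.0, 2.0], [3.0, 4.0]])
inflated = apply_forgetting_factor(ens, 0.25)
# the ensemble variance grows by a factor 1/forget = 4
```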

Further functionality

Generating a covariance matrix file

The program for generating a covariance matrix file is located in the directory generate_covar/. The compilation of the program was described before.

The program can be run using

./generate_covar

It will read the ensemble files from inputs_2fields and perform a singular value decomposition on these files. Then it writes the file covar.nc, which contains the singular values, the corresponding singular vectors, and the ensemble mean state. The program is configured to use 20 ensemble files.
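The core of this computation can be sketched as follows (a simplified illustration, not the actual Fortran code; the netCDF reading and writing of covar.nc is omitted, and the state dimension 648 = 18 x 36 reflects the reference grid):

```python
import numpy as np

def covariance_from_ensemble(states):
    """Decompose an ensemble of state vectors into mean, singular
    values, and singular vectors, as used for second-order exact
    sampling (simplified sketch of what generate_covar computes)."""
    states = np.asarray(states)              # shape (n_members, state_dim)
    mean = states.mean(axis=0)
    pert = (states - mean) / np.sqrt(states.shape[0] - 1)
    # SVD of the perturbation matrix; columns of svecs are singular vectors
    _, svals, vt = np.linalg.svd(pert, full_matrices=False)
    return mean, svals, vt.T

rng = np.random.default_rng(0)
mean, svals, svecs = covariance_from_ensemble(rng.normal(size=(20, 648)))
```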

The directory generate_covar/ contains a README file with further information.

Generating synthetic observations for twin experiments

The program for the online coupled DA can generate files holding synthetic observations from a model forecast and later use these files as inputs for twin experiments. As there are no real observations for this model, the observation files in inputs_2fields/ are already synthetic. Hence, this functionality only serves to demonstrate how it is used.

The example is implemented so that, when run in parallel, separate files for each sub-domain are written. Accordingly, when performing the actual twin experiment (see below), one must use the same number of processes for each model. The example further assumes that observations are generated for each time step.

The example is further implemented so that synthetic observations can only be generated for one observation type at a time. Thus, to generate files for each of the two observation types in the example, one has to run the program twice. In each case, one performs the run with ensemble size 1. For running with parallelization using 2 sub-domains:

mpirun -np 2 ./model_pdaf -dim_ens 1 -filtertype 100 -delt_obs 1 -write_state F -write_ens F
mpirun -np 2 ./model_pdaf -dim_ens 1 -filtertype 100 -delt_obs 1 -write_state F -write_ens F -assim_A F -assim_B T

In the second run, observation type A is deactivated, while observation type B is switched on. These runs will generate 4 files, i.e., one file for each observation type and sub-domain:

obsA_syn_0000.nc, obsA_syn_0001.nc, obsB_syn_0000.nc, obsB_syn_0001.nc

Here, the 4-digit number is the process rank that wrote the file. The observations are written as a list of values. Information about observation locations or errors is initialized by the respective observation module during a twin experiment.

Note

The functionality to generate synthetic observations is only implemented in the example for the online coupled case.

Performing a twin experiment

Once the files containing the synthetic observations have been generated, they can be used in twin experiments. In this case, each observation module reads the regular observation files to initialize all values. Then the file containing the synthetic observations is read, overwriting the values of the actual observations.

The twin experiments can assimilate any combination of observation types, as long as the necessary files containing the synthetic observations were generated beforehand.

To run the ESTKF with ensemble size 4, assimilating observations of type A, and using 2 processes per model to be consistent with the generated synthetic observation files, one can execute:

mpirun -np 8 ./model_pdaf -dim_ens 4 -filtertype 6 -delt_obs 2 -twin_experiment T

To run the analogous experiment assimilating synthetic observations of both types, one can execute:

mpirun -np 8 ./model_pdaf -dim_ens 4 -filtertype 6 -delt_obs 2 -twin_experiment T -assim_B T

Note

One can also perform twin experiments with the offline-coupled case, as long as one has previously generated synthetic observation files with the online-coupled program.
