.. _usage-guide:

==========================
A brief usage guide to kcs
==========================


Preliminaries
=============

This suite (package) of Python modules contains functionality to perform the
first steps for calculating the KNMI Climate Scenarios. The KNMI Climate
Scenarios provide a set of scenarios for the Netherlands in 2050 and 2085
(middle and end of the 21st century). These are calculated using a regional
model, which is adjusted to match the global climate change, using global
models. This last step is provided in this package.

The package is called ``kcs``, an abbreviation of KNMI Climate Scenarios.

Some of the example commands given are quite long, because of the use of
several options. For readability, these lines are broken across multiple
lines, with a backslash at the end of each line that is continued. In most
terminals and shells, you can safely copy-paste such commands with the
newlines and backslashes and run them; if your shell does not handle this,
remove the backslashes and newlines first. So the following two commands are
the same::

    conda create --name kcs iris pandas h5py toml python=3.7 --channel conda-forge

    conda create --name kcs --channel conda-forge \
        iris pandas h5py toml python=3.7


Installation
------------

For a more detailed installation guide, see the ``INSTALL`` document in the
root directory of the package.

Installation requires a few Python packages: ``SciTools-Iris``, ``Pandas``,
``h5py`` and ``toml``. The easiest way is to use Conda to install these. If
you don't have Conda, you can obtain the Miniconda installer from
https://docs.conda.io/en/latest/miniconda.html ; follow the installation
instructions for Miniconda, for Python 3.

With Conda, create a new environment with the required packages::

    conda create --name kcs iris pandas h5py toml python=3.7 --channel conda-forge

You can pick another name, and you may want to change to a newer Python
version in the future, provided it's supported by the various packages. (KCS
supports Python 3.8, but not all dependencies do as of March 2020.)

Once set up, activate the new environment, and install KCS directly from its
repository::

    conda activate kcs

    pip install git+https://github.com/eucp-project/kcs.git


Running modules
---------------

KCS is set up as a set of runnable modules. These take the place of scripts,
and should be thought of as such (in fact, they could be installed as
scripts, but for clarity, they are kept inside the KCS package as runnable
modules).

The way to run such a module is as follows::

    python -m kcs.<module> [<options> ...]

Note the ``-m`` flag after ``python``, which indicates that the following
argument is a module. This also keeps the Python executable together with the
package. This ``-m`` flag is an option to the ``python`` executable, while
any flags following ``kcs.<module>`` belong to that runnable module
(technically, it is therefore possible to have two different ``-m`` options
on a single line).

All runnable modules have a ``-h`` or ``--help`` option, to get a quick help,
and a ``-v`` option for verbosity: used once, it will log warnings (fairly
useful). ``-vv`` will produce some "info" log messages as well, and ``-vvv``
(the maximum level) will produce quite a number of debug messages along the
way.


@-lists
-------

Several of the modules take a list of files as input. These can be given as
one or more standard shell globbing patterns (for example,
``/mnt/cmip5/rcp85/Amon/tas/*/*/tas*.nc``), but another option is to use a
so-called at-list (@-list): a file containing a list of filenames, one per
line. When this file is given as input, prepended with a ``@`` sign, the KCS
package interprets it as the list of files given inside. E.g., a file
``files.list`` that contains the following::

    tas_Amon_GISS-E2-R_rcp85_r1i1p1_200601-202512.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_202601-205012.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_205101-207512.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_207601-210012.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_210101-212512.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_212601-215012.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_215101-217512.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_217601-220012.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_220101-222512.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_222601-225012.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_225101-227512.nc
    tas_Amon_GISS-E2-R_rcp85_r1i1p1_227601-230012.nc

and it's used as follows::

    python -m kcs.read_files @files.list

this is interpreted as:

.. code-block:: bash

    python -m kcs.read_files tas_Amon_GISS-E2-R_rcp85_r1i1p1_200601-202512.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_202601-205012.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_205101-207512.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_207601-210012.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_210101-212512.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_212601-215012.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_215101-217512.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_217601-220012.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_220101-222512.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_222601-225012.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_225101-227512.nc \
        tas_Amon_GISS-E2-R_rcp85_r1i1p1_227601-230012.nc

NB: ``kcs.read_files`` is a fictional module, so the above example will yield
an error.
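Since an @-list is a plain text file, it can be created with any tool you
like, for example via the shell: ``ls /mnt/cmip5/rcp85/Amon/tas/*/*/tas*.nc > cmip-tas.list``.
A minimal Python sketch (the directory layout here is hypothetical, and
should be adapted to your data):

.. code-block:: python

    # Write an @-list with one filename per line; the glob pattern below is
    # an assumption, matching the example pattern mentioned earlier.
    from pathlib import Path

    files = sorted(Path("/mnt/cmip5/rcp85/Amon/tas").glob("*/*/tas*.nc"))
    with open("cmip-tas.list", "w") as fh:
        for path in files:
            fh.write(f"{path}\n")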
Lines in an @-list can be commented out using a ``#`` at the start of a line
(but trailing comments at the end of a line are not allowed; these are
interpreted as part of the file name).

@-lists can also be nested. This is convenient if you have a set of
sub-lists, and want to group these together. For example, a ``tas-all.list``
could contain::

    @tas-historical.list
    #@tas-rcp26.list
    @tas-rcp45.list
    @tas-rcp60.list
    @tas-rcp85.list

where the RCP 2.6 files are (temporarily) commented out.

Note that relative file names inside an @-list are interpreted relative to
the current working directory; using absolute paths may be the safest option,
although this may of course cause lengthy lines.


Adjusting code
--------------

Sometimes, you may want to change some code in the ``kcs`` package. There are
a few ways to do this. The most standard way is to clone the repository, and
perform an "editable" installation. Do this before you install the KCS
package with ``pip`` as shown above, or uninstall it first
(``pip uninstall kcs``). Clone the repository (in a directory where you want
the actual code to stay):

.. code-block:: bash

    git clone https://github.com/eucp-project/kcs.git
    cd kcs

Then install an "editable" version (make sure you are in the activated Conda
environment)::

    pip install -e .

Now, any change you make to the files in the cloned repository will be
automatically reflected in the installed Python package.

You may, however, not need (or want) to install the package at all. The code
in the package can be run directly from within its directory; there is no
need to compile anything beforehand. Thus, running all the examples after
``cd kcs`` but without ``pip install -e .`` will also work. The disadvantage
is that you can't run it from any other directory, so all output files will
end up in that directory as well.
Steps
=====

The following steps perform all the routines for producing the initial
climate scenario setup. The details can be found in Lenderink et al., 2014,
and the corresponding appendix.


Step 0: area extraction and averaging
-------------------------------------

Step 0 involves the extraction and averaging of areas. The areas for the
specific KNMI Climate Scenarios are a global average (for the air
temperature, ``tas``), and a point "average" for the air temperature ``tas``
and precipitation ``pr``. This is to be done for both the available CMIP
data, and the model of interest that will be used to drive the regional
(downscaling) model.

This step is numbered 0, since the user may have other ways to obtain the
same result. The end result should be a set of one-dimensional datasets
containing area-averaged time series for the relevant variables, stored as
NetCDF files following the CF conventions, grouped in separate directories by
area and variable. These can then be input to step 1.

For the global CMIP ``tas`` average, run the following command:

.. code-block:: bash

    python -m kcs.extraction --area global @cmip-tas.list --ignore-common-warnings -v

Iris can be quite chatty regarding potential problems: the
``--ignore-common-warnings`` option turns this chattiness down, but it may be
worth leaving this option out the first time, to check that the notices are
indeed not really a problem.

Note that optional arguments can be put before or after the mandatory
arguments (``@cmip-tas.list`` is the only required argument here). As
mentioned earlier, the ``-m`` is an option to ``python``, not to
``kcs.extraction``.

The ``global`` area is predefined; there are also ``nlpoint``, ``nlbox``,
``rhinebasin`` and ``weurbox`` areas. For their definitions, see the
``kcs/config/default.py`` file. You could add more definitions there, but
it's better to provide your own configuration file for doing that.

All areas are averaged using a weighted average, except for the single point
area (``nlpoint``): this uses a standard linear interpolation (as used in
``iris.cube.Cube.interpolate``).
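For reference, the sketch below shows roughly what the extraction and
averaging amount to for the ``global`` area. This is a simplified
illustration, not the actual ``kcs.extraction`` implementation (which also
handles metadata, the other area definitions, and the output layout):

.. code-block:: python

    # A rough equivalent of the global weighted average for a single dataset.
    import iris
    import iris.analysis
    import iris.analysis.cartography

    cube = iris.load_cube("tas_Amon_GISS-E2-R_rcp85_r1i1p1_200601-202512.nc")
    # Area weights require bounds on the horizontal coordinates.
    for name in ("latitude", "longitude"):
        if not cube.coord(name).has_bounds():
            cube.coord(name).guess_bounds()
    weights = iris.analysis.cartography.area_weights(cube)
    # Collapse the horizontal dimensions into a weighted-average time series.
    mean = cube.collapsed(["latitude", "longitude"], iris.analysis.MEAN,
                          weights=weights)
    iris.save(mean, "tas-global-averaged.nc")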
For the extraction and averaging, all datasets are handled separately: there
is no matching between historical and future data (as is done in later steps
below), since this is not needed here.

The output is written to a set of files in a subdirectory that is named
following the variable and area:
``data/<var>-<area>-averaged/<filename>.nc`` (``<filename>`` is the input
filename of an individual dataset file, without its extension). You can
change this using the ``--template`` option, with a Python-format-like
string. In the example below, we extract the ``tas`` data for our model of
interest (EC-EARTH), and save these results into a separate directory:

.. code-block:: bash

    python -m kcs.extraction --area global @ecearth-tas.list \
        --template "data/ecearth/{var}-{area}-averaged/{filename}.nc" \
        --ignore-common-warnings -v

Another example would be if you want separate directories for e.g. CMIP5 and
CMIP6 data:

.. code-block:: bash

    python -m kcs.extraction --area global @cmip5-tas.list \
        --template "data/cmip5/{var}-{area}-averaged/{filename}.nc" \
        --ignore-common-warnings -v

    python -m kcs.extraction --area global @cmip6-tas.list \
        --template "data/cmip6/{var}-{area}-averaged/{filename}.nc" \
        --ignore-common-warnings -v

The examples below perform the extraction for the ``nlpoint`` area, for both
``tas`` and ``pr``, and for both the CMIP data and the EC-EARTH ("model of
interest") data:

.. code-block:: bash

    python -m kcs.extraction --area nlpoint @cmip-tas.list \
        --ignore-common-warnings -v

    python -m kcs.extraction --area nlpoint @cmip-pr.list \
        --ignore-common-warnings -v

    python -m kcs.extraction --area nlpoint @ecearth-tas.list \
        --template "data/ecearth/{var}-{area}-averaged/{filename}.nc" \
        --ignore-common-warnings -v

    python -m kcs.extraction --area nlpoint @ecearth-pr.list \
        --template "data/ecearth/{var}-{area}-averaged/{filename}.nc" \
        --ignore-common-warnings -v

For non-global and non-point areas, there is a ``--regrid`` option, which
will regrid the data to a common one-by-one degree grid before extraction;
this should ensure the same area is extracted, since Iris does not
interpolate grid points when performing area extraction. If you want to
change the grid that is regridded to, you can change the function
``create_grid`` in ``kcs/utils/coord.py``.

The end result of step 0 should be six subdirectories: three for the
extracted CMIP data, and three for the model of interest. These three
directories are a global ``tas`` directory, an ``nlpoint`` ``tas`` directory
and an ``nlpoint`` ``pr`` directory.
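With the default template and the ``--template`` values used in the examples
above, the resulting layout would look as follows (each directory containing
one NetCDF file per input dataset)::

    data/
        tas-global-averaged/
        tas-nlpoint-averaged/
        pr-nlpoint-averaged/
        ecearth/
            tas-global-averaged/
            tas-nlpoint-averaged/
            pr-nlpoint-averaged/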
Step 1a: global tas change
--------------------------

This step simply calculates the global temperature change (historical and
future scenarios), averaging all available model runs, normalised to a
reference (control) period. Again, the examples use an @-list. These lists
contain the area-averaged data from the previous step; the filename indicates
the datasets involved.

.. code-block:: bash

    python -m kcs.tas_change @cmip-tas-global-averaged.list \
        --outfile=tas_change_cmip.csv --on-no-match=randomrun -v \
        --norm-by=run --reference-period 1991 2020

Notes on the options:

* ``--outfile``: the output CSV file. This contains the percentiles and mean
  of the normalised ``tas`` value for each year. The statistics are
  calculated across all individual model runs. This option is required (but
  given as an option, for clarity).

* ``--on-no-match``: if a future experiment run can't be matched with a
  historical experiment run, an attempt is made to pick another, random,
  historical run from the same model. The matches are usually made using the
  attributes of the dataset, in particular the ``parent_experiment_rip``
  attribute, and otherwise an attempt is made to match the ``rip`` parameters
  themselves. Other values are ``error`` (the script will exit with an
  error), ``remove`` (the experiment run is removed), and ``random`` (more
  broad than ``randomrun``: it will ignore the initialization and physics
  parameters when picking a random match).

* ``--norm-by``: normalise the runs per run, per experiment, or even per
  model. These options change from a "tight" normalisation to a very "broad"
  normalisation.

* ``--reference-period``: which period to normalise the (matched) runs to.
  The default is 1981 -- 2010, which is the reference period used with CMIP5
  data in Lenderink et al., 2014. The example above uses the reference period
  for CMIP6 data. Note that years are inclusive, and run from January 1 to
  December 31; thus, each reference period is exactly 30 years.

The ``--on-no-match`` and ``--norm-by`` options already have their defaults
set to the given values (``randomrun`` and ``run``, respectively), so there
is really no need to give them; that has been done here for the explanation.
The ``--reference-period`` also has a default setting, but you may want to
make this explicit, if you are switching between different input datasets.
Thus:

.. code-block:: bash

    python -m kcs.tas_change @cmip-tas-global-averaged.list \
        --outfile=tas_change_cmip.csv --reference-period 1991 2020

does the same thing, and is easier to read. Don't forget the ``-v`` (or
``-vv``, or ``-vvv``) generic option, to get some logging information.

The output is a CSV file, which looks somewhat as follows::

    date,mean,5,10,25,50,75,90,95
    1950-01-01,-0.740,-0.902,-0.893,-0.861,-0.816,-0.657,-0.432,-0.429
    1951-01-01,-0.754,-1.089,-1.080,-0.911,-0.820,-0.633,-0.259,-0.256
    1952-01-01,-0.806,-1.098,-1.089,-0.928,-0.839,-0.624,-0.437,-0.434
    1953-01-01,-0.806,-1.099,-1.094,-1.045,-0.723,-0.706,-0.403,-0.400
    1954-01-01,-0.765,-1.099,-1.094,-1.069,-0.640,-0.569,-0.364,-0.361
    ....

(Numbers are truncated to just three decimal digits for display purposes.)

Thus, the columns contain the mean and the 5th, 10th, etc. percentiles of the
(CMIP) distribution, for each year. This CSV file is input for the plot
below, and for step 1b.
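Since this is a plain CSV file, it is also easy to inspect with Pandas
(already a dependency of KCS). A minimal sketch:

.. code-block:: python

    # Load the tas-change table and look at the distribution around 2050.
    import pandas as pd

    df = pd.read_csv("tas_change_cmip.csv", index_col="date", parse_dates=True)
    print(df.loc["2040":"2060", ["5", "50", "95"]])
    # A 7-year centred rolling average, similar to the --smooth plot option.
    print(df["50"].rolling(7, center=True).mean())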
Plot the tas change
~~~~~~~~~~~~~~~~~~~

To create a plot of the temperature change, use the following command:

.. code-block:: bash

    python -m kcs.tas_change.plot tas_change_cmip.csv cmip.png \
        --xrange 1950 2100 --ylabel 'Temperature change [${}^{\circ}$C]' \
        --title 'Global annual temperature change' --smooth 7 --yrange -1 6 \
        --grid --legend

The module has two required arguments: the CSV file calculated above, and an
output figure file name (its extension will determine the file type
automatically). The meaning of most options will be evident. It is possible
to use some LaTeX in the various label and title arguments (see the
Matplotlib documentation for details). The ``--smooth`` parameter calculates
a rolling window average over the data, and should be an integer. In the
above example, a rolling average is calculated with a seven-year window.


Step 1b: matching the model of interest with the CMIP tas change
----------------------------------------------------------------

This step takes the result of step 1a, and matches the global CMIP ``tas``
change with the global ``tas`` change of our model of interest, for relevant
epochs. The user picks one or more epochs and CMIP percentiles, and the
procedure will match the CMIP change in ``tas`` with an identical change in
``tas`` for the model of interest, which results in a specific year,
calculated over a 30-year period. These define the scenarios: high and low
temperature change (e.g., the 90th and 10th percentile CMIP change) for the
middle and end of the century (2050 and 2085; 2085, because the 30-year
period average ranges from 2070 to 2099, the end of the CMIP data).

The output is a CSV file, which contains the so-called steering table. This
table contains the matching period in our model of interest, the actual
temperature change (with respect to the reference period), and a possible
correction factor, in case the model of interest can't match the CMIP
temperature change in the CMIP time range (for example, the EC-EARTH model
can't match the 90th percentile temperature change for the 2070-2099 period:
it is too cool for that, and doesn't run beyond 2100 to allow its temperature
to increase further).

Note that individual runs in the model of interest are averaged. These
should, therefore, be runs of the same experiment, and preferably just be
different realizations of the same model-experiment.

.. code-block:: bash

    python -m kcs.steering tas_change_cmip.csv @ecearth-tas-global-averaged.list \
        --scenario G 2050 10 --scenario W 2050 90 --scenario G 2085 10 --scenario W 2085 90 \
        --rolling-mean 10 --outfile steering.csv

The module takes two mandatory input files: the CMIP CSV file with the tas
change computed previously, and a list of globally-averaged tas data of the
model of interest, EC-EARTH in our example. The ``--scenario`` options set
the various scenarios of interest. The option can be repeated, and takes
three values: a name, an epoch and a percentile. Be aware that the percentile
to match should also be present in ``tas_change_cmip.csv``. Here, we also
calculate a rolling mean over the CMIP input data before the EC-EARTH data
are matched, to smooth out any bumps in the distribution. Finally, the output
steering table is written to ``steering.csv`` with the ``--outfile`` option
(otherwise it is only printed to the standard output), and looks as follows::

    name,epoch,percentile,cmip_delta_t,period,model_delta_t,factor
    G,2050,10,1.078,"(2021, 2050)",1.104,0.976
    W,2050,90,2.276,"(2049, 2078)",2.289,0.994
    G,2085,10,1.327,"(2027, 2056)",1.330,0.998
    W,2085,90,4.662,"(2070, 2099)",3.290,1.417

(Numbers are truncated to just three decimal digits for display purposes.)

There are ``cmip_delta_t`` and ``model_delta_t`` columns. The first gives the
change in global ``tas`` between the reference period and the epoch period
(the 30-year period around the epoch), the second gives the change in global
``tas`` between the reference period and the ``period`` for the model of
interest. These are usually near-identical (they will not be exactly the
same, since dates are rounded to years), which shows in the ``factor`` column
being close to one. Notice, however, how they are quite different in the last
row: this is because the model of interest reached the end of the valid
period for future experiments, the year 2099. As a result, the correction
factor is significantly different from 1.0: about 1.42 here.
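As the numbers suggest, the factor is simply the ratio of the two temperature
changes: for the last row, 4.662 / 3.290 ≈ 1.417. A quick sketch to verify
this from the steering table, using Pandas:

.. code-block:: python

    # The factor column should equal cmip_delta_t / model_delta_t.
    import pandas as pd

    steering = pd.read_csv("steering.csv")
    print(steering["cmip_delta_t"] / steering["model_delta_t"])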
Plot the model of interest over the CMIP data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To plot the CMIP data (as above) with the model of interest, and indicate the
values for which the scenarios are calculated (that is, the epoch-percentile
points), you can use the following command:

.. code-block:: bash

    python -m kcs.steering.plot tas_change_cmip.csv steering.csv \
        --outfile cmip-ecearth-scenarios.png \
        --extra-data @ecearth-tas-global.list --reference-epoch 2005 \
        --ylabel 'Temperature increase [${}^{\circ}$]' --smooth 10 \
        --grid --legend

This command has three mandatory arguments (one of them given as an option):

* the table with the tas change percentiles versus years (as before), in CSV
  format.

* the steering table, in CSV format.

* ``--outfile``: an output figure file name (the extension automatically
  determines the file type).

The ``--smooth`` option is as before for other commands: it applies a rolling
average, in this case with a window of 10 (years). The ``--reference-epoch``
option will plot a marker for the reference epoch, which is taken to be the
centre of the reference period used before.


Step 2: calculating the variable changes for CMIP
-------------------------------------------------

This step calculates the changes in the CMIP data between a reference period
and a period of interest (the usual 2050 and 2085 epochs, or the 2035 -- 2064
and 2070 -- 2099 periods). This is done for several seasons and variables:
the examples below calculate it for winter and summer, and for ``tas`` and
``pr``. The area under consideration is a local area, in this particular case
a single point in the south-west of the Netherlands (``nlpoint``): this step
looks at the local changes, in contrast to the global temperature change
above.

The following four commands calculate the changes:

.. code-block:: bash

    python -m kcs.change_perc @cmip-pr-nlpoint-averaged.list --season djf \
        --period 2035 2064 --relative --csvfile pr_change_2050_djf_nlpoint.csv -v

    python -m kcs.change_perc @cmip-pr-nlpoint-averaged.list --season jja \
        --period 2035 2064 --relative --csvfile pr_change_2050_jja_nlpoint.csv -v

    python -m kcs.change_perc @cmip-tas-nlpoint-averaged.list --season djf \
        --period 2035 2064 --csvfile tas_change_2050_djf_nlpoint.csv -v

    python -m kcs.change_perc @cmip-tas-nlpoint-averaged.list --season jja \
        --period 2035 2064 --csvfile tas_change_2050_jja_nlpoint.csv -v

(The commands for the 2085 epoch are identical, except that the
``--period 2035 2064`` option becomes ``--period 2070 2099``.)

Notes:

* The commands can only process one season and one variable at a time. To
  speed things up, one can run these commands in parallel, simply by putting
  them in the background in the shell.

* There is a mandatory input list of files, but the ``--season`` and
  ``--period`` options are also required.

* The ``pr`` variable requires a ``--relative`` flag, to indicate that we
  want to calculate the *relative* change in precipitation (for ``tas``, we
  calculate the absolute change).

* The ``--csvfile`` option will write a CSV file, which can be used as input
  for plotting the changes later on. An example file is given below.

* As with the global ``tas`` calculation, historical and future experiments
  are matched first. The default settings should be fine, but there are a few
  options that allow changing how this is done, as for ``kcs.tas_change``.
  Use the ``--help`` option to examine those if necessary.

The output of the calculation is a table that contains the distribution of
the changes in the CMIP distribution. As a result, the CSV file contains both
a mean and percentiles along both axes::

    ,mean,5,10,25,50,75,90,95
    mean,-5.672,-25.525,-21.824,-13.071,-5.191,2.562,8.720,13.544
    5,-15.337,-66.890,-50.293,-31.471,-13.756,0.804,15.039,23.959
    10,-14.055,-54.061,-44.544,-28.308,-11.878,1.369,12.420,18.332
    25,-10.790,-44.817,-36.995,-22.955,-7.886,0.213,12.677,16.739
    50,-7.552,-35.312,-28.169,-16.606,-7.088,3.231,9.817,14.888
    75,-3.874,-21.315,-18.101,-11.165,-3.317,2.962,8.665,15.999
    90,-1.079,-17.224,-12.760,-7.146,-1.547,5.953,11.574,14.656
    95,-0.421,-17.170,-14.149,-7.606,-0.380,7.175,14.238,18.228

(Numbers are truncated to just three decimal digits for display purposes.)
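Here, the rows correspond to the statistics that are calculated per run (the
mean and the percentiles of the seasonal distribution; compare the per-run
output in the next subsection), while the columns give the spread of those
changes across the CMIP ensemble. A sketch of reading the table with Pandas:

.. code-block:: python

    # Read the two-dimensional change table; the unnamed first column holds
    # the row labels.
    import pandas as pd

    table = pd.read_csv("pr_change_2050_jja_nlpoint.csv", index_col=0)
    # The ensemble median of the change in the within-run median.
    print(table.loc["50", "50"])
    # The full ensemble spread of the change in the 90th percentile.
    print(table.loc["90"])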
Plotting the CMIP changes
~~~~~~~~~~~~~~~~~~~~~~~~~

The above output files can be plotted with the following command:

.. code-block:: bash

    python -m kcs.change_perc.plot pr_change_2050_jja_nlpoint.csv pr_change_2050_jja_nlpoint.png \
        --epoch 2050 --text 'precip, JJA' --ytitle 'Change (%)' --ylimits -60 45

There are two required arguments: the CSV input file, and the figure output
file. The other options are nearly all for annotations (``--epoch`` prints an
epoch indicator on the graph) or the axes limits.


Calculating and overplotting the model of interest
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The same calculation can be applied to the model of interest. In the example
case, the data were already matched and concatenated between historical and
future experiments; this is why the ``--no-matching`` option is given.

.. code-block:: bash

    python -m kcs.change_perc @ecearth-pr-nlpoint-averaged.list --season jja \
        --period 2049 2078 --relative --run-changes pr_change_W2050_jja_nlpoint_ecearth.csv \
        --no-matching

There are a few notable differences here compared to the calculation for the
CMIP data:

* The ``--no-matching`` option has been explained above.

* The ``--period`` is taken from the steering table, since we are dealing
  with the matched EC-EARTH data in this case.

* The ``--csvfile`` option is missing: we are not writing the across-ensemble
  distribution, as we did for the CMIP data.

* Instead, we write the (percentile) changes for each individual run to a
  file, with the ``--run-changes`` option.

The resulting ``pr_change_W2050_jja_nlpoint_ecearth.csv`` looks as follows::

    ensemble,ref-mean,fut-mean,ref-10,fut-10,ref-50,fut-50,ref-90,fut-90,mean,10,50,90
    r1i1p1,3.679e-05,3.209e-05,1.966e-05,1.498e-05,3.472e-05,3.097e-05,5.298e-05,5.019e-05,-12.794,-23.809,-10.812,-5.279
    r2i1p1,3.541e-05,3.271e-05,1.976e-05,1.786e-05,3.491e-05,3.132e-05,5.316e-05,4.794e-05,-7.623,-9.616,-10.269,-9.816
    r3i1p1,3.611e-05,3.282e-05,2.162e-05,1.297e-05,3.526e-05,3.011e-05,5.397e-05,5.524e-05,-9.102,-40.012,-14.606,2.354
    ...

(Numbers are truncated as usual for display purposes. The number of displayed
rows and columns is also limited: not all default percentiles are shown, just
10, 50 and 90.)

There is a row for each realization. Each row contains the percentiles for
the reference (control) period, the future period (2049 -- 2078 here), and
their differences (the latter columns are simply called ``mean``, ``5``,
``10``, ...). It is the differences we are interested in.

Given the above output, we can replot the CMIP distribution, but now overplot
the individual EC-EARTH runs:

.. code-block:: bash

    python -m kcs.change_perc.plot pr_change_2050_jja_nlpoint.csv pr_change_2050_jja_nlpoint.png \
        --epoch 2050 --text 'precip, JJA' --ytitle 'Change (%)' --ylimits -60 45 \
        --scenario-run G pr_change_G2050_jja_nlpoint_ecearth.csv \
        --scenario-run W pr_change_W2050_jja_nlpoint_ecearth.csv
A full steering table at a time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The commands above require one to read the steering table, and extract the
relevant period for each scenario. It is easier, and safer (fewer manual
errors), to do this automatically, and have a script run the above
calculation for each scenario in a steering table. A simple helper module
exists for this, called ``kcs.change_perc.runall``, that takes the steering
table instead of a period. It also conveniently calculates the percentiles
for our model of interest, and creates the relevant output files (the
so-called ``run-changes`` for our model of interest, detailing the
percentiles for each run individually, and the distribution of percentiles
for the CMIP "background" data). The output files are named automatically,
deduced from the scenario name, epoch, variable and season.

It also creates the plot, with the model of interest overplotted. Since the
plot names are also auto-generated, you can only specify the plot type:
``pdf`` or ``png``. An obvious disadvantage is that individual specifications
of a plot can't be set, such as the y-range (``--ylimits``).

All in all, this looks like:

.. code-block:: bash

    python -m kcs.change_perc.runall @cmip-tas-nlpoint-averaged.list \
        --season djf --steering steering.csv --runs @ecearth-tas-nlpoint-averaged.list \
        --relative pr --no-matching --plottype pdf --write-csv

Note that options like ``--no-matching`` are still provided, and apply *only*
to the model of interest, that is, the EC-EARTH data here.

For a full (parallel) run of all variables, epochs and seasons, see the
``extras/percentile-all.bash`` script, which is a bash script wrapped around
the above command.


Step 3: resample the runs of the model of interest
--------------------------------------------------

This step forms the core of the kcs package: it takes the input runs
(realizations) of the model of interest (EC-EARTH in our case), and resamples
these in three stages, to try and match them with the CMIP distribution for
``tas`` and ``pr`` calculated above. Note that the CMIP data is not actually
input: the matching is eye-balled by overplotting the resampled data.

At the moment, the three stages are all calculated in one function, and are
unfortunately not separable. There are, of course, options available for each
individual stage to set its parameters. The individual stages are described
below; after that, the command to run the resampling is given.


Stage 1: match prescribed precipitation scenarios
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The four scenarios in the steering table (a warm "W" and a moderate "G"
(gematigd) scenario, for two epochs) are extended by adding precipitation
requirements, in the form of a percentage increase per degree of global
temperature increase. In this particular case, we choose 4% and 8% increase
per degree. This doubles the steering table, which now gains an extra column
with the actual (temperature change times percentage) precipitation change.

The actual input runs are resampled by dividing the relevant period (from the
steering table, for a specific scenario) into five-year intervals, which
gives 6 segments per run, across 16 runs in our specific case. This yields
:math:`16^6` possible combinations. This is done for both the future period
of interest, and the control (reference) period. The list of combinations is
limited to the 1,000 (configurable) combinations that have a precipitation
change closest to the target. The changes here are calculated using the means
of the individual five-year segments.

For the control period, the target is the mean of the individual runs. That
is, all realizations are averaged over the 30-year control period, and this
mean is targeted by the resampled control period.
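To get a feel for the numbers involved, the combinatorics can be sketched as
follows (an illustration only, not the ``kcs.resample`` implementation):

.. code-block:: python

    # Six five-year segments, each drawn independently from one of 16 runs.
    import itertools

    nruns, nsegments = 16, 6
    print(nruns ** nsegments)  # 16777216 possible resampled runs

    # Every combination is a tuple of run indices, one per segment.
    for combo in itertools.islice(
            itertools.product(range(nruns), repeat=nsegments), 3):
        print(combo)  # (0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 1), ...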
Stage 2: limit the allowed percentile ranges
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given the distribution of the resampled combinations, calculated using the
means of the five-year segments, the combinations are limited further by
selecting resamples that have values within a certain percentile range. There
are three settings that are applied one after the other: one for the
precipitation range in summer, one for the range of temperature change in
winter, and one for the range of temperature change in summer. The resulting
number of valid combinations is about 50, which are passed on to stage 3. All
the percentile ranges are configurable for each scenario.

The input is supplied through a configuration file in the TOML format, and
looks similar to this::

    [[W.H.2085]]
    var = "pr"
    season = "jja"
    control = [60.0, 100.0]
    future = [0.0, 40.0]

    [[W.H.2085]]
    var = "tas"
    season = "djf"
    control = [20.0, 50.0]
    future = [50.0, 80.0]

    [[W.H.2085]]
    var = "tas"
    season = "jja"
    control = [10.0, 50.0]
    future = [60.0, 100.0]

    [[W.H.2050]]
    var = "pr"
    season = "jja"
    control = [70.0, 100.0]
    future = [0.0, 40.0]

    [[W.H.2050]]
    var = "tas"
    season = "djf"
    control = [10.0, 40.0]
    future = [60.0, 90.0]

    [[W.H.2050]]
    var = "tas"
    season = "jja"
    control = [10.0, 50.0]
    future = [60.0, 100.0]

    ...

The syntax is hopefully obvious: this uses double brackets (an array of
tables in TOML), with a name composed of the temperature scenario (G/W), the
precipitation scenario (H/L) and the epoch, all dot-separated. Each array has
three items, since there are three "sub" scenarios where the percentile
restrictions are specified. The specific values required are the variable of
interest, the season, the control percentile range, and the future percentile
range. The latter two are 2-element lists of floating point numbers. Comments
and empty lines are allowed.

In total, there would be eight scenarios: ``W.H.2085``, ``W.H.2050``,
``W.L.2085``, ``W.L.2050``, ``G.H.2085``, ``G.H.2050``, ``G.L.2085``,
``G.L.2050``. Each scenario has three items, and each item requires four
key-value pairs to be defined.
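Since this is plain TOML, the file can be checked with the ``toml`` package
that KCS already depends on. A small sketch that lists the percentile ranges
per scenario (the file name is the one used in the resample command further
below):

.. code-block:: python

    # Parse the conditions file and print the ranges for every scenario.
    import toml

    conditions = toml.load("step2_conditions.toml")
    for temp, precips in conditions.items():           # G / W
        for precip, epochs in precips.items():         # H / L
            for epoch, items in epochs.items():        # 2050 / 2085
                for item in items:
                    print(f"{temp}.{precip}.{epoch}:", item["var"],
                          item["season"], "control", item["control"],
                          "future", item["future"])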
Stage 3: random selection of a set of resamples, with limited re-use of input segments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Out of the 50 or so remaining combinations, we select 8 (or any reasonable
number) resamples. These are selected randomly, and this is done 10,000
(configurable) times. Out of these 10,000 sets of resamples, we assign
penalties for re-use of the same segment across the resampled runs: in the
example configuration below, triple use yields a penalty of 1, fourfold use a
penalty of 5, and anything more is thrown out automatically. Penalties are
additive, since multiple segments may occur multiple times.

These penalties are all configurable, with a small TOML configuration file.
This file looks as follows::

    [penalties]
    # Penalties for number of (multiple) occurrences of run-segment in resamples.
    # Starts from 1 occurrence, that is, no duplicate.
    # Only give the number of occurrences that have a penalty less than
    # infinity, including a 0.0 penalty (for e.g. a single, `1`,
    # occurrence).
    # All penalties should be floating point numbers.
    # Has the form `n-occurrences` = `penalty-value`.
    1 = 0.0
    2 = 0.0
    3 = 1.0
    4 = 5.0
    # 5 and more yield an infinite penalty value

The comments are not required, but they hopefully make the configuration file
self-documenting.


Running the resampling module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The actual command is:

.. code-block:: bash

    python -m kcs.resample @ecearth-all-nlpoint.list --steering steering.csv \
        --conditions step2_conditions.toml --penalties penalties.toml \
        --precip-scenario L 4 --precip-scenario H 8 --relative pr

Points to notice:

* The precipitation scenarios need to be given explicitly, since the steering
  table does not contain that information.

* The ``--relative`` option takes a *list* of short variable names that
  should be calculated as a relative change; in this case only ``pr``.

* The variables are deduced from the input, which here contains both ``pr``
  and ``tas`` datasets. In fact, the ``ecearth-all-nlpoint.list`` file looks
  like::

    @ecearth-tas-nlpoint.list
    @ecearth-pr-nlpoint.list

  That is, it is a nested @-list.

* The list of files (in this case an @-list) is required. And you'll need to
  specify at least one ``--precip-scenario``. The latter option takes two
  arguments: a name, and a value indicating the percentage change in
  precipitation per degree of temperature change, as before.

* It is also possible to run just one (or a few) scenario(s) from the
  steering table. In that case, use the ``--scenario`` option, which takes
  three arguments: the global scenario name (G/W), the epoch (2050/2085) and
  the precipitation scenario (as given with ``--precip-scenario``). This
  option can also be used multiple times. Without this option, all scenarios
  are calculated in succession.

The ``--scenario`` option makes it possible to run all eight variants in
parallel. For example, one can create a bash script that contains eight
copies of the command at the start of this section, each put in the
background, and each with a different ``--scenario [G,W] [2050,2085] [L,H]``
option added. A single scenario calculation takes up to fifteen minutes,
depending on the number of input runs (sixteen runs in the fifteen-minute
case); while the actual calculation doesn't take too long, reading the
datasets, extracting seasons and calculating the means adds to the total
running time.


Output
~~~~~~

The output of the resampling is written to an HDF5 file. Two, in fact. One
contains the resampled data themselves: the averages and standard deviations,
and the percentile changes between the (resampled) control and (resampled)
future periods. It is the latter we are normally interested in, since this
fits with the previous (CMIP) calculations.

The structure of this file (by default named ``resamples.h5``) looks as
follows::

    /2050/G/H/pr/djf/diff    Dataset {12, 8}
    /2050/G/H/pr/djf/keys    Dataset {8}
    /2050/G/H/pr/djf/mean    Dataset {8}
    /2050/G/H/pr/djf/std     Dataset {8}
    /2050/G/H/pr/jja         Group
    /2050/G/H/pr/jja/diff    Dataset {12, 8}
    /2050/G/H/pr/jja/keys    Dataset {8}
    /2050/G/H/pr/jja/mean    Dataset {8}
    /2050/G/H/pr/jja/std     Dataset {8}
    /2050/G/H/pr/mam         Group
    /2050/G/H/pr/mam/diff    Dataset {12, 8}
    ...

Thus, for each epoch, temperature and precipitation scenario, each variable,
and each season (there is also "son"), there are four datasets: ``diff``,
``keys``, ``mean``, ``std``. The size of the datasets in this example is 8,
because eight statistics were calculated: the mean and the 5, 10, 25, 50, 75,
90 and 95 percentiles. Their names can be found in the ``keys`` dataset. The
``diff`` dataset is 12 by 8, where the first dimension equals the number of
requested resampled runs (the ``--nstep3`` option, here 12). The second
dimension again holds the statistics. The file structure can be examined with
the command ``h5ls --recursive resamples.h5``, which yields the listing
above.
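The data can also be read programmatically with ``h5py`` (another KCS
dependency); a minimal sketch, using the group names shown above:

.. code-block:: python

    # Read the percentile changes for one scenario/variable/season.
    import h5py

    with h5py.File("resamples.h5", "r") as fh:
        group = fh["2050/G/H/pr/djf"]
        keys = [key.decode() if isinstance(key, bytes) else str(key)
                for key in group["keys"][:]]
        diff = group["diff"][:]   # shape (12, 8): resampled runs x statistics
    print(keys)
    print(diff.mean(axis=0))      # average change over the resampled runs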
A quick look at the data itself can be obtained with the ``h5dump`` command,
for example:

.. code-block:: bash

    h5dump --dataset /2050/G/H/pr/mam/diff resamples.h5

which outputs something like::

    (0,0): -4.86969, 2.98059, -8.86912, -12.488, -5.37732, -5.38094, -6.44242,
    (0,7): -1.30814,
    (1,0): 3.95456, 10.5316, 7.51921, 7.81239, 8.34815, -2.34305, 6.44583,
    (1,7): -4.05653,
    (2,0): 7.42924, 13.0411, 7.01548, 15.2165, -1.90975, 8.9375, 14.6143,
    (2,7): 14.7941,
    (3,0): 4.56155, 3.41901, 10.5502, 15.5352, 8.15393, 4.33509, -5.11064,
    (3,7): -5.99621,
    (4,0): 2.9189, -26.567, -3.01042, -5.91511, 5.42413, 2.54435, 8.93456,
    (4,7): 3.50783,
    (5,0): -1.60492, -8.12653, 2.79788, -9.45805, 2.84853, -2.63126,
    (5,6): -0.277906, 4.41441,
    (6,0): -0.216957, -32.8458, -13.7598, -2.39312, 5.25471, 4.29372, 4.76788,
    (6,7): 5.07935,
    (7,0): 6.55494, 42.2289, 6.38604, 4.6321, 1.54924, 6.47195, 2.73307,
    (7,7): 10.9296,
    (8,0): 6.70868, 26.3777, 5.348, 17.1805, 18.6534, 0.712798, 0.348545,
    (8,7): -5.1629,
    (9,0): 5.97353, 30.6496, 50.8518, 18.0891, 0.961236, 4.07857, 2.55165,
    (9,7): 2.51619,
    (10,0): 1.59531, 4.25298, 2.49238, -0.913777, -6.77163, 3.00944, 6.46021,
    (10,7): 2.06491,
    (11,0): 5.87735, 24.6462, 18.1432, 13.357, 4.82466, 5.44672, 1.35844,
    (11,7): -1.77472

The ``(x,y)`` pairs are part of the ``h5dump`` output, and indicate the
dataset coordinates (indices). For each of the twelve resampled runs
(row-wise), there are eight statistics (column-wise), the ones mentioned
above. The values are the (relative, since ``pr`` was used) differences
between the control and future period.

The other HDF5 file is called ``indices.h5``, and it specifies the *indices*
into the original runs and segments that produce the resampled runs. This
file looks as follows::

    /                    Group
    /2050                Group
    /2050/G              Group
    /2050/G/H            Group
    /2050/G/H/control    Dataset {8, 6}
    /2050/G/H/future     Dataset {8, 6}
    /2050/G/L            Group
    /2050/G/L/control    Dataset {8, 6}
    /2050/G/L/future     Dataset {8, 6}
    ...

It has a control and future dataset for each scenario. Each dataset is
two-dimensional: the first axis is for the number of output runs, while the
second is for the number of segments (30 years / 5 years = 6 segments, in our
case). The values in the dataset are all non-negative integers, varying
between 0 and the number of *input* runs minus 1 (minus 1, because Python
indexes from 0 to n-1). Thus, for a set of indices as follows::

    (0,0): 0, 4, 1, 6, 1, 3,
    (1,0): 2, 2, 2, 4, 3, 0,
    (2,0): 6, 1, 7, 0, 2, 4,
    (3,0): 7, 6, 7, 3, 0, 3,
    (4,0): 2, 4, 5, 7, 6, 6,
    (5,0): 6, 5, 6, 5, 7, 6,
    (6,0): 7, 1, 2, 6, 7, 4,
    (7,0): 0, 6, 3, 4, 3, 3

resampled run number 0 would use the first five-year segment of input run
number 0, its second five-year segment from input run number 4, its third
five-year segment from input run number 1, and so on.

Note that the five-year segments match one-to-one: the third segment in the
input is also always a third segment in any output. This also shows which
runs are used double (or triple) in a particular five-year segment, since
this results in the same index repeated within a column. Input runs 6 and 7
seem to be popular, with several repeats in several columns. Thanks to the
penalty system, frequent re-use is limited: only one triple use occurs (input
run 3 in the last column, which incurred a small penalty), and no higher
repeats occur at all. Note that repeats across a row do not matter: even if a
row contains all the same indices, that just means that the original input
run was a perfect match for all the conditions we threw at it.
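To make the bookkeeping concrete, the following sketch applies such an index
table to a set of input runs. The data here are random stand-ins; in reality,
the segments would be the five-year periods of the model-of-interest runs (or
of the corresponding regional model runs):

.. code-block:: python

    # Reconstruct resampled runs from the index table: segment j of
    # resampled run i is taken from input run indices[i, j].
    import h5py
    import numpy as np

    with h5py.File("indices.h5", "r") as fh:
        indices = fh["2050/G/H/future"][:]    # shape (8, 6)

    # Hypothetical stand-in data: 16 input runs, each with 6 five-year
    # segments of 60 monthly values.
    segments = np.random.default_rng(0).normal(size=(16, 6, 60))

    resampled = np.stack([
        np.concatenate([segments[run, seg] for seg, run in enumerate(row)])
        for row in indices
    ])
    print(resampled.shape)                    # (8, 360): eight 30-year runs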
This indices file can be particularly helpful for the downscaling: it
indicates which parts of our input runs have been used to create our
resampled runs, and thus also which parts of our *regional* model runs we
have to use (and resample accordingly); provided our regional model input
runs match one-to-one with our global model-of-interest runs.

Finally, there will also be numerous CSV output files, named something like
``resampled_<epoch>_<scenario>_<precip-scenario>_<var>_<season>.csv``. These
are similar to the ``pr_change_W2050_jja_nlpoint_ecearth.csv`` files
mentioned further above: they contain, for each resampled run, the necessary
statistics, and are in fact identical to the ``diff`` datasets in the HDF5
file. For example, ``resampled_2050_G_H_pr_mam.csv`` looks as follows::

    mean,5,10,25,50,75,90,95
    -4.869,2.980,-8.869,-12.487,-5.377,-5.380,-6.442,-1.308
    3.954,10.531,7.519,7.812,8.348,-2.343,6.445,-4.056
    7.429,13.041,7.015,15.216,-1.909,8.937,14.614,14.794
    4.561,3.419,10.550,15.535,8.153,4.335,-5.110,-5.996
    2.918,-26.567,-3.010,-5.915,5.424,2.544,8.934,3.507
    -1.604,-8.126,2.797,-9.458,2.848,-2.631,-0.277,4.414
    -0.216,-32.845,-13.759,-2.393,5.254,4.293,4.767,5.079
    6.554,42.228,6.386,4.632,1.549,6.471,2.733,10.929
    6.708,26.377,5.348,17.180,18.653,0.712,0.348,-5.162
    5.973,30.649,50.851,18.089,0.961,4.078,2.551,2.516
    1.595,4.252,2.492,-0.913,-6.771,3.009,6.460,2.064
    5.877,24.646,18.143,13.357,4.824,5.446,1.358,-1.774

(Again truncated to three decimal digits.) Compare this to the ``h5dump``
output. These CSV files can be used when plotting the distribution of a
variable change, exactly the same as before.


Plotting the resampled runs
~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can overplot the resampled runs on top of the CMIP data as follows, using
the same plotting command from before:

.. code-block:: bash

    python -m kcs.change_perc.plot pr_change_2050_jja_nlpoint.csv pr_change_2050_jja_nlpoint.png \
        --epoch 2050 --text 'precip, JJA' --ytitle 'Change (%)' --ylimits -60 45 \
        --scenario-run G_H resampled_2050_G_H_pr_jja.csv \
        --scenario-run W_H resampled_2050_W_H_pr_jja.csv \
        --scenario-run G_L resampled_2050_G_L_pr_jja.csv \
        --scenario-run W_L resampled_2050_W_L_pr_jja.csv

Note that the input file names have changed: we now have precipitation
scenarios, but we have lost the information about the area used. So if
different areas are to be considered, a simple solution would be to use
different subdirectories.

The resulting plot (``pr_change_2050_jja_nlpoint.png``) will show all the
individual resampled runs. That can be used as a measure to see how well the
resampled runs cover the CMIP data, and how close they are to their average.
If you don't want to plot the individual runs, use the
``--only-scenario-mean`` option:

.. code-block:: bash

    python -m kcs.change_perc.plot pr_change_2050_jja_nlpoint.csv pr_change_2050_jja_nlpoint.png \
        --epoch 2050 --text 'precip, JJA' --ytitle 'Change (%)' --ylimits -60 45 \
        --scenario-run G_H resampled_2050_G_H_pr_jja.csv \
        --scenario-run W_H resampled_2050_W_H_pr_jja.csv \
        --scenario-run G_L resampled_2050_G_L_pr_jja.csv \
        --scenario-run W_L resampled_2050_W_L_pr_jja.csv \
        --only-scenario-mean


Running it all together
~~~~~~~~~~~~~~~~~~~~~~~

To run this all together, resampling all scenarios and creating all the
plots, and doing this in parallel, there is another helper bash script:
``extras/resample-and-plot-all.bash``, which does what it says.
It does not use any new Python module: it just wraps around the previous commands in this section, using the ``--scenario`` option of ``kcs.resample`` to be able to run things in parallel.