Commit 30a3f2bd authored by Antoine Berchet's avatar Antoine Berchet

Flux template working with random values

parent 452b683d
@@ -2,9 +2,6 @@
:language: bash
Have a yaml file ready with a simulation that works with known plugins.
For the :doc:`obsoperator</documentation/plugins/obsoperators/index>`,
choose the optional argument :bash:`onlyinit` so that only the inputs are
computed, not the whole simulation XXXX CHECK THAT THIS OPTION ACTUALLY DOES THIS XXXX.
.. code-block:: yaml
@@ -5,33 +5,42 @@ How to add a new type of flux data to be processed by the CIF into a model's input
.. role:: bash(code)
:language: bash
0. .. include:: ../newBCdata/knownplugin.rst
1. In directory :bash:`plugins/fluxes`, copy the directory containing the template
   for a flux plugin, :bash:`flux_plugin_template`, into the directory for your new plugin.
.. include:: ../newBCdata/register.rst
2. Modify the yaml file to use the new plugin: the minimum input arguments are
   :bash:`dir`, :bash:`file`, :bash:`varname` and :bash:`unit_conversion`.
   The default space and time interpolations will be applied
   (see XXXX doc on the first forward simulation with an example yaml, once updated XXXX).
.. code-block:: yaml
components:
fluxes:
plugin:
name: fluxes
version: template
type: fluxes
dir: dir_with_original_files/
file: file_with_new_fluxes_to_use_as_inputs
varname: NAMEORIG
unit_conversion:
scale: 1.
3. .. include:: ../newBCdata/devplugin.rst
XXXXXXX what about the input arguments? They call for a dedicated section!? XXXXXXXXXX
4. Document the new plugin:
@@ -26,16 +26,27 @@ You then need to create an empty file called :bash:`__init__.py`, so python inte
touch __init__.py
.. note::
Although :bash:`__init__.py` is not strictly needed by the newest versions of Python, pyCIF fetches information
directly from this file to initialize the plugin and document it automatically.
Therefore, please include it anyway.
Registering your new :bash:`plugin` to pyCIF
--------------------------------------------
pyCIF automatically attaches the plugin functions defined in the corresponding Python module in :bash:`pycif/plugins/`,
based on the yaml configuration file.
:bash:`plugins` in pyCIF are identified with:
- a name
- optional: a version (default: std)
When the new plugin is created, it must be registered, so it can be called by other routines of pyCIF.
To register a new plugin, one must define the name and (if relevant) version in the file :bash:`__init__.py`:
.. code-block:: python
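   # A sketch of the registration variables; the flux template shown further
   # below uses exactly these names (the version defaults to "std" when omitted):
   _name = "flux"
   _version = "template"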
@@ -50,6 +61,7 @@ You can check that your :bash:`plugin` is correctly registered by using the foll
from pycif.utils.classes.baseclass import Plugin
Plugin.print_registered()
Adding requirements to your :bash:`plugin`
------------------------------------------
@@ -6,7 +6,7 @@ It generates random values and can be directly used with a working test case.
.. warning::
Please gradually document your plugin properly when starting from the template.
It includes :bash:`input_arguments` (see :doc:`here</contrib_doc>` for details),
as well as all information about the original data the plugin is supposed to
accommodate.
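For instance, a minimal sketch of one documented argument, following the
:bash:`input_arguments` format used by the template (this entry mirrors the
template's own :bash:`average_value`):

.. code-block:: python

   input_arguments = {
       "average_value": {
           "doc": "Average value for the generation of random values",
           "default": 1,
           "accepted": float
       },
   }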
@@ -29,7 +29,7 @@ To integrate your own flux plugin, please follow the steps:
1) copy the :bash:`flux_plugin_template` directory into one with a name of your
preference
2) Start writing the documentation of your plugin by replacing the present
:bash:`docstring` in the file :bash:`__init__.py`. Use rst syntax since this docstring
will be automatically parsed for publication in the documentation
3) Change the variables :bash:`_name`, :bash:`_version` (defaults to :bash:`std`
if not specified) and :bash:`_fullname` (optional, is used as a title when
@@ -58,6 +58,9 @@ from .get_domain import get_domain
from .read import read
from .write import write
from logging import info
_name = "flux"
_version = "template"
_fullname = "Template plugin for fluxes"
@@ -69,4 +72,64 @@ input_arguments = {
"default": "let's say it's not mandatory",
"accepted": str
},
"file_freq": {
"doc": "The time frequency at which data files are available. ",
"default": "1D",
"accepted": str
},
"lon_min": {
"doc": "Minimum longitude ",
"default": -180,
"accepted": float
},
"lon_max": {
"doc": "Maximum longitude ",
"default": 180,
"accepted": float
},
"lat_min": {
"doc": "Minimum latitude ",
"default": -90,
"accepted": float
},
"lat_max": {
"doc": "Maximum latgitude ",
"default": 90,
"accepted": float
},
"nlon": {
"doc": "Number of grid cells in the zonal direction",
"default": 90,
"accepted": int
},
"nlat": {
"doc": "Number of grid cells in the meridional direction",
"default": 45,
"accepted": int
},
"nlev": {
"doc": "Number of levels in the data",
"default": 1,
"accepted": int
},
"average_value": {
"doc": "Average value for the generation of random values",
"default": 1,
"accepted": float
},
}
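# Note: once the yaml is parsed, each argument above becomes an attribute of
# the corresponding Plugin object (e.g., tracer.lon_min, tracer.average_value,
# as used in get_domain.py and read.py); values given in the yaml override the
# defaults declared here.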
def ini_data(plugin, **kwargs):
"""Perform any further initialization step beyond the definition of default values
from input_arguments and the definition of the domain through the function
:bash:`get_domain`.
It includes, e.g., initializing new directories or updating variable names
depending on other variables, dependencies, etc."""
info("Initializing the flux template")
info("End of intialization for the flux template")
@@ -6,7 +6,7 @@ import pandas as pd
import numpy as np
from .....utils import path
from logging import debug
def fetch(ref_dir, ref_file, input_dates, target_dir,
@@ -17,52 +17,77 @@
Link reference files to the working directory to avoid interactions with the outer
world.
Should include input data dates encompassing the simulation interval, which means
that, e.g., if input data are at the monthly scale and the simulation interval
runs from 2010-01-15 to 2010-03-15, the output should at least include the input
data dates for 2010-01, 2010-02 and 2010-03.
Args:
ref_dir (str): the path to the input files
ref_file (str): format of the input files
input_dates (list): simulation interval (start and end dates)
target_dir (str): where to copy
tracer: the tracer Plugin, corresponding to the paragraph
:bash:`datavect/components/fluxes/parameters/my_species` in the
configuration yaml; can be needed to fetch extra information
given by the user
component: the component Plugin, same as tracer; corresponds to the paragraph
:bash:`datavect/components/fluxes` in the configuration yaml
Return:
list_files: for each date that begins a period, an array containing
the names of the files that are available for the dates within this period
list_dates: for each date that begins a period, an array containing
the names of the dates matching the files listed in list_files
"""
debug("Fetching files with the following information: \n"
"- datei/datef = {}\n"
"- dir = {}\n"
"- file = {}\n"
"- file_freq = {}\n\n"
"These three main arguments can either be defined in the relevant flux/my_spec "
"paragrah in the yaml, or, if not available, they are fetched from the "
"corresponding components/flux paragraph.\n"
"If one of the three needs to have a default value, it can be "
"integrated in the input_arguments dictionary in __init__.py for {}".format(
input_dates, ref_dir, ref_file, tracer.file_freq, __package__
))
list_period_dates = \
pd.date_range(input_dates[0], input_dates[1],
freq=tracer.file_freq)
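    # NB: pd.date_range only returns dates aligned on the file_freq boundaries;
    # if input_dates[0] does not fall on such a boundary (e.g., 2010-01-15 with
    # a monthly frequency), the first period returned starts at the next
    # boundary.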
list_dates = {}
list_files = {}
for dd in list_period_dates:
file = dd.strftime("{}/{}".format(ref_dir, ref_file))
file_hours = pd.date_range(
dd, dd + pd.to_timedelta(tracer.file_freq), freq="1H")
list_dates[dd] = [[hh, hh + datetime.timedelta(hours=1)]
for hh in file_hours]
list_files[dd] = (len(file_hours) * [file])
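        # NB: pd.date_range includes both endpoints, so a file_freq of "1D"
        # yields 25 hourly stamps here; drop the last one (file_hours[:-1])
        # if the final interval should not spill into the next period.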
        if os.path.isfile(file):
            target_file = "{}/{}".format(target_dir, os.path.basename(file))
            path.link(file, target_file)
debug(
"Fetched files and dates as follows:\n"
"Dates: {\n" +
"\n".join(["\n".join([" {}:".format(ddi)]
+ [" {}".format(dd)
for dd in list_dates[ddi]])
for ddi in list_dates])
+ "\n}\n\n" +
"Files: {\n" +
"\n".join(["\n".join([" {}:".format(ddi)]
+ [" {}".format(dd)
for dd in list_files[ddi]])
for ddi in list_files])
+ "\n}"
)
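    # Illustrative shape of the returned objects:
    #   list_files: {period start date -> list of file paths, one per hourly slot}
    #   list_dates: {period start date -> list of [start, end] hourly intervals}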
return list_files, list_dates
import numpy as np
from logging import debug
from .....utils.classes.setup import Setup
from .....utils.classes.domains import Domain
def get_domain(ref_dir, ref_file, input_dates, target_dir, tracer=None):
"""Read information to define the data horizontal and, if relevant, vertical domain
Args:
ref_dir (str): the path to the input files
ref_file (str): format of the input files
input_dates (list): simulation interval (start and end dates)
target_dir (str): where to copy
tracer: the tracer Plugin, corresponding to the paragraph
:bash:`datavect/components/fluxes/parameters/my_species` in the
configuration yaml; can be needed to fetch extra information
given by the user
Return:
domain (Domain): a domain class object, with the definition of the center grid
cells coordinates, as well as corners
"""
# Some explanations
debug(
'Here, read the horizontal grid, e.g., longitudes and latitudes.\n'
'Several possibilities: \n'
' - read a reference file\n'
' - read a file among the available data files\n'
' - read a file specified in the yaml, \n'
' by using the corresponding variable name; for instance, tracer.my_file\n'
'From the chosen file, obtain the coordinates of the centers and/or the corners '
'of the grid cells. If corners or centers are not available, deduce them from '
'the available information.\n'
'\n'
        'WARNING: the grid must not be overlapping: '
        'e.g., for a global grid, the last grid cell must not be the same as the first.'
        '\n'
        'Order the centers and corners latitudes and longitudes in increasing order.\n'
)
# For the purpose of demonstration, the domain dimensions are specified by default
# in input_arguments.
# Individual arguments can be modified manually in the yaml
lon_min = tracer.lon_min
lon_max = tracer.lon_max
lat_min = tracer.lat_min
lat_max = tracer.lat_max
nlon = tracer.nlon
nlat = tracer.nlat
debug("if lon and lat are flat vectors in the definition, "
"convert into a grid with:\n"
"zlon, zlat = np.meshgrid(lon, lat)\n")
# Some explanations
debug(
'Here, read the vertical information, from the same file as the horizontal '
'information or from another.\n'
'Get the number of vertical levels.\n'
        'Get or deduce the sigma_a/sigma_b coefficients from bottom to top if the '
        'vertical extension is in pressure.\n'
        'It is possible to specify the vertical extension in m a.g.l. instead, by '
        'defining the variable heights.'
)
nlev = tracer.nlev
punit = "Pa"
sigma_a = np.linspace(0, 1, nlev)
sigma_b = np.linspace(1, 0, nlev)
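    # Dummy vertical coordinate for the template: with the common hybrid-sigma
    # convention p = sigma_a + sigma_b * p_surface (sigma_a expressed in Pa),
    # these coefficients span the column from the surface (sigma_b = 1) to the
    # top of the atmosphere (sigma_b = 0).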
# Initializes domain
setup = Setup.from_dict(
{
@@ -56,7 +88,7 @@ def get_domain(ref_dir, ref_file, input_dates, target_dir, tracer=None):
"ymax": lat_max, # maximum latitude for centers
"nlon": nlon, # number of longitudinal cells
"nlat": nlat, # number of latitudinal cells
"nlev": nlevs, # number of vertical levels
"nlev": nlev, # number of vertical levels
"sigma_a": sigma_a,
"sigma_b": sigma_b,
"pressure_unit": "Pa" # adapted to sigmas
@@ -65,11 +97,5 @@ def get_domain(ref_dir, ref_file, input_dates, target_dir, tracer=None):
)
Setup.load_setup(setup, level=1)
return setup.domain
import datetime
import os
import numpy as np
import xarray as xr
from netCDF4 import Dataset
from .....utils.netcdf import readnc
from logging import debug
def read(
self,
name,
varnames,
dates,
files,
interpol_flx=False,
tracer=None,
model=None,
ddi=None,
**kwargs
):
"""Get fluxes from raw files and load them into a pyCIF
variables
variables.
Args:
name (str): name of the component
varnames (list[str]): original names of variables to read; use `name`
if `varnames` is empty
dates: list of the date intervals to extract
files: list of the files matching dates
Return:
xr.DataArray: the actual data with dimension:
time, levels, latitudes, longitudes
"""
# Get domain dimensions for random generation
domain = tracer.domain
nlon = domain.nlon
nlat = domain.nlat
nlev = domain.nlev
# Loop over dates/files and import data
data = []
out_dates = []
for dd, ff in zip(dates, files):
debug(
"Reading the file {} for the date interval {}".format(
ff, dd
)
)
# Generate random values instead of reading
data.append(
np.random.normal(
tracer.average_value, tracer.average_value / 2,
(nlev, nlat, nlon)))
out_dates.append(dd[0])
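        # In a real plugin, replace the random generation above with an actual
        # read of the input file, e.g. (sketch, assuming netCDF inputs and a
        # single variable name in varnames):
        # data.append(xr.open_dataset(ff)[varnames].values)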
    xmod = xr.DataArray(
np.array(data),
coords={"time": out_dates},
dims=("time", "lev", "lat", "lon"),
)
return xmod
import datetime
import glob
import os
import calendar
import numpy as np
def find_valid_file(ref_dir, file_format, dd):
# Get all files and dates matching the file and format
print('Here, basic listing of all available files')
list_files_orig = os.listdir(ref_dir)
list_files_avail = []
for f in list_files_orig:
try:
check = datetime.datetime.strptime(f, file_format)
list_files_avail.append(f)
        except ValueError:
continue
    print('All available files:', list_files_avail)
    print('Modify to suit your case, e.g., list fewer files, or also list files for another year or month.')
print('See examples in other plugins or in the documentation/tutorials.')
list_files = []
list_dates = []
print('Here, list provided dates and matching files')
    print('The list of dates is probably sparse, e.g., every 3 hours or every month.')
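    # A minimal sketch, assuming each available file encodes its date
    # following file_format (adapt to your naming convention):
    for f in list_files_avail:
        list_files.append(os.path.join(ref_dir, f))
        list_dates.append(datetime.datetime.strptime(f, file_format))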
list_files = np.array(list_files)
list_dates = np.array(list_dates)
# Sorting along dates
isort = np.argsort(list_dates)
list_dates = list_dates[isort]
list_files = list_files[isort]
    if len(list_files) == 0:
raise Exception("Did not find any valid flux files in {} "
"with format {}. Please check your yml file"
.format(ref_dir, file_format))