Dataset characteristics are a way to add an additional dimension to variables (in addition to the time dimension used for time series). For example, the different wavelengths measured for aerosol light scattering can be seen as an additional dimension for those measurements. Other examples are the particle diameter for particle_number_size_distribution, or the different inlet heights for tower measurements.
Nasa Ames 1001 (the base format for EBAS Nasa Ames) does not directly support additional dimensions. When writing data to Nasa Ames, the additional dimensions are written as single variables (as in the data model), each with a special metadata element encoding the respective dimension extent.
Other formats like NetCDF or the OPeNDAP data model support multidimensional variables. So when such data are written to NetCDF for example, the variables with different values for such a dimension are all combined into a single variable with an additional dimension.
Multiple dimensions can be added, for example for cloud condensation nucleus measurements when measuring CCN concentration as a function of supersaturation and particle diameter (see https://ebas-submit.nilu.no/Submit-Data/Data-Reporting/Templates/Category/Aerosol/Cloud-Condensation-Nucleus-Counter/level2/CCN-concentration-as-function-of-supersaturation-and-particle-diameter).
When creating files with ebas-io, the characteristics must of course be created. In the same way, when reading files, the characteristics must be accessed and used to interpret the data.
Here we define a function for setting up the file object for writing: it defines all needed global metadata and sets up the time axis. Don't pay much attention to this; it is necessary, but out of scope for this exercise.
import datetime
from ebas.io.ebasmetadata import DatasetCharacteristicList
from nilutility.datetime_helper import DatetimeInterval
from nilutility.datatypes import DataObject
def setup_global_metadata(outfile):
    outfile.metadata.revdate = datetime.datetime.utcnow()
    outfile.metadata.datalevel = '2'
    outfile.metadata.station_code = 'NO0002R'
    outfile.metadata.station_name = 'Birkenes II'
    outfile.metadata.matrix = 'pm10'
    outfile.metadata.lab_code = 'NO01L'
    outfile.metadata.instr_type = 'filter_absorption_photometer'
    outfile.metadata.instr_name = 'my_instrument'
    outfile.metadata.method = 'NO01L_my_method'
    outfile.metadata.reference_date = datetime.datetime(2020, 1, 1)
    outfile.metadata.resolution = '1h'
    outfile.metadata.projects = ['ACTRIS']
    outfile.metadata.org = DataObject(
        OR_CODE='NO01L',
        OR_NAME='Norwegian Institute for Air Research',
        OR_ACRONYM='NILU', OR_UNIT='Atmosphere and Climate Department',
        OR_ADDR_LINE1='Instituttveien 18', OR_ADDR_LINE2=None,
        OR_ADDR_ZIP='2007', OR_ADDR_CITY='Kjeller', OR_ADDR_COUNTRY='Norway'
    )
    outfile.metadata.originator.append(DataObject(
        PS_LAST_NAME=u'Someone', PS_FIRST_NAME='Else',
        PS_EMAIL='Someone@somewhere.no',
        PS_ORG_NAME='Some nice Institute',
        PS_ORG_ACR='WOW', PS_ORG_UNIT='Super interesting division',
        PS_ADDR_LINE1='Street 18', PS_ADDR_LINE2=None,
        PS_ADDR_ZIP='X-9999', PS_ADDR_CITY='Paradise',
        PS_ADDR_COUNTRY='Norway',
        PS_ORCID=None,
    ))
    outfile.metadata.submitter.append(DataObject(
        PS_LAST_NAME=u'Someone', PS_FIRST_NAME='Else',
        PS_EMAIL='Someone@somewhere.no',
        PS_ORG_NAME='Some nice Institute',
        PS_ORG_ACR='WOW', PS_ORG_UNIT='Super interesting division',
        PS_ADDR_LINE1='Street 18', PS_ADDR_LINE2=None,
        PS_ADDR_ZIP='X-9999', PS_ADDR_CITY='Paradise',
        PS_ADDR_COUNTRY='Norway',
        PS_ORCID=None,
    ))
    outfile.sample_times = [
        DatetimeInterval(datetime.datetime(2020, 1, 1, 0, 0), datetime.datetime(2020, 1, 1, 1, 0)),
        DatetimeInterval(datetime.datetime(2020, 1, 1, 1, 0), datetime.datetime(2020, 1, 1, 2, 0))
    ]
Here comes the interesting part: we set up some variables with characteristics (an additional dimension). Specifically, we add aerosol_absorption_coefficient at three different wavelengths.
Pay attention to the parts commented with ### ADD CHARACTERISTICS
def setup_variables(outfile):
    # variable 1: aerosol_absorption_coefficient, 470 nm
    values = [0.5566, None]  # missing value is None!
    flags = [[], [999]]
    metadata = DataObject()
    metadata.comp_name = 'aerosol_absorption_coefficient'
    metadata.unit = '1/Mm'
    metadata.title = 'abs470'
    metadata.uncertainty = (6, '%')
    # add the variable
    outfile.variables.append(DataObject(values_=values, flags=flags, flagcol=True,
                                        metadata=metadata))
    ### ADD CHARACTERISTICS
    outfile.add_var_characteristics(-1, 'Wavelength', 470)
    # variable 2: aerosol_absorption_coefficient, 520 nm
    values = [0.3196, None]  # missing value is None!
    flags = [[], [999]]
    metadata = DataObject()
    metadata.comp_name = 'aerosol_absorption_coefficient'
    metadata.unit = '1/Mm'
    metadata.title = 'abs520'
    metadata.uncertainty = (6, '%')
    # add the variable
    outfile.variables.append(DataObject(values_=values, flags=flags, flagcol=True,
                                        metadata=metadata))
    ### ADD CHARACTERISTICS
    outfile.add_var_characteristics(-1, 'Wavelength', 520)
    # variable 3: aerosol_absorption_coefficient, 660 nm
    values = [0.3956, None]  # missing value is None!
    flags = [[], [999]]
    metadata = DataObject()
    metadata.comp_name = 'aerosol_absorption_coefficient'
    metadata.unit = '1/Mm'
    metadata.title = 'abs660'
    metadata.uncertainty = (6, '%')
    # add the variable
    outfile.variables.append(DataObject(values_=values, flags=flags, flagcol=True,
                                        metadata=metadata))
    ### ADD CHARACTERISTICS
    outfile.add_var_characteristics(-1, 'Wavelength', 660)
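Since the three variables differ only in their title, example values and wavelength, the same setup can also be written more compactly as a loop. This is just a sketch using the same calls as above (the function name setup_variables_compact is our own, not part of ebas-io):
def setup_variables_compact(outfile):
    # same three variables as above, set up in a loop over (title, wavelength, example value)
    for title, wavelength, value in [('abs470', 470, 0.5566),
                                     ('abs520', 520, 0.3196),
                                     ('abs660', 660, 0.3956)]:
        metadata = DataObject()
        metadata.comp_name = 'aerosol_absorption_coefficient'
        metadata.unit = '1/Mm'
        metadata.title = title
        metadata.uncertainty = (6, '%')
        outfile.variables.append(DataObject(values_=[value, None], flags=[[], [999]],
                                            flagcol=True, metadata=metadata))
        # the characteristic is added to the last appended variable (index -1)
        outfile.add_var_characteristics(-1, 'Wavelength', wavelength)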
Create an output file object, add the global metadata, the time dimension and the variables (aerosol_absorption_coefficient at three wavelengths):
from ebas.io.file.nasa_ames import EbasNasaAmes
nas = EbasNasaAmes()
setup_global_metadata(nas)
setup_variables(nas)
When we write the Nasa Ames file, we see the three variables, each with the Wavelength
specified in the VNAME lines (lines 14-16):
nas.write()
When we do the same with an EbasNetcdf object, we see that the variable
float64 aerosol_absorption_coefficient(u'Wavelength', u'time')
is in fact a 2D variable (with dimensions Wavelength and time).
Remark: The NetCDF example below will only run successfully if netCDF4 is installed on your system. Although netCDF4 is necessary for creating NetCDF output, it is not a dependency when installing the ebas-io package (for most users this would be an unnecessary and quite heavy dependency; users who usually work with NetCDF will have the module installed anyway).
Remark: Writing an EbasNetcdf object to stdout just dumps the header for convenience. When writing to an actual output file (ncf.write(createfiles=True)), the NetCDF file would be written in full.
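If you want to check up front whether netCDF4 is available, a simple import test (our own suggestion, not part of ebas-io) avoids a failing cell:
# optional: check whether the netCDF4 module is installed before running the NetCDF example
try:
    import netCDF4
    print("netCDF4 version:", netCDF4.__version__)
except ImportError:
    print("netCDF4 is not installed - skip the NetCDF example or install netCDF4 first")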
from ebas.io.file.netcdf import EbasNetcdf
ncf = EbasNetcdf()
setup_global_metadata(ncf)
setup_variables(ncf)
ncf.write()
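To actually create the NetCDF file on disk instead of just dumping the header, use the createfiles argument mentioned in the remark above:
# write the full NetCDF file to disk (instead of dumping the header to stdout)
ncf.write(createfiles=True)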
We read a test file containing cloud condensation nucleus concentration as a function of supersaturation and particle diameter, which means two additional dimensions (particle diameter and supersaturation).
import logging
import pprint
logging.basicConfig(level=logging.ERROR)
from ebas.io.file.nasa_ames import EbasNasaAmes
nas = EbasNasaAmes()
nas.read('test_ccn.nas', ignore_valuecheck=True)
Like any other metadata, we can also find out the characteristics of each variable. The characteristics are implemented as a list of characteristics in each variable's metadata. The single characteristics are dictionaries containing the information about a single characteristic.
Each characteristic contains the following elements:
CT_TYPE: the type of the characteristic, like Wavelength or Inlet tower height
CT_DATATYPE: the data type of the characteristic: DBL for float, INT for integer and CHR for string
DC_VAL_DBL: the actual value for the characteristic
CO_COMP_NAME and FT_TYPE: stored internally in order to control the validity of the characteristic for a specific component and instrument type
for i, var in enumerate(nas.variables):
    # iterate through the variables of the file and
    # find out which matrix, component name and statistics code it is:
    matrix = nas.get_meta_for_var(i, 'matrix')
    comp_name = nas.get_meta_for_var(i, 'comp_name')
    statistics = nas.get_meta_for_var(i, 'statistics')
    print(", ".join([matrix, comp_name, statistics]))
    # And this is what the characteristics look like:
    pprint.pprint(var.metadata.characteristics)
As we see below, the list of characteristics is of type DatasetCharacteristicList and the single characteristics are of type DatasetCharacteristic. Thus we can expect some additional functionality.
print(type(nas.variables[0].metadata.characteristics))
print(type(nas.variables[0].metadata.characteristics[0]))
The DatasetCharacteristicList provides some methods for handling all characteristics for one variable.
Probably the most useful method when reading characteristics from a file is the as_dict() method, which gives us one single dictionary from all characteristics of a variable.
print(nas.variables[1].metadata.characteristics.as_dict())
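Assuming the dictionary is keyed by the characteristic type (an assumption for this sketch), single values can be picked out directly:
# sketch: pick single values from the combined dictionary
# (assumes the keys are the characteristic types, e.g. 'SS' and 'D' as used further below)
chars = nas.variables[1].metadata.characteristics.as_dict()
if 'SS' in chars:
    print('supersaturation:', chars['SS'])
if 'D' in chars:
    print('particle diameter:', chars['D'])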
The sorted() method yields all characteristics in the defined sort order. It can be used for serializing the characteristics, e.g. in a string like this:
print(", ".join(["{}={}".format(char.CT_TYPE, char.value_string())
for char in nas.variables[1].metadata.characteristics.sorted()]))
Get a single characteristic element by its characteristic type.
pprint.pprint(nas.variables[1].metadata.characteristics.dc_by_ct_type('SS'))
# There is no Wavelength for CCNC...
print("\nIn case of not set characteristic type: {}".format(
nas.variables[1].metadata.characteristics.dc_by_ct_type('Wavelength')))
# let's say we want to get the tuple for the SS characteristic of one of the variables:
cha = nas.variables[3].metadata.characteristics.dc_by_ct_type('SS')
print('SS tuple:', cha.tuple())
# or we want to iterate through all characteristics of this variable and get the tuples:
print("all tuples:")
for cha in nas.variables[3].metadata.characteristics:
    print(" ", cha.tuple())
For accessing just the value or just the unit of a characteristic, use the value and unit properties (they are property methods, but look like attributes).
cha = nas.variables[3].metadata.characteristics.dc_by_ct_type('SS')
print(cha.value)
print(type(cha.value))
print(cha.unit)
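Combining dc_by_ct_type() and the value property, a small convenience helper could look like this (our own sketch, assuming dc_by_ct_type() returns None for a characteristic type that is not set, as in the example further above):
def characteristic_value(var, ct_type, default=None):
    # return the value of a characteristic, or a default if the type is not set
    cha = var.metadata.characteristics.dc_by_ct_type(ct_type)
    if cha is None:
        return default
    return cha.value

print(characteristic_value(nas.variables[3], 'SS'))
print(characteristic_value(nas.variables[3], 'Wavelength', default='no wavelength'))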
Generate the string representation of value and unit for a characteristic.
cha = nas.variables[3].metadata.characteristics.dc_by_ct_type('D')
print(cha.value_string())