aspecd.metadata module

Metadata: Information on numeric data stored in a structured way.

Metadata are one key concept of the ASpecD framework, and they come in different flavours. Perhaps the easiest to grasp are the metadata accompanying measurements, which are often stored separately from the data in metadata files. Other types of metadata are those of processing steps or representations. This module is only concerned with the first kind of metadata: information accompanying data and usually recorded at the same time as the numeric data.

Each aspecd.dataset.Dataset contains an aspecd.dataset.Dataset.metadata attribute that is in turn an object of class aspecd.metadata.DatasetMetadata. This latter object is composed of different metadata objects, each inheriting from aspecd.metadata.Metadata. When importing a dataset, the importer class needs to make sure that as many metadata as possible are read and imported into the dataset as well.

Generally speaking, metadata can be thought of as key–value stores that might be hierarchically structured and thus cascaded. Nevertheless, classes have some advantages over using simple dictionaries, as there are certain operations that are common to some or all types of metadata.

Metadata classes

The classes implemented in this module can be grouped into general metadata classes, concrete metadata classes, and metadata classes for datasets. Each group is briefly described below.

General metadata classes

The most basic class is aspecd.metadata.PhysicalQuantity, storing all relevant information about a physical quantity in an easily accessible way and making it possible to test whether two quantities are commensurable and, eventually, to convert between units.

Next is aspecd.metadata.Metadata as a generic class for all metadata containers. All other classes storing metadata, particularly those storing metadata accompanying measurements and therefore ending up in the metadata of an aspecd.dataset.Dataset, should inherit from this class.

Concrete metadata classes

Currently, the ASpecD framework contains three classes for actual metadata of experimental datasets and one class for calculated datasets:

  • aspecd.metadata.Measurement for storing general information about a given measurement,

  • aspecd.metadata.Sample for all information regarding the sample investigated,

  • aspecd.metadata.TemperatureControl for information about the temperature control (including whether temperature has been actively controlled at all during the measurement), and

  • aspecd.metadata.Calculation for storing details about the calculation underlying the (numeric) data of calculated datasets.

Metadata classes for datasets

The attribute metadata in the aspecd.dataset.ExperimentalDataset is of type aspecd.metadata.ExperimentalDatasetMetadata and contains the three metadata classes for experimental datasets named above. Derived packages should extend this class accordingly.

Similarly, the attribute metadata in the aspecd.dataset.CalculatedDataset is of type aspecd.metadata.CalculatedDatasetMetadata and contains the respective metadata class named above. Derived packages should extend this class accordingly wherever necessary.

Converting metadata from and to dictionaries

All classes inheriting from aspecd.metadata.Metadata provide a method from_dict() that sets the attributes of the object from a dictionary. This makes it easy to use metadata read from a file into a dict.

Similarly, all classes inheriting from aspecd.metadata.Metadata, as well as aspecd.metadata.PhysicalQuantity, provide a method to_dict() that returns a dictionary of all public attributes of the respective object. This makes it easy to write metadata to a file.
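The round trip described above can be sketched as follows. This is a minimal, self-contained illustration rather than ASpecD's implementation: the class SampleSketch is hypothetical and only mimics two documented behaviours, namely that from_dict() sets only keys matching valid attributes and that to_dict() returns the public attributes in the order of their definition.

```python
from collections import OrderedDict


class SampleSketch:
    """Hypothetical stand-in for a class derived from aspecd.metadata.Metadata."""

    def __init__(self):
        self.name = ''
        self.id = ''
        self._internal = 'not exported'  # non-public, hence not in to_dict()

    def from_dict(self, dict_=None):
        # Only keys matching existing public attributes are set;
        # unknown keys (here: "colour") are silently ignored.
        for key, value in (dict_ or {}).items():
            if not key.startswith('_') and hasattr(self, key):
                setattr(self, key, value)

    def to_dict(self):
        # Public attributes only, in the order they were defined
        return OrderedDict((key, value) for key, value in vars(self).items()
                           if not key.startswith('_'))


sample = SampleSketch()
sample.from_dict({'name': 'Sample 42', 'id': 42, 'colour': 'red'})
print(sample.to_dict())
```

The resulting dictionary contains only "name" and "id"; it could then be written to a file, e.g. as YAML.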

Mapping metadata read from external sources

Generally, the representation and structure of metadata within the dataset of the ASpecD framework and each application derived from it is separate from the way the very same metadata are organised in files written mostly during data acquisition. To map the structure obtained by reading a metadata file to the internal representation within the dataset, as given by the aspecd.metadata.ExperimentalDatasetMetadata class, there exists a generic mapper class aspecd.metadata.MetadataMapper. This way, you can separate the representations of metadata and support mapping for different versions of metadata files.

Note

As mappings can become quite complicated and specifying lists of mappings for the aspecd.metadata.MetadataMapper by hand can become quite tedious, you can specify metadata mapping recipes in YAML files in a rather simple syntax. See the documentation of the aspecd.metadata.MetadataMapper class and its aspecd.metadata.MetadataMapper.create_mappings() method for details.

This method and the underlying ideas are heavily based on concepts and code developed by J. Popp for use within the trEPR Python package.

Metadata in packages based on the ASpecD framework

The dataset as unit of (numerical) data and metadata is a key concept of the ASpecD framework and a necessary prerequisite for a semantic understanding within the routines. Every measurement (or calculation) produces (raw) data that are useless without additional information, such as experimental parameters. This additional information is termed “metadata” within the ASpecD framework.

In addition to combining numerical data and metadata, a dataset provides a common structure, unifying the different file formats used as source for both data and metadata. Hence, the actual data format does not matter, greatly facilitating dealing with data from different sources (and even different kinds of data).

Therefore, if you develop a new package based on the ASpecD framework, one of the first and most important steps is to create a (hierarchical) metadata structure for your datasets. This requires a thorough understanding of the spectroscopic method you develop the package for and most probably several years of practical experience in the lab. Good sources of inspiration are vendor file formats, which usually store instrument parameters and the like in some form. If you are lucky, you can actually access this information. If not, you may need to store these metadata in an additional external file that is written manually during data recording.

Some basic metadata that are rarely contained in vendor file formats, as they concern the actual sample measured, as well as metadata for calculated datasets, can be found in the aspecd.metadata.ExperimentalDatasetMetadata and aspecd.metadata.CalculatedDatasetMetadata classes, respectively.

Module documentation

class aspecd.metadata.PhysicalQuantity(string=None, value=None, unit=None)

Bases: aspecd.utils.ToDictMixin

Class for storing all relevant information about a physical quantity.

A physical quantity, Q, always consists of a value, {Q}, and a corresponding unit, [Q], hence:

Q = {Q} [Q].

See, e.g., the “IUPAC Green Book” for further details.

To get a string representation of a physical quantity, i.e., its value and unit separated by a single space, simply use str().

To set value (and unit) from a string, either use from_string() or supply the string as argument when instantiating the object. Make sure the value part of the string is convertible to float. Otherwise, a ValueError will be raised.
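The string handling described above can be illustrated with a minimal sketch. PhysicalQuantitySketch is a hypothetical, stripped-down class that models only value and unit (not dimension or name); the values used when clearing (0.0 and an empty string) are an assumption of this sketch, not taken from ASpecD.

```python
class PhysicalQuantitySketch:
    """Hypothetical, stripped-down analogue of aspecd.metadata.PhysicalQuantity.

    Only value and unit are modelled; dimension and name are omitted.
    """

    def __init__(self, string=None, value=None, unit=None):
        self._value = 0.0
        self.unit = ''
        if value is not None:
            self.value = value
        if unit is not None:
            self.unit = unit
        if string is not None:
            self.from_string(string)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, value):
        # float() raises ValueError for non-numeric strings such as "abc"
        self._value = float(value)

    def from_string(self, string):
        if not string:
            # Empty string: clear value and unit (reset values are assumed)
            self._value = 0.0
            self.unit = ''
            return
        parts = string.split(maxsplit=1)
        self.value = parts[0]
        if len(parts) > 1:
            # Only set the unit if a second, whitespace-separated element exists
            self.unit = parts[1]

    def __str__(self):
        # Value and unit separated by a single space
        return f"{self.value} {self.unit}"


frequency = PhysicalQuantitySketch('9.5 GHz')
print(frequency)  # 9.5 GHz
```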

unit

symbol of the unit of the corresponding value

SI units are preferred

Type

str

dimension

dimension of the corresponding value

useful for (automatic) conversions

Type

str

name

name of the physical quantity in a given context

Type

str

Parameters
  • string (str) –

    String containing value and unit, separated by whitespace

    Will be used to set value and unit correspondingly.

    If no second element separated by whitespace is present, only value will be set.

  • value (float) – Numerical value

  • unit (str) –

    String containing the unit of the corresponding value.

    SI units are preferred, and their abbreviations should be used.

Raises

ValueError – Raised if value is not a float

property value

Get or set the value of a physical quantity.

A value is always a float.

Raises

ValueError – Raised if value is not a (scalar) float

commensurable(physical_quantity=None)

Check whether two physical quantities are commensurable.

There are two criteria for physical quantities to be commensurable. Either they have the same unit, or they have the same dimension. In the latter case, a unit conversion is generally possible.

Parameters

physical_quantity (aspecd.metadata.PhysicalQuantity) – physical quantity to test commensurability with

Returns

commensurable – True if both physical quantities have the same unit or dimension, False otherwise

Return type

bool
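The two commensurability criteria can be sketched as a plain function. QuantitySketch and commensurable() here are hypothetical stand-ins, not the ASpecD method; treating empty units or dimensions as never matching is an assumption of this sketch, chosen so that two quantities lacking this information are not reported as commensurable.

```python
from dataclasses import dataclass


@dataclass
class QuantitySketch:
    """Hypothetical minimal quantity holding only what the check needs."""
    value: float = 0.0
    unit: str = ''
    dimension: str = ''


def commensurable(first, second):
    """True if both quantities share their unit or their dimension."""
    # Assumption of this sketch: empty units or dimensions never match.
    same_unit = bool(first.unit) and first.unit == second.unit
    same_dimension = (bool(first.dimension)
                      and first.dimension == second.dimension)
    return same_unit or same_dimension


field_mt = QuantitySketch(350.0, 'mT', 'magnetic flux density')
field_g = QuantitySketch(3500.0, 'G', 'magnetic flux density')
frequency = QuantitySketch(9.5, 'GHz', 'frequency')
print(commensurable(field_mt, field_g))    # True: same dimension
print(commensurable(field_mt, frequency))  # False: neither matches
```

In the first case, the units differ but the dimension is the same, which is exactly the situation in which a unit conversion is generally possible.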

from_string(string)

Set value and unit from string.

Parameters

string (str) –

String containing value and unit, separated by whitespace

Will be used to set value and unit correspondingly.

If no second element separated by whitespace is present, only value will be set.

If an empty string is provided, value and unit are cleared.

from_dict(dict_=None)

Set properties from dictionary.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Parameters

dict_ (dict) – Dictionary containing properties to set

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.
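The effect of remove_empty can be illustrated with a hypothetical helper, to_dict_sketch(), that mimics the documented behaviour on plain objects. What exactly counts as "empty" is not specified above; this sketch assumes None and anything of length zero (empty strings, lists, dicts), and keeps 0 and False.

```python
from collections import OrderedDict


def is_empty(value):
    # Assumption of this sketch: None and zero-length values are "empty";
    # 0 and False are kept.
    return value is None or (hasattr(value, '__len__') and len(value) == 0)


def to_dict_sketch(obj, remove_empty=False):
    """Hypothetical helper mimicking to_dict(remove_empty=...)."""
    public_attributes = OrderedDict()
    for key, value in vars(obj).items():
        if key.startswith('_'):
            continue  # non-public attributes are never exported
        if remove_empty and is_empty(value):
            continue
        public_attributes[key] = value
    return public_attributes


class MeasurementSketch:
    def __init__(self):
        self.purpose = 'calibration'
        self.operator = ''
        self.labbook_entry = None
        self._cache = {}


measurement = MeasurementSketch()
print(list(to_dict_sketch(measurement)))                     # ['purpose', 'operator', 'labbook_entry']
print(list(to_dict_sketch(measurement, remove_empty=True)))  # ['purpose']
```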

class aspecd.metadata.Metadata(dict_=None)

Bases: aspecd.utils.ToDictMixin

General metadata class.

Metadata can be set from dict upon initialisation.

Metadata can be converted to dict via aspecd.utils.ToDictMixin.to_dict().

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Keys in the dictionary are converted to lower case and spaces converted to underscores to fit the naming scheme of attributes.

If a property is of class aspecd.metadata.PhysicalQuantity, it is set accordingly.

Parameters

dict_ (dict) – Dictionary containing properties to set
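The key conversion described above (lower case, spaces replaced by underscores) can be sketched as follows. MeasurementSketch is a hypothetical stand-in, and the special handling of PhysicalQuantity attributes is deliberately left out.

```python
def to_attribute_name(key):
    """Lower case, spaces replaced by underscores."""
    return key.lower().replace(' ', '_')


class MeasurementSketch:
    """Hypothetical stand-in for a class derived from aspecd.metadata.Metadata."""

    def __init__(self):
        self.purpose = ''
        self.labbook_entry = ''

    def from_dict(self, dict_=None):
        for key, value in (dict_ or {}).items():
            attribute = to_attribute_name(key)
            # Keys not matching an attribute after conversion are ignored
            if hasattr(self, attribute):
                setattr(self, attribute, value)


measurement = MeasurementSketch()
measurement.from_dict({'Purpose': 'calibration',
                       'Labbook Entry': 'hypothetical-id'})
print(measurement.labbook_entry)  # hypothetical-id
```

This is what allows keys such as "Labbook Entry", as they may appear in a human-written metadata file, to end up in the attribute labbook_entry.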

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.TemperatureControl(dict_=None, temperature='')

Bases: aspecd.metadata.Metadata

Temperature control is very often found in spectroscopy.

This class provides general means of storing relevant parameters for temperature control.

temperature

value and unit of the temperature set

Type

aspecd.metadata.PhysicalQuantity

controller

type and name of the temperature controller used

Type

str

property controlled

Has temperature been actively controlled during measurement?

Read-only property automatically set when setting a temperature value.

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

If “controlled” is set to False in the dictionary, the temperature value and unit will be cleared.

The value of the temperature key needs to be a string.

Parameters

dict_ (dict) – Dictionary with keys corresponding to properties of the class.
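The interplay between the temperature and the read-only "controlled" property can be sketched as follows. TemperatureControlSketch is hypothetical and simplifies the temperature to a plain string instead of an aspecd.metadata.PhysicalQuantity; deriving "controlled" from whether a temperature is set is an assumption consistent with, but not copied from, the description above.

```python
class TemperatureControlSketch:
    """Hypothetical analogue of aspecd.metadata.TemperatureControl.

    Simplification: temperature is a plain string, not a PhysicalQuantity.
    """

    def __init__(self):
        self.temperature = ''
        self.controller = ''

    @property
    def controlled(self):
        # Read-only: derived from whether a temperature has been set
        return bool(self.temperature)

    def from_dict(self, dict_=None):
        dict_ = dict(dict_ or {})  # copy, so the caller's dict is untouched
        controlled = dict_.pop('controlled', None)
        for key, value in dict_.items():
            if hasattr(self, key):
                setattr(self, key, value)
        if controlled is False:
            # "controlled: False" clears the temperature
            self.temperature = ''


control = TemperatureControlSketch()
control.from_dict({'temperature': '80 K', 'controller': 'cryostat'})
print(control.controlled)  # True
control.from_dict({'temperature': '80 K', 'controlled': False})
print(control.controlled)  # False
```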

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.Measurement(dict_=None)

Bases: aspecd.metadata.Metadata

General information available for each type of measurement.

start

Date and time of start of measurement

Type

datetime

end

Date and time of end of measurement

Type

datetime

purpose

Purpose of measurement, often quite helpful to know

Type

str

operator

Name of the operator performing the measurement. Beware of the implications for privacy protection.

Type

str

labbook_entry

Identifier for lab book entry (usually either LOI or URL)

Type

str

Parameters

dict_ (dict) – Dictionary containing fields corresponding to attributes of the class

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

For the “start” and “end” items, two conventions are available for structuring the dictionary. Either these fields are dictionaries themselves, with the fields “date” and “time”, such as:

{"start": {"date": "yyyy-mm-dd", "time": "HH:MM:SS"},
 "end": {"date": "yyyy-mm-dd", "time": "HH:MM:SS"}}

Alternatively, these fields can be strings containing a representation of both date and time:

{"start": "yyyy-mm-dd HH:MM:SS", "end": "yyyy-mm-dd HH:MM:SS"}

Use whichever is more appropriate for you.

Parameters

dict_ (dict) – Dictionary with keys corresponding to properties of the class.
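Both conventions for “start” and “end” can be reduced to the same datetime object. The helper parse_timestamp() below is hypothetical; it assumes exactly the "yyyy-mm-dd HH:MM:SS" format shown above, and how ASpecD itself parses these strings is not stated here.

```python
from datetime import datetime


def parse_timestamp(entry):
    """Hypothetical helper accepting both conventions for "start"/"end"."""
    if isinstance(entry, dict):
        # Convention 1: {"date": "yyyy-mm-dd", "time": "HH:MM:SS"}
        entry = f"{entry['date']} {entry['time']}"
    # Convention 2 (and converted convention 1): "yyyy-mm-dd HH:MM:SS"
    return datetime.strptime(entry, '%Y-%m-%d %H:%M:%S')


nested = {"start": {"date": "2024-01-15", "time": "09:30:00"}}
flat = {"start": "2024-01-15 09:30:00"}
print(parse_timestamp(nested["start"]) == parse_timestamp(flat["start"]))  # True
```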

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.Sample(dict_=None)

Bases: aspecd.metadata.Metadata

Information on the sample measured.

name

Short name of the sample

Type

str

id

Unique identifier of the sample

Type

str or int

loi

Lab Object Identifier (LOI) for the sample

Type

str

Parameters

dict_ (dict) – Dictionary containing fields corresponding to attributes of the class

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Keys in the dictionary are converted to lower case and spaces converted to underscores to fit the naming scheme of attributes.

If a property is of class aspecd.metadata.PhysicalQuantity, it is set accordingly.

Parameters

dict_ (dict) – Dictionary containing properties to set

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.Calculation(dict_=None)

Bases: aspecd.metadata.Metadata

Information on the calculation.

type

Type of the calculation – usually the full class name

Type

str

parameters

Parameters of the calculation

Type

dict

Parameters

dict_ (dict) – Dictionary containing fields corresponding to attributes of the class

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Keys in the dictionary are converted to lower case and spaces converted to underscores to fit the naming scheme of attributes.

If a property is of class aspecd.metadata.PhysicalQuantity, it is set accordingly.

Parameters

dict_ (dict) – Dictionary containing properties to set

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.Device(dict_=None)

Bases: aspecd.metadata.Metadata

Information on the device contributing device data.

The dataset concept (see aspecd.dataset.Dataset) rests on the assumption that there is one particular set of data that can be regarded as the actual or primary data of the dataset. However, in many cases, other data are recorded in parallel to these actual data, be it readouts from monitors or the like.

This class contains the metadata of the corresponding devices whose data are of type aspecd.dataset.DeviceData. That class contains an attribute aspecd.dataset.DeviceData.metadata.

Note

You will usually need to implement derived classes for concrete devices, as this class only contains a minimum set of attributes.

label

Label of the device

Type

str

Parameters

dict_ (dict) – Dictionary containing fields corresponding to attributes of the class

New in version 0.9.

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Keys in the dictionary are converted to lower case and spaces converted to underscores to fit the naming scheme of attributes.

If a property is of class aspecd.metadata.PhysicalQuantity, it is set accordingly.

Parameters

dict_ (dict) – Dictionary containing properties to set

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.DatasetMetadata

Bases: aspecd.utils.ToDictMixin

Metadata for dataset.

This class contains the minimal set of metadata for a dataset.

Metadata of actual datasets should extend this class by adding properties that are themselves classes inheriting from aspecd.metadata.Metadata.

Metadata can be converted to dict via aspecd.utils.ToDictMixin.to_dict(), e.g., for generating reports using templates and template engines.

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Keys in the dictionary are converted to lower case and spaces converted to underscores to fit the naming scheme of attributes.

Parameters

dict_ (dict) –

Dictionary with metadata.

Each key of this dictionary corresponds to a class attribute and is in itself a dictionary with the correct set of attributes for the particular class.
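The nested structure expected here, with one sub-dictionary per metadata object, can be sketched as follows. Both classes are hypothetical stand-ins; the sketch only shows the delegation idea, where each top-level key selects an attribute whose own from_dict() receives the nested dictionary.

```python
class SampleSketch:
    """Hypothetical stand-in for aspecd.metadata.Sample."""

    def __init__(self):
        self.name = ''
        self.id = ''

    def from_dict(self, dict_=None):
        for key, value in (dict_ or {}).items():
            if hasattr(self, key):
                setattr(self, key, value)


class DatasetMetadataSketch:
    """Hypothetical stand-in: each attribute is itself a metadata object."""

    def __init__(self):
        self.sample = SampleSketch()

    def from_dict(self, dict_=None):
        for key, value in (dict_ or {}).items():
            attribute = key.lower().replace(' ', '_')
            # Delegate the nested dictionary to the matching metadata object
            if hasattr(self, attribute):
                getattr(self, attribute).from_dict(value)


metadata = DatasetMetadataSketch()
metadata.from_dict({'Sample': {'name': 'Sample 42', 'id': 42}})
print(metadata.sample.name)  # Sample 42
```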

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.ExperimentalDatasetMetadata

Bases: aspecd.metadata.DatasetMetadata

Metadata for an experimental dataset.

This class contains the minimal set of metadata for an experimental dataset, i.e., aspecd.dataset.ExperimentalDataset.

Metadata of actual datasets should extend this class by adding properties that are themselves classes inheriting from aspecd.metadata.Metadata.

Metadata can be converted to dict via aspecd.utils.ToDictMixin.to_dict(), e.g., for generating reports using templates and template engines.

measurement

Metadata of measurement

Type

aspecd.metadata.Measurement

sample

Metadata of sample

Type

aspecd.metadata.Sample

temperature_control

Metadata of temperature control

Type

aspecd.metadata.TemperatureControl

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Keys in the dictionary are converted to lower case and spaces converted to underscores to fit the naming scheme of attributes.

Parameters

dict_ (dict) –

Dictionary with metadata.

Each key of this dictionary corresponds to a class attribute and is in itself a dictionary with the correct set of attributes for the particular class.

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.CalculatedDatasetMetadata

Bases: aspecd.metadata.DatasetMetadata

Metadata for a dataset with calculated data.

This class contains the minimal set of metadata for a dataset consisting of calculated data, i.e., aspecd.dataset.CalculatedDataset.

Metadata of actual datasets should extend this class by adding properties that are themselves classes inheriting from aspecd.metadata.Metadata.

Metadata can be converted to dict via aspecd.utils.ToDictMixin.to_dict(), e.g., for generating reports using templates and template engines.

calculation

Metadata of calculation underlying the numeric data

Type

aspecd.metadata.Calculation

from_dict(dict_=None)

Set properties from dictionary, e.g., from metadata.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Keys in the dictionary are converted to lower case and spaces converted to underscores to fit the naming scheme of attributes.

Parameters

dict_ (dict) –

Dictionary with metadata.

Each key of this dictionary corresponds to a class attribute and is in itself a dictionary with the correct set of attributes for the particular class.

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

Changed in version 0.9.2: Dictionaries do not get copied by default, but there is a private method that can be overridden in derived classes to copy the dictionary.

class aspecd.metadata.MetadataMapper

Bases: object

Mapper for metadata.

Converts a dictionary containing metadata read, e.g., from a metadata file into a dictionary that corresponds to the internal structure of the metadata of a dataset, as stored in aspecd.metadata.ExperimentalDatasetMetadata.

If all you need is to convert the dictionary keys to proper variable names conforming to the naming scheme proposed by PEP 8, you may simply use the method keys_to_variable_names().

The tasks that can currently be performed to map a dictionary to the internal structure of the metadata representation in a dataset include renaming keys via rename_key(), combining items via combine_items(), copying keys via copy_key(), and moving items via move_item().

Rather than performing the mappings by hand, calling these methods repeatedly, you may use a mapping table contained in the mappings attribute. If you pre-define such mapping tables, you can easily apply different mappings depending on the version of your original metadata structure. Once you have assigned the appropriate mapping table to the mappings attribute, simply call map(). If everything turns out well, this maps the metadata contained in metadata according to the mapping table contained in mappings. Finally, you may want to assign this converted data structure to your dataset’s metadata attribute, using ExperimentalDatasetMetadata.from_dict().

As it is often tedious to manually create the entries of the mapping table residing in mappings, you can use mapping recipes stored in YAML files together with the create_mappings() method. For details of the structure of the mapping recipe YAML files, see the documentation of the create_mappings() method. Note that for this to work, you need to specify the filename of the mapping recipe in recipe_filename as well as the version of the metadata file format in version. The create_mappings() method and the underlying ideas are heavily based on concepts and code developed by J. Popp for use within the trEPR Python package.

Note

The mapping recipes should be stored within the package. As files within packages should not be accessed using regular file-system paths, but rather using the respective functionality of the pkgutil package, the aspecd.utils.get_package_data() function is used internally. When using the MetadataMapper class from a package derived from ASpecD, prefix the name of the recipe file with the package name, followed by the ‘@’ character. For example, to use the recipe ‘mappings.yaml’ from within the package ‘trepr’, you would need to specify trepr@mappings.yaml as recipe_filename.

metadata

Dictionary containing the metadata that are converted in place

Type

dict

mappings

Tasks to perform to map dictionary

Each task is a list containing three entries:

  1. an optional key of a “sub-dictionary” to operate on

  2. the action to carry out

  3. a list containing the necessary parameters to carry out the action

For examples, see the documentation of the map() method.

Type

list
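How a mapping table with the three entries described above might be processed can be sketched with a hypothetical MiniMapperSketch class. It supports only rename_key, and handling a sub-dictionary by temporarily pointing metadata at it is an assumption of this sketch, not a description of how MetadataMapper does it.

```python
class MiniMapperSketch:
    """Hypothetical mapper applying [sub-dict key, action, parameters] tasks."""

    def __init__(self, metadata=None, mappings=None):
        self.metadata = metadata if metadata is not None else {}
        self.mappings = mappings if mappings is not None else []

    def rename_key(self, old_key='', new_key=''):
        self.metadata[new_key] = self.metadata.pop(old_key)

    def map(self):
        for sub_dict_key, action, parameters in self.mappings:
            method = getattr(self, action)  # e.g. 'rename_key' -> self.rename_key
            if not sub_dict_key:
                method(*parameters)
                continue
            # Temporarily point self.metadata at the sub-dictionary,
            # so the same methods work on nested dictionaries
            full_metadata = self.metadata
            self.metadata = full_metadata[sub_dict_key]
            try:
                method(*parameters)
            finally:
                self.metadata = full_metadata


mapper = MiniMapperSketch(
    metadata={'old': 1, 'GENERAL': {'Date start': '2024-01-15'}},
    mappings=[['', 'rename_key', ['old', 'new']],
              ['GENERAL', 'rename_key', ['Date start', 'date_start']]])
mapper.map()
print(mapper.metadata)
```

Dispatching on the action name via getattr() keeps the table plain data, which is what makes it possible to store such tables in files and select them by version.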

version

Version of the metadata to map

Particularly important when you use create_mappings() to create the mappings from mapping recipes stored in a YAML file.

Type

str

recipe_filename

Name of the YAML file containing the mapping recipes

Needs to be specified when you use create_mappings() to create the mappings from mapping recipes stored in a YAML file.

Type

str

Examples

To actually use the mapper, you will usually create a file (in YAML format) containing the mappings. For details on what this file may look like, see the create_mappings() method. Suppose you have saved your mappings to the file mappings.yaml, with different mappings for the different versions of your format. In this case, using the mapper may look similar to the following:

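import aspecd.metadata

mapper = aspecd.metadata.MetadataMapper()
mapper.version = version_string  # version of your metadata file format
mapper.metadata = dict_to_be_mapped  # metadata read from file, as dict
mapper.recipe_filename = 'mappings.yaml'
mapper.map()  # apply the mappings for this version from mappings.yaml
modified_dict = mapper.metadata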

As you can see, modified_dict contains the dictionary to which the mappings from mappings.yaml have been applied.

Changed in version 0.6: Recipe files are now retrieved from package data via aspecd.utils.get_package_data()

rename_key(old_key='', new_key='')

Rename key in dictionary.

Note that this method does not preserve the order of keys in an ordered dictionary.

Parameters
  • old_key (str) – Name of original key that shall be renamed

  • new_key (str) – New name of key to be renamed
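The note on key order can be made concrete with a hypothetical stand-alone function operating on a plain dict. Whether MetadataMapper.rename_key() uses exactly this pop-and-reinsert approach is an assumption, but it explains why the order is not preserved.

```python
def rename_key(metadata, old_key='', new_key=''):
    """Hypothetical stand-alone version of MetadataMapper.rename_key()."""
    # pop() removes the old entry; re-inserting appends the new key at the
    # end, which is why the order of keys is not preserved
    metadata[new_key] = metadata.pop(old_key)


metadata = {'Operator': 'J. Doe', 'Purpose': 'test'}
rename_key(metadata, 'Operator', 'operator')
print(list(metadata))  # ['Purpose', 'operator']
```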

combine_items(old_keys=None, new_key='', pattern='')

Combine two items in a dictionary.

Parameters
  • old_keys (list) – Keys that should be combined

  • new_key (str) – Name of new key

  • pattern (str) –

    Pattern used to join the values of the keys.

    Defaults to the empty string.
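The effect of combining items on a dictionary can be sketched as follows. The helper combine_items_in is hypothetical; it assumes that the original items are removed after combining and that the values are strings, as joining them with a pattern implies.

```python
def combine_items_in(metadata, old_keys, new_key, pattern=""):
    """Join the values of old_keys into new_key (hypothetical helper)."""
    # Collect the values, removing the original items as we go
    values = [metadata.pop(key) for key in old_keys]
    # Join the (string) values using the given pattern
    metadata[new_key] = pattern.join(values)
    return metadata


metadata = {"Date start": "2024-01-31", "Time start": "12:00:00"}
combine_items_in(metadata, ["Date start", "Time start"], "start", " ")
print(metadata)  # {'start': '2024-01-31 12:00:00'}
```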

keys_to_variable_names()

Convert keys in metadata to proper variable names.

Variable names in Python should be all lower case, with words joined by underscores.

As the method traverses the metadata dictionary recursively, conversion is performed at (nearly) arbitrary depth.
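A recursive conversion of this kind might look like the following sketch. The exact conversion rules of aspecd are not given here, so this sketch assumes lower-casing and replacing spaces and other non-word characters with underscores; the function keys_to_variable_names_in is hypothetical and returns a converted copy.

```python
import re


def keys_to_variable_names_in(metadata):
    """Return a copy of metadata with keys converted to variable names.

    Hypothetical sketch: keys are lower-cased, and every run of characters
    that is not a letter, digit, or underscore becomes one underscore.
    Nested dictionaries are handled by recursion.
    """
    converted = {}
    for key, value in metadata.items():
        new_key = re.sub(r"\W+", "_", key.strip()).lower()
        if isinstance(value, dict):
            value = keys_to_variable_names_in(value)
        converted[new_key] = value
    return converted


metadata = {"GENERAL": {"Date start": "2024-01-31", "Runs": 5}}
result = keys_to_variable_names_in(metadata)
print(result)  # {'general': {'date_start': '2024-01-31', 'runs': 5}}
```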

copy_key(old_key='', new_key='')

Copy key in dictionary to new key.

This method is particularly useful in cases where keys need to be combined using combine_items(), but one of the keys should be combined several times with other keys.

Parameters
  • old_key (str) – Name of original key that shall be copied

  • new_key (str) – Name of new key to be added
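The following sketch shows why copying helps when one date is needed for both a start and an end timestamp. Both helpers are hypothetical stand-ins, and combining is assumed to remove the original items, which is what makes the copy necessary. The deep copy is a choice of this sketch to avoid sharing mutable values between the two keys.

```python
import copy


def copy_key_in(metadata, old_key, new_key):
    """Copy the value of old_key to new_key (hypothetical helper)."""
    # Deep-copy so that mutable values are not shared between both keys
    metadata[new_key] = copy.deepcopy(metadata[old_key])
    return metadata


def combine_items_in(metadata, old_keys, new_key, pattern=""):
    """Join and remove the values of old_keys (hypothetical helper)."""
    metadata[new_key] = pattern.join(metadata.pop(key) for key in old_keys)
    return metadata


metadata = {"Date": "2024-01-31", "Time start": "12:00", "Time end": "13:30"}
copy_key_in(metadata, "Date", "Date end")
combine_items_in(metadata, ["Date", "Time start"], "start", " ")
combine_items_in(metadata, ["Date end", "Time end"], "end", " ")
print(metadata)  # {'start': '2024-01-31 12:00', 'end': '2024-01-31 13:30'}
```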

move_item(key='', source_dict_name='', target_dict_name='', create_target_dict=False)

Move item (i.e., key-value pair) between dictionaries.

Note

If the target dictionary does not exist, the method will usually not create it, but raise an appropriate exception. However, if explicitly told to create the target dictionary, it will do so. This prevents typos from messing up the dictionary and resulting in hard-to-track bugs.

Parameters
  • key (str) – Name of the key of the corresponding item to move

  • source_dict_name (str) – Name of the dict the item should be moved from

  • target_dict_name (str) – Name of the dict the item should be moved to

  • create_target_dict (bool) – Whether to create target dictionary if it doesn’t exist
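The safeguard described in the note can be sketched with a plain nested dictionary. The helper move_item_in is hypothetical, and raising a KeyError for a missing target dictionary is an assumption of this sketch; the exception aspecd actually raises is not specified here.

```python
def move_item_in(metadata, key, source_dict_name, target_dict_name,
                 create_target_dict=False):
    """Move metadata[source][key] to metadata[target][key] (hypothetical)."""
    if target_dict_name not in metadata:
        if not create_target_dict:
            # Refuse to silently create a dict, e.g. from a typo in its name
            raise KeyError(f"Target dict {target_dict_name!r} does not exist")
        metadata[target_dict_name] = {}
    source = metadata[source_dict_name]
    metadata[target_dict_name][key] = source.pop(key)
    return metadata


metadata = {"measurement": {"model": "Model A", "Runs": 5}, "spectrometer": {}}
move_item_in(metadata, "model", "measurement", "spectrometer")
try:
    move_item_in(metadata, "Runs", "measurement", "experiemnt")  # typo
except KeyError as error:
    print("Refused:", error)
move_item_in(metadata, "Runs", "measurement", "experiment",
             create_target_dict=True)
print(metadata)
```

Because the check happens before anything is popped, a rejected move leaves the source dictionary untouched.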

map()

Map metadata according to the mappings stored in mappings.

Each mapping is defined as a list containing the key of a sub-dictionary as first element (an empty string to operate on the top level), the name of the method to be performed as second element, and the parameters for this method as third element.

An example for a mapping may look like this:

mapping = [['', 'rename_key', ['old', 'new']]]

This would rename the key old in metadata to new.

To do the same for a key in a “sub-dictionary”, you may provide a mapping similar to the following:

mapping = [['test', 'rename_key', ['old', 'new']]]

This would rename the key old in the dictionary test in metadata to new. The same pattern optionally specifying a dictionary to operate on can be applied to all the other mappings detailed below.

Similarly, you can join two items to a new item. In this case, a mapping may look like this:

mapping = [['', 'combine_items', [['key1', 'key2'], 'new']]]

This would join the values corresponding to the two keys key1 and key2 and assign them to the new key new. If you would like to join the values with a particular string, this can be done as well:

mapping = [['', 'combine_items', [['key1', 'key2'], 'new', ' ']]]

Here, the two values will be joined using a space.

Sometimes you want to combine keys, but need one of the keys several times. In this case, first copy this key to a new key. This can be done in the following way:

mapping = [['', 'copy_key', ['old', 'new']]]

And finally, there are cases where you want to move an item from one dictionary to another. This can be done using the following mapping:

mapping = [['', 'move_item', ['key', 'source', 'target']]]

Here, “source” and “target” are the names of the respective dictionaries the item should be moved between. If the target dictionary does not exist, the method will by default raise an exception. If, however, you know exactly what you are doing, you can pass an additional parameter explicitly telling the method to create the target dictionary:

mapping = [['', 'move_item', ['key', 'source', 'target', True]]]

In this particular case, however, you are solely responsible for any typos when specifying the name of the target dictionary, as these will most probably mess up your dictionaries and result in hard-to-track bugs.
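The dispatching behind map() can be sketched by looking up each method by its name and, if a sub-dictionary key is given, temporarily operating on that sub-dictionary only. MiniMapper is a hypothetical, stripped-down class implementing just two of the operations; it illustrates the mapping format described above and is not aspecd's implementation.

```python
class MiniMapper:
    """Hypothetical, stripped-down mapper illustrating map() dispatch."""

    def __init__(self, metadata, mappings):
        self.metadata = metadata
        self.mappings = mappings

    def rename_key(self, old_key="", new_key=""):
        self.metadata[new_key] = self.metadata.pop(old_key)

    def copy_key(self, old_key="", new_key=""):
        self.metadata[new_key] = self.metadata[old_key]

    def map(self):
        for dict_name, method_name, parameters in self.mappings:
            method = getattr(self, method_name)
            if dict_name:
                # Temporarily operate on the sub-dictionary only; changes
                # are made in place, so restoring the full dict keeps them
                full_metadata = self.metadata
                self.metadata = full_metadata[dict_name]
                try:
                    method(*parameters)
                finally:
                    self.metadata = full_metadata
            else:
                method(*parameters)


mapper = MiniMapper(
    metadata={"old": 1, "test": {"old": 2}},
    mappings=[["", "rename_key", ["old", "new"]],
              ["test", "rename_key", ["old", "new"]],
              ["test", "copy_key", ["new", "copy"]]],
)
mapper.map()
print(mapper.metadata)  # {'test': {'new': 2, 'copy': 2}, 'new': 1}
```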

create_mappings()

Create mappings from mapping recipe stored in YAML file.

Mapping recipes are stored in an external file (currently a YAML file whose filename is stored in recipe_filename) in their own format, described below. The recipes are read from this file and converted into mappings stored in the mappings attribute.

The correct recipe is selected based on the version number of the format the external metadata are stored in, as given in version.

Following is an example of a YAML file containing recipes. Each map can contain several types of mappings, each of which can contain several entries:

---

format:
  type: metadata mapper
  version: '0.1'

map 1:
  metadata file versions:
    - 0.1.6
    - 0.1.5
  combine items:
    - old keys: ['Date start', 'Time start']
      new key: start
      pattern: ' '
      in dict: GENERAL
  rename key:
    - old key: GENERAL
      new key: measurement
      in dict:

map 2:
  metadata file versions:
    - 0.1.4
  copy key:
    - old key: Date
      new key: Date end
      in dict: GENERAL
  move item:
    - key: model
      source dict: measurement
      target dict: spectrometer
    - key: Runs
      source dict: measurement
      target dict: experiment
      create target: True

Unknown mappings are silently ignored. The difference between the two entries in move item is that in the latter case, the target dictionary will be created. Be careful with this option, as typos introduced in your mapping recipe will lead to hard-to-debug behaviour of your application. See move_item() for details.

Important

If you have version numbers with only one dot, you need to explicitly mark them as strings in YAML by quoting them (as in version: '0.1' above). Otherwise, they will automatically be converted into floats, and your version lookup will fail.
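Why the lookup fails can be shown without any YAML library. This sketch assumes the version is compared with `in` against the list of versions parsed from the recipe; an unquoted 0.1 arrives as the float 0.1, which never equals the string "0.1". The variable names are illustrative only.

```python
# The version of the metadata file, as set in the mapper (a string)
version = "0.1"

# What a YAML loader returns for the list of versions in a recipe
quoted_entries = ["0.1.6", "0.1"]  # '0.1' quoted in the YAML file
unquoted_entries = ["0.1.6", 0.1]  # 0.1 unquoted -> parsed as a float

print(version in quoted_entries)    # True
print(version in unquoted_entries)  # False: "0.1" != 0.1

# Converting the float back does not reliably help: 0.10 becomes "0.1"
print(str(float("0.10")))  # 0.1
```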

Generally, the YAML file should be pretty self-explanatory. For details of the different mappings, see the documentation of the respective methods of the class, namely combine_items(), rename_key(), copy_key(), and move_item().

Important

The order of operations can be crucial. Mappings are applied in the following sequence: “copy key” -> “combine items” -> “rename key” -> “move item”

Note that you can name the maps (called map 1 and map 2 here) as you like. Use descriptive names wherever possible.
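Once the YAML file has been loaded into nested dictionaries, turning the matching recipe into a list of mappings might look like the following sketch. The function recipe_to_mappings and the constant MAPPING_ORDER are hypothetical; the recipe is passed as an already-parsed dictionary so that no YAML library is needed, and the order copy key, combine items, rename key, move item is hard-coded as an assumption.

```python
# Hypothetical order of mapping types: (recipe key, method, parameter fields)
MAPPING_ORDER = [
    ("copy key", "copy_key", ["old key", "new key"]),
    ("combine items", "combine_items", ["old keys", "new key", "pattern"]),
    ("rename key", "rename_key", ["old key", "new key"]),
    ("move item", "move_item",
     ["key", "source dict", "target dict", "create target"]),
]


def recipe_to_mappings(recipe, version):
    """Convert a parsed mapping recipe into a list of mappings.

    Hypothetical sketch: only maps whose "metadata file versions" contain
    the given version are used; unknown mapping types are ignored.
    """
    mappings = []
    for map_ in recipe.values():
        if version not in map_.get("metadata file versions", []):
            continue  # also skips the "format" block
        for recipe_key, method_name, fields in MAPPING_ORDER:
            for entry in map_.get(recipe_key, []):
                # An empty "in dict" (None in YAML) means the top level
                dict_name = entry.get("in dict") or ""
                # Optional trailing fields (pattern, create target) may be
                # omitted in the recipe
                parameters = [entry[field] for field in fields
                              if field in entry]
                mappings.append([dict_name, method_name, parameters])
    return mappings


recipe = {
    "format": {"type": "metadata mapper", "version": "0.1"},
    "map 1": {
        "metadata file versions": ["0.1.6", "0.1.5"],
        "combine items": [{"old keys": ["Date start", "Time start"],
                           "new key": "start", "pattern": " ",
                           "in dict": "GENERAL"}],
        "rename key": [{"old key": "GENERAL", "new key": "measurement",
                        "in dict": None}],
    },
}
for mapping in recipe_to_mappings(recipe, "0.1.6"):
    print(mapping)
# ['GENERAL', 'combine_items', [['Date start', 'Time start'], 'start', ' ']]
# ['', 'rename_key', ['GENERAL', 'measurement']]
```

The resulting lists follow the same format as the mappings shown in the documentation of map(), so they could be applied one after the other by a dispatcher.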

A hint on the filenames for metadata recipe YAML files: Use descriptive names containing the format of the metadata files. For info files, something like infofile_metadata_mappings.yaml may be reasonable.

This method and the underlying ideas are heavily based on concepts and code developed by J. Popp for use within the trEPR Python package.