# aspecd.processing module

Data processing functionality.

Key to reproducible science is the automatic documentation of each processing step applied to the data of a dataset. Each such processing step is self-contained, meaning it contains all the information necessary to perform the processing task on a given dataset.

Processing steps, in contrast to analysis steps (see aspecd.analysis for details), not only operate on the data of an aspecd.dataset.Dataset, but change its data. The information necessary to reproduce each processing step gets added to the aspecd.dataset.Dataset.history attribute of a dataset.

Generally, two types of processing steps can be distinguished:

• Processing steps operating on a single dataset

• Processing steps operating on multiple datasets at once

In the first case, the processing is usually handled using the process() method of the respective aspecd.dataset.Dataset object. Additionally, those processing steps always operate on the data of a single dataset only. Processing steps handling single datasets should always inherit from the aspecd.processing.SingleProcessingStep class.

In the second case, the processing step is handled using the process() method of the aspecd.processing.ProcessingStep object, and the datasets are stored as a list within the processing step. As these processing steps span several datasets, they should always inherit from the aspecd.processing.MultiProcessingStep class.

The module contains both base classes for processing steps (as detailed above) and a series of generally applicable processing steps for all kinds of spectroscopic data. The latter are an attempt to relieve the developers of packages derived from the ASpecD framework from having to reinvent the wheel over and over again.

The next section gives an overview of the concrete processing steps implemented within the ASpecD framework. For details of how to implement your own processing steps, see the section below.

## Concrete processing steps

Besides providing the basis for processing steps for the ASpecD framework, ensuring full reproducibility and traceability, hence reproducible science and good scientific practice, this module comes with a (growing) number of general-purpose processing steps useful for basically all kinds of spectroscopic data.

Here is a list as a first overview. For details, see the detailed documentation of each of the classes, readily accessible by the link.

### Processing steps operating on individual datasets

The following processing steps each operate on individual datasets independently.

### Processing steps operating on multiple datasets at once

The following processing steps each operate on more than one dataset at the same time and require at least two datasets as input.

## Writing your own processing steps

Each real processing step should inherit either from aspecd.processing.SingleProcessingStep, if it operates on a single dataset only, or from aspecd.processing.MultiProcessingStep, if it operates on several datasets at once. Furthermore, all processing steps should be contained in one module named “processing”. This allows for easy automation and replay of processing steps, particularly in the context of recipe-driven data analysis (for details, see the aspecd.tasks module).

A few hints on writing your own processing step classes:

• Always inherit from aspecd.processing.SingleProcessingStep or aspecd.processing.MultiProcessingStep, depending on your needs.

• Store all parameters, implicit and explicit, in the dict parameters of the aspecd.processing.ProcessingStep class, not in separate properties of the class. Only in this way can you ensure full reproducibility and compatibility with recipe-driven data analysis (for details of the latter, see the aspecd.tasks module).

• Always set the description property to a sensible value.

• Always set the undoable property appropriately. In most cases, processing steps can be undone.

• Implement the actual processing in the _perform_task method of the processing step. For sanitising parameters and checking general applicability of the processing step to the dataset(s) at hand, continue reading.

• Make sure to implement the aspecd.processing.ProcessingStep.applicable() method according to your needs. Typical cases would be to check for the dimensionality of the underlying data, as some processing steps may work only for 1D data (or vice versa). Don’t forget to declare this as a static method, using the @staticmethod decorator.

• The _sanitise_parameters method is called automatically to check the input parameters. Raise an appropriate exception there to describe the source of the error to the user.
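The hints above can be put together in a minimal sketch. To keep it runnable without ASpecD installed, `StepBase` and `NotApplicableError` are hypothetical stand-ins that only mimic the call order described in this module; a real step would inherit from aspecd.processing.SingleProcessingStep instead. The `Offset` step, its `offset` parameter, and the dict used as dataset are likewise assumptions made for illustration.

```python
class NotApplicableError(Exception):
    """Hypothetical stand-in for the exception raised by the framework."""


class StepBase:
    """Hypothetical stand-in for SingleProcessingStep (not the real class)."""

    def __init__(self):
        self.parameters = {}
        self.description = "Abstract processing step"
        self.undoable = False
        self.dataset = None

    @staticmethod
    def applicable(dataset):
        return True

    def process(self, dataset):
        # Check applicability, sanitise parameters, then perform the task.
        self.dataset = dataset
        if not self.applicable(dataset):
            raise NotApplicableError(self.description)
        self._sanitise_parameters()
        self._perform_task()
        return dataset

    def _sanitise_parameters(self):
        pass

    def _perform_task(self):
        pass


class Offset(StepBase):
    """Hypothetical step adding a constant offset to 1D data."""

    def __init__(self):
        super().__init__()
        self.description = "Add constant offset to 1D data"
        self.undoable = True
        # All parameters live in the parameters dict, not in own attributes.
        self.parameters["offset"] = 0.0

    @staticmethod
    def applicable(dataset):
        # Only flat (1D) lists of numbers are accepted in this sketch.
        return all(isinstance(v, (int, float)) for v in dataset["data"])

    def _sanitise_parameters(self):
        if not isinstance(self.parameters["offset"], (int, float)):
            raise ValueError("Parameter 'offset' needs to be a number")

    def _perform_task(self):
        offset = self.parameters["offset"]
        self.dataset["data"] = [v + offset for v in self.dataset["data"]]


step = Offset()
step.parameters["offset"] = 2
result = step.process({"data": [1, 2, 3]})
```

Keeping the check in `applicable()` and the validation in `_sanitise_parameters()` leaves `_perform_task()` free of defensive code, which is the division of labour the hints suggest.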

Some more special cases are detailed below. For further advice, consult the source code of this module, and have a look at the concrete processing steps whose purpose is described below in more detail.

### Changing the dimensions of your data

If your processing step changes the dimensions of your data, it is your responsibility to ensure that the axes values are consistent with the data. Note that upon changing the dimension of your data, the axes values will be reset to indices along the data dimensions. Hence, you need to first make a (deep) copy of your axes, then change the dimension of your data, and afterwards restore the remaining values from the temporarily stored axes.
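The copy, change, restore sequence can be sketched as follows. The `Data` class here is a hypothetical, simplified stand-in that resets its axes to indices whenever the data are assigned; it only mimics the behaviour described above and is not the ASpecD class (in particular, it has no separate intensity axis and uses plain lists).

```python
import copy


class Data:
    """Hypothetical stand-in for a dataset's data object (not ASpecD's class)."""

    def __init__(self, data, axes):
        self._data = data
        self.axes = axes  # one list of axis values per data dimension

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, new_data):
        self._data = new_data
        # Mimic the reset described above: axes values become plain indices.
        shape, level = [], new_data
        while isinstance(level, list):
            shape.append(len(level))
            level = level[0]
        self.axes = [list(range(length)) for length in shape]


dataset_data = Data(
    data=[[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    axes=[[10.0, 20.0], [340.0, 345.0, 350.0]],
)

# 1. Keep a deep copy of the axes before touching the data.
old_axes = copy.deepcopy(dataset_data.axes)
# 2. Change the dimension: average over the first dimension (2D -> 1D).
dataset_data.data = [
    sum(column) / len(column) for column in zip(*dataset_data.data)
]
axes_after_reset = copy.deepcopy(dataset_data.axes)
# 3. Restore the values of the axis that remains.
dataset_data.axes[0] = old_axes[1]
```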

### Changing the length of your data

When changing the length of the data, always change the corresponding axes values first, and only afterwards the data, as changing the data will change the axes values and adjust their length to the length of the corresponding dimension of the data.

### Adding parameters upon processing

Sometimes there is the need to persist values that are only obtained while processing the data. A typical example may be averaging 2D data along one dimension and wanting to store both the range of indices and the corresponding range in axis units. While in this case the axis value of the centre of the averaging window will typically be stored as the new axis value, the other parameters should end up in the aspecd.processing.ProcessingStep.parameters dictionary. Thus, they are added to the dataset history and are available for reports and the like.
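As a minimal sketch of this idea, the function below averages rows of 2D data over an index window and writes the values that only become known during processing back into the parameters dictionary. The function name and the key `range_in_axis_units` are hypothetical and chosen for illustration; plain lists stand in for the dataset.

```python
def average_window(data, axis_values, parameters):
    """Average rows of 2D data over the index window in parameters["range"]."""
    start, stop = parameters["range"]
    window = data[start:stop]
    averaged = [sum(column) / len(column) for column in zip(*window)]
    # Persist what is only known after processing (hypothetical key name).
    first, last = axis_values[start], axis_values[stop - 1]
    parameters["range_in_axis_units"] = [first, last]
    # The centre of the window would serve as the new axis value.
    centre = (first + last) / 2
    return averaged, centre


parameters = {"range": [1, 3]}
data = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
averaged, centre = average_window(data, [10.0, 20.0, 30.0, 40.0], parameters)
```

Because the additional values end up in the same dictionary as the user-supplied ones, they travel with the history record without any further bookkeeping.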

## Module documentation

class aspecd.processing.ProcessingStep

Base class for processing steps.

Each class actually performing a processing step should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).

Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.

When is a processing step undoable?

Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.

One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).

The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process() which is called by the aspecd.dataset.Dataset.process() method of the dataset object.

Note

Usually, you will never inherit directly from this class for actual processing tasks, but rather from one of its child classes, namely aspecd.processing.SingleProcessingStep and aspecd.processing.MultiProcessingStep, depending on whether your processing step operates on a single dataset or requires multiple datasets.

undoable

Can this processing step be reverted?

Type

bool

name

Name of the processing step.

Defaults to the lower-case class name, don’t change!

Type

str

parameters

Parameters required for performing the processing step

All parameters, implicit and explicit.

Type

dict

info

Additional information used, e.g., in a report (derived values, …)

Type

dict

description

Short description, to be set in class definition

Type

str

comment

User-supplied comment describing intent, purpose, reason, …

Type

str

references

List of references with relevance for the implementation of the processing step.

Use appropriate record types from the bibrecord package.

Type

list

Raises

Changed in version 0.4: New attribute references

process()

Perform the actual processing step.

The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters() prior to calling _perform_task().

static applicable(dataset)

Check whether processing step is applicable to the given dataset.

Returns True by default and needs to be implemented in classes inheriting from SingleProcessingStep according to their needs.

This is a static method that gets called automatically by each class inheriting from aspecd.processing.SingleProcessingStep. Hence, if you need to override it in your own class, make the method static as well. An example of an implementation testing for two-dimensional data is given below:

@staticmethod
def applicable(dataset):
    return len(dataset.data.axes) == 3

Parameters

dataset (aspecd.dataset.Dataset) – dataset to check

Returns

applicable – True if successful, False otherwise.

Return type

bool

class aspecd.processing.SingleProcessingStep

Base class for processing steps operating on single datasets.

Each class actually performing a processing step involving only a single dataset should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).

To perform the processing step, call the process() method of the dataset the processing should be applied to, and provide a reference to the actual processing_step object to it.

Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.

When is a processing step undoable?

Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.

One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).

The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process() which is called by the aspecd.dataset.Dataset.process() method of the dataset object.

dataset

Dataset the processing step should be performed on

Type

aspecd.dataset.Dataset

Raises

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

In this particular case, the key “dataset” from the top level of the resulting dictionary will be removed, but not keys with the same name on lower levels of the resulting dict.

Parameters

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type

collections.OrderedDict

process(dataset=None, from_dataset=False)

Perform the actual processing step on the given dataset.

If no dataset is provided at method call, but is set as property in the SingleProcessingStep object, the aspecd.dataset.Dataset.process() method of the dataset will be called and thus the history written.

If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.

The aspecd.dataset.Dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the aspecd.processing.SingleProcessingStep object is not necessary.

The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters() prior to calling _perform_task().

Parameters
• dataset (aspecd.dataset.Dataset) – dataset to apply processing step to

• from_dataset (boolean) –

whether we are called from within a dataset

Defaults to “False” and shall never be set manually.

Returns

dataset – dataset the processing step has been applied to

Return type

aspecd.dataset.Dataset

Raises

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the aspecd.dataset.Dataset.process() method and ensures that the history of each processing step gets written properly.

Returns

history_record – history record for processing step

Return type

aspecd.history.ProcessingHistoryRecord

class aspecd.processing.MultiProcessingStep

Base class for processing steps operating on multiple datasets.

Each class actually performing a processing step involving multiple datasets should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).

To perform the processing step, call the process() method directly. This will take care of writing the history to each individual dataset as well.

Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.

When is a processing step undoable?

Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.

One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).

The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process() which is called by the aspecd.dataset.Dataset.process() method of the dataset object.

datasets

List of aspecd.dataset.Dataset objects the processing step should act on

Type

list

Raises

New in version 0.2.

process()

Perform the actual processing step.

The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters() prior to calling _perform_task().

Raises

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the aspecd.dataset.Dataset.process() method and ensures that the history of each processing step gets written properly.

Returns

history_record – history record for processing step

Return type

aspecd.history.ProcessingHistoryRecord

class aspecd.processing.Normalisation

Normalise data.

There are different kinds of normalisation:

• maximum

Data are divided by their maximum value

• minimum

Data are divided by their minimum value

• amplitude

Data are divided by the difference between their maximum and minimum

• area

Data are divided by the sum of their absolute values

You can set these kinds using the attribute parameters["kind"].
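The arithmetic behind the four kinds can be sketched in a few lines of plain Python. This is an illustration of the definitions listed above for a 1D list of values, not the implementation used by the class; the function name `normalise` is an assumption, and ranges and noise handling are left out.

```python
def normalise(values, kind="maximum"):
    """Divide values by the quantity belonging to the given kind."""
    if "max" in kind:
        divisor = max(values)
    elif "min" in kind:
        divisor = min(values)
    elif "amp" in kind:
        divisor = max(values) - min(values)
    elif kind == "area":
        divisor = sum(abs(value) for value in values)
    else:
        raise ValueError(f"Unknown kind '{kind}'")
    return [value / divisor for value in values]


values = [-1.0, 1.0, 3.0]
by_maximum = normalise(values)                 # largest value becomes 1
by_amplitude = normalise(values, "amplitude")  # max - min becomes 1
by_area = normalise(values, "area")            # sum of absolute values becomes 1
```

With a baseline offset present, all four divisors change, which is why a proper baseline matters before normalising.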

Important

Before normalising your data, make sure they have a proper baseline, as otherwise, your normalisation will lead to strange results.

Note

Normalisation can be used for N-D data as well. In this case, the data as a whole are normalised accordingly.

Todo

How to handle noisy data in case of area normalisation, as this would probably account for double the noise if simply taking the absolute?

parameters

All parameters necessary for this step.

kind : str

Kind of normalisation to use

Valid values: “maximum”, “minimum”, “amplitude”, “area”

Note that the first three can be abbreviated: everything containing “max”, “min”, or “amp” will be understood accordingly.

Defaults to “maximum”

range : list

Range of the data of the dataset to normalise for.

This can be quite useful if you want to normalise for a specific feature, e.g. an artifact that you’ve recorded separately and want to subtract from the data, or more generally to normalise to certain features of your data irrespective of other parts.

Ranges can be given as indices or in axis units, and for ND datasets, you need to provide as many ranges as dimensions of your data. Units default to indices, but can be specified using the parameter range_unit, see below.

As RangeExtraction is used internally, see there for more details of how to provide ranges.

range_unit : str

Unit used for the range.

Can be either “index” (default) or “axis”.

noise_range : list

Data range to use for determining noise level

If provided, the normalisation will account for the noise in case of normalising to minimum, maximum, and amplitude.

In case of ND datasets with N>1, you need to provide as many ranges as dimensions of your data.

Numbers are interpreted by default as percentage.

Default: None

noise_range_unit : str

Unit for specifying noise range

Valid units are “index”, “axis”, “percentage”, with the latter being the default. As RangeExtraction is used internally, see there for further details.

Default: percentage

Type

dict

Raises

ValueError – Raised if unknown kind is provided

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

In the simplest case, just invoke the normalisation with default values:

- kind: processing
  type: Normalisation


This will normalise your data to their maximum.

Sometimes, normalising to maximum is not what you need, hence you can control in more detail the criterion using the appropriate parameter:

- kind: processing
  type: Normalisation
  properties:
    parameters:
      kind: amplitude


In this case, you would normalise to the amplitude, meaning setting the difference between minimum and maximum to one. For other kinds, see above.

If you want to normalise not over the entire range of the dataset, but only over a dedicated range, simply provide the necessary parameters:

- kind: processing
  type: Normalisation
  properties:
    parameters:
      range: [50, 150]


In this case, we assume a 1D dataset and use indices, requiring the data to span at least over 150 points. Of course, it is often more convenient to provide axis units. Here you go:

- kind: processing
  type: Normalisation
  properties:
    parameters:
      range: [340, 350]
      range_unit: axis


And in case of ND datasets with N>1, make sure to provide as many ranges as dimensions of your dataset. For a 2D dataset, this looks as follows:

- kind: processing
  type: Normalisation
  properties:
    parameters:
      range:
        - [50, 150]
        - [30, 40]


Here as well, the range can be given in indices or axis units, but defaults to indices if no unit is explicitly given.

Note

A note for developers: If you inherit from this class and plan to implement further kinds of normalisation, first test for your specific kind of normalisation, and in the else block add a call to super()._perform_task(). This way, you ensure the ValueError will still be raised in case of an unknown kind.
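The pattern described in this note might look like the following sketch. Both classes are hypothetical stand-ins working on a plain list so that the example runs without ASpecD; the kind “rms” is an invented example of an additional normalisation, and in real code the parent would be aspecd.processing.Normalisation.

```python
class Normalisation:
    """Hypothetical stand-in for the parent class (only "maximum" shown)."""

    def __init__(self):
        self.parameters = {"kind": "maximum"}
        self.data = []

    def _perform_task(self):
        kind = self.parameters["kind"]
        if "max" in kind:
            maximum = max(self.data)
            self.data = [value / maximum for value in self.data]
        else:
            raise ValueError(f"Unknown kind '{kind}'")


class MyNormalisation(Normalisation):
    """Adds a hypothetical kind "rms" and defers everything else."""

    def _perform_task(self):
        if self.parameters["kind"] == "rms":
            mean_square = sum(v ** 2 for v in self.data) / len(self.data)
            rms = mean_square ** 0.5
            self.data = [value / rms for value in self.data]
        else:
            # Known kinds and the ValueError are handled by the parent.
            super()._perform_task()


step = MyNormalisation()
step.parameters["kind"] = "rms"
step.data = [2.0, -2.0]
step._perform_task()
```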

New in version 0.2: Normalising over range of data

New in version 0.2: Accounting for noise for ND data with N>1

Changed in version 0.2: noise_range changed to list from integer

class aspecd.processing.Integration

Integrate data

Currently, the data are integrated using the numpy.cumsum() function. This may change in the future, and you may be able to choose between different algorithms. A potential candidate would be using FFT/IFFT and performing the operation in Fourier space.

Note

N-D arrays can be integrated as well. In this case, np.cumsum() will operate on the last axis.
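To illustrate what a cumulative sum does, here is a small sketch using only the standard library. `itertools.accumulate` yields the same running totals that numpy.cumsum() returns for a 1D array; the helper names are assumptions for illustration, and nested lists stand in for a 2D array integrated along its last axis.

```python
from itertools import accumulate


def cumulative_sum(values):
    """Running total of a 1D sequence."""
    return list(accumulate(values))


def cumulative_sum_last_axis(rows):
    """Running total along the last axis of a 2D nested list."""
    return [cumulative_sum(row) for row in rows]


integrated_1d = cumulative_sum([1, 2, 3, 4])
integrated_2d = cumulative_sum_last_axis([[1, 2], [3, 4]])
```

Note that a plain cumulative sum does not take the axis spacing into account, so the result is not scaled by the step width of the axis.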

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

As currently, there are no parameters you can set, integrating is as simple as this:

- kind: processing
  type: Integration

class aspecd.processing.Differentiation

Differentiate data, i.e., return discrete first derivative

Currently, the data are differentiated using the numpy.gradient() function. This may change in the future, and you may be able to choose between different algorithms. A potential candidate would be using FFT/IFFT and performing the operation in Fourier space.

Note

N-D arrays can be differentiated as well. In this case, differentiation will operate on the last axis.
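The following sketch reimplements, for a 1D list with unit spacing, the scheme numpy.gradient() uses with its default settings: central differences in the interior and one-sided differences at both ends. The plain forward difference computed by numpy.diff() is shown for comparison. Both helper functions are written out here for illustration only.

```python
def diff(values):
    """Forward differences: the result is one element shorter."""
    return [b - a for a, b in zip(values, values[1:])]


def gradient(values):
    """Central differences inside, one-sided differences at the edges."""
    if len(values) < 2:
        raise ValueError("Need at least two values")
    result = [values[1] - values[0]]
    result += [
        (values[i + 1] - values[i - 1]) / 2
        for i in range(1, len(values) - 1)
    ]
    result.append(values[-1] - values[-2])
    return result


values = [0, 1, 4, 9, 16]
forward = diff(values)
central = gradient(values)
```

The gradient keeps the length of the input, so the axis of the dataset does not need to be shortened after differentiation.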

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

As currently, there are no parameters you can set, differentiating is as simple as this:

- kind: processing
  type: Differentiation


Changed in version 0.3: Method changed from numpy.diff() to numpy.gradient()

class aspecd.processing.ScalarAlgebra

Perform scalar algebraic operation on one dataset.

To compare datasets (by eye), it might be useful to adapt their intensity by algebraic operations. Adding, subtracting, multiplying and dividing are implemented here.

parameters

All parameters necessary for this step.

kind : str

Kind of scalar algebra to use

Valid values: “plus”, “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, “+”, “-”, “*”, “/”

value : float

Parameter of the scalar algebraic operation

Default value: 1.0

Type

dict

Raises

ValueError – Raised if no or wrong kind is provided

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

In case you would like to add a fixed value of 42 to your dataset:

- kind: processing
  type: ScalarAlgebra
  properties:
    parameters:
      kind: add
      value: 42


Similarly, you could use “minus”, “times”, “by”, “add”, “subtract”, “multiply”, or “divide” as kind - resulting in the given algebraic operation.
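One way to map the different spellings of kind onto the four operations is a lookup table, as in the sketch below. The table and the function `scalar_algebra` are assumptions for illustration and operate on a plain list; they do not show how the class itself resolves the kind.

```python
import operator

# Each accepted spelling points to one of the four basic operations.
OPERATORS = {
    "plus": operator.add, "add": operator.add, "+": operator.add,
    "minus": operator.sub, "subtract": operator.sub, "-": operator.sub,
    "times": operator.mul, "multiply": operator.mul, "*": operator.mul,
    "by": operator.truediv, "divide": operator.truediv, "/": operator.truediv,
}


def scalar_algebra(values, kind, value=1.0):
    """Apply the scalar operation named by kind to each value."""
    try:
        operation = OPERATORS[kind]
    except KeyError:
        raise ValueError(f"Unknown kind '{kind}'") from None
    return [operation(v, value) for v in values]


added = scalar_algebra([1, 2], "add", 42)
halved = scalar_algebra([1, 2], "by", 2)
```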

class aspecd.processing.Projection

Project data, i.e. reduce dimensions along one axis.

There are many reasons to project along one axis, if nothing else increasing the signal-to-noise ratio if multiple scans have been recorded as a 2D dataset.

While projection can be considered a special case of averaging as performed by aspecd.processing.Averaging and using the whole range of one axis, averaging is usually performed over part of an axis only. Hence projection is semantically different and therefore implemented as a separate processing step.

parameters

All parameters necessary for this step.

axis : int

Axis to average along

Default value: 0

Type

dict

Raises

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

In the simplest case, just invoke the projection with default values:

- kind: processing
  type: Projection


This will project the data along the first axis (index 0), yielding a 1D dataset.

If you would like to project along the second axis (index 1), simply set the appropriate parameter:

- kind: processing
  type: Projection
  properties:
    parameters:
      axis: 1


This will project the data along the second axis (index 1), yielding a 1D dataset.
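For a 2D dataset, projecting amounts to averaging over the chosen axis. The sketch below shows this for nested lists; the function `project` is an assumption for illustration and ignores the axes handling that the processing step takes care of.

```python
def project(rows, axis=0):
    """Average 2D data (list of rows) along the given axis."""
    if axis == 0:
        series = list(zip(*rows))  # one entry per column
    elif axis == 1:
        series = rows              # one entry per row
    else:
        raise IndexError(f"Axis {axis} out of bounds for 2D data")
    return [sum(values) / len(values) for values in series]


data = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
along_first = project(data, axis=0)
along_second = project(data, axis=1)
```

In both cases the result has one dimension less than the input, with the length of the axis that was not averaged over.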

static applicable(dataset)

Check whether processing step is applicable to the given dataset.

Projection is only applicable to datasets with data of at least two dimensions.

Parameters

dataset (aspecd.dataset.Dataset) – dataset to check

Returns

applicable – True if successful, False otherwise.

Return type

bool

class aspecd.processing.SliceExtraction

Extract slice along one or more dimensions from dataset.

With multidimensional datasets, there are use cases where you would like to operate only on a slice along a particular axis. One example may be to compare the first and last trace of a 2D dataset.

Note that “slice” can be anything from a 1D to an ND array with at least one dimension less than the original array. If you want to extract a 1D slice from an ND dataset with N>2, you need to provide N-1 values for position and axis. Make sure to always provide as many values for position as you provide for axis.

You can either provide indices or axis values for position. For the latter, set the parameter “unit” accordingly. For details, see below.

parameters

All parameters necessary for this step.

axis :

Index of the axis or list of indices of the axes to take the position from to extract the slice

If you provide a list of axes, you need to provide as many positions as axes.

If an invalid axis is provided, an IndexError is raised.

Default: 0

position :

Position(s) of the slice to extract

Positions can be given as axis indices (default) or axis values, if the parameter “unit” is set accordingly. For details, see below.

If you provide a list of positions, you need to provide as many axes as positions.

If no position is provided or the given position is out of bounds for the given axis, a ValueError is raised.

unit : str

Unit used for specifying the position: either “axis” or “index”.

If an invalid value is provided, a ValueError is raised.

Default: “index”

Type

dict

Raises
• aspecd.exceptions.NotApplicableToDatasetError – Raised if dataset has not enough dimensions (i.e., 1D dataset).

• ValueError – Raised if index is out of bounds for given axis. Raised if wrong unit is given. Raised if too many values for axis are given. Raised if number of values for position and axis differ.

• IndexError – Raised if axis is out of bounds for given dataset.

Changed in version 0.2: Parameter “index” renamed to “position” to reflect values to be either indices or axis values

New in version 0.2: Slice positions can be given both as axis indices and as axis values

New in version 0.2: Works for ND datasets with N>1

Changed in version 0.7: Sets dataset label to slice position (in axes units)

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

In the simplest case, just invoke the slice extraction with an index only:

- kind: processing
  type: SliceExtraction
  properties:
    parameters:
      position: 5


This will extract the sixth slice (index five) along the first axis (index zero).

If you would like to extract a slice along the second axis (with index one), simply provide both parameters, position and axis:

- kind: processing
  type: SliceExtraction
  properties:
    parameters:
      position: 5
      axis: 1


This will extract the sixth slice along the second axis.

And as it is sometimes more convenient to give positions in axis values rather than indices, even this is possible. Suppose the axis you would like to extract a slice from runs from 340 to 350 and you would like to extract the slice corresponding to 343:

- kind: processing
  type: SliceExtraction
  properties:
    parameters:
      position: 343
      unit: axis


If you provide the position in axis units rather than indices, the axis value closest to the given position will be chosen automatically.
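Picking the closest axis value boils down to a nearest-neighbour search on the axis. A minimal sketch, with the helper name `nearest_index` and the axis of eleven points being assumptions for illustration:

```python
def nearest_index(axis_values, position):
    """Index of the axis value closest to the given position."""
    return min(
        range(len(axis_values)),
        key=lambda index: abs(axis_values[index] - position),
    )


axis_values = [340 + i for i in range(11)]  # 340, 341, ..., 350
index = nearest_index(axis_values, 343.2)
```

In case of a tie between two neighbouring axis values, this sketch picks the first one; the behaviour of the processing step in that situation is not specified here.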

For ND datasets with N>2, you can either extract a 1D or ND slice, with N always at least one dimension less than the original data. To extract a 2D slice from a 3D dataset, simply proceed as above, providing one value each for position and axis. If, however, you want to extract a 1D slice from a 3D dataset, you need to provide two values each for position and axis:

- kind: processing
  type: SliceExtraction
  properties:
    parameters:
      position: [21, 42]
      axis: [0, 2]


This particular case would be equivalent to data[21, :, 42] assuming data to contain the numeric data, besides, of course, that the processing step takes care of removing the axes as well.
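The same idea can be sketched for nested lists, fixing the position along each listed axis and keeping all other dimensions. The function `extract_slice` and the small 2×3×4 test array are assumptions for illustration; only the numeric data are handled, not the axes.

```python
def extract_slice(data, axes, positions):
    """Fix the given positions along the given axes of nested lists."""
    fixed = dict(zip(axes, positions))

    def walk(level, depth):
        if not isinstance(level, list):
            return level
        if depth in fixed:
            return walk(level[fixed[depth]], depth + 1)
        return [walk(item, depth + 1) for item in level]

    return walk(data, 0)


# 3D data with shape (2, 3, 4); each value encodes its own indices.
data = [
    [[100 * i + 10 * j + k for k in range(4)] for j in range(3)]
    for i in range(2)
]
slice_1d = extract_slice(data, axes=[0, 2], positions=[1, 3])  # data[1, :, 3]
slice_2d = extract_slice(data, axes=[0], positions=[1])        # data[1, :, :]
```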

static applicable(dataset)

Check whether processing step is applicable to the given dataset.

Slice extraction is only applicable to datasets with data of at least two dimensions.

Parameters

dataset (aspecd.dataset.Dataset) – dataset to check

Returns

applicable – True if successful, False otherwise.

Return type

bool

class aspecd.processing.RangeExtraction

Extract range of data from dataset.

There are many reasons to look only at a certain range of data of a given dataset. For an ND array, one would use slicing, but for a dataset, one needs to have the axes adjusted as well, hence this processing step.

parameters

All parameters necessary for this step.

range : list

List of lists with indices for the slicing

For each dimension of the data of the dataset, one list of indices needs to be provided that are used for start, stop [, step] of slice.

unit : str

Unit used for specifying the range: “axis”, “index”, “percentage”.

If an invalid value is provided, a ValueError is raised.

Default: “index”

Type

dict

Raises
• ValueError – Raised if index is out of bounds for given axis. Raised if wrong unit is given.

• IndexError – Raised if no range is provided. Raised if number of ranges does not fit data dimensions.

New in version 0.2.

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

In the simplest case, just invoke the range extraction with one range only, assuming a 1D dataset:

- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10]


This will extract the range data[5:10] from your data (and adjust the axis accordingly). In case of 2D data, it would be fairly similar, except that you now provide two ranges:

- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range:
        - [5, 10]
        - [3, 6]


Additionally, you can provide step sizes, just as you can do when slicing in Python:

- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10, 2]


This is equivalent to data[5:10:2] or data[(slice(5, 10, 2))], accordingly.
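The point of the processing step is that data and axis are cut with the same slice. For a 1D case with plain lists, this might be sketched as follows; the function `extract_range` and the example axis are assumptions for illustration.

```python
def extract_range(data, axis_values, range_):
    """Apply the same start, stop [, step] slice to data and axis."""
    window = slice(*range_)
    return data[window], axis_values[window]


data = list(range(20))
axis_values = [340 + 0.5 * index for index in range(20)]
new_data, new_axis = extract_range(data, axis_values, [5, 10, 2])
```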

Sometimes, it is more convenient to give ranges in axis values rather than indices. This can be achieved by setting the parameter unit to “axis”:

- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10]
      unit: axis


Note that in this case, setting a step is meaningless and will be silently ignored. Furthermore, the nearest axis values will be used for the range.

In some cases you may want to extract a range by providing percentages instead of indices or axis values. Even this can be done:

- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [0, 10]
      unit: percentage


Here, the first ten percent of the data of the 1D dataset will be extracted, or more exactly the indices falling within the first ten percent. Note that in this case, setting a step is meaningless and will be silently ignored. Furthermore, the nearest axis values will be used for the range.
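To illustrate why a dataset needs more than plain slicing, the following is a minimal NumPy sketch of slicing data and axis values consistently for the three units. It is not the ASpecD implementation: the helper `extract_range`, the inclusive handling of the upper axis limit, and the rounding used for percentages are assumptions made for illustration.

```python
import numpy as np


def extract_range(data, axis_values, range_, unit="index"):
    """Slice 1D data and its axis consistently (hypothetical helper)."""
    if unit == "index":
        # start, stop[, step], exactly as in Python slicing
        selection = slice(*range_)
    elif unit == "axis":
        # Use the indices of the axis values nearest to the requested limits
        start = int(np.argmin(np.abs(axis_values - range_[0])))
        stop = int(np.argmin(np.abs(axis_values - range_[1])))
        selection = slice(start, stop + 1)  # assumption: upper limit included
    elif unit == "percentage":
        # Convert percentages of the axis length into indices
        n = len(axis_values)
        start = int(round(n * range_[0] / 100))
        stop = int(round(n * range_[1] / 100))
        selection = slice(start, stop)
    else:
        raise ValueError(f"Unknown unit: {unit}")
    return data[selection], axis_values[selection]


axis_values = np.linspace(340.0, 350.0, 101)
data = np.sin(axis_values)

by_index, axis_by_index = extract_range(data, axis_values, [5, 10, 2])
by_axis, axis_by_axis = extract_range(
    data, axis_values, [342, 344], unit="axis")
by_percent, axis_by_percent = extract_range(
    data, axis_values, [0, 10], unit="percentage")
```

In each case, the returned axis values have the same length as the returned data, which is the point of using a dedicated processing step rather than slicing the data array alone.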

class aspecd.processing.BaselineCorrection

Subtract baseline from dataset.

Currently, only polynomial baseline corrections are supported.

The coefficients used to calculate the baseline will be written to the parameters dictionary upon processing.

If no order is explicitly given, a polynomial baseline of zeroth order will be used.

Important

Baseline correction works only for 1D and 2D datasets, not for higher-dimensional datasets.

parameters

All parameters necessary for this step.

kind : str

The kind of baseline correction to be performed.

Default: polynomial

order : int

The order for the baseline correction if no coefficients are given.

Default: 0

fit_area :

Parts of the spectrum to be considered as baseline, can be given as list or single number. If one number is given, it takes that percentage from both sides, respectively, i.e. 10 means 10% left and 10% right. If a list of two numbers is provided, the corresponding percentages are taken from each side of the spectrum, i.e. [5, 20] takes 5% from the left side and 20% from the right.

Default: [10, 10]

coefficients:

Coefficients used to calculate the baseline.

axis : int

Axis along which to perform the baseline correction.

Only necessary in case of 2D data.

Default: 0

Type

dict

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

In the simplest case, just invoke the baseline correction with default values:

- kind: processing
  type: BaselineCorrection


In this case, a zeroth-order polynomial baseline will be subtracted from your dataset using ten percent to the left and right, and in case of a 2D dataset, the baseline correction will be performed along the first axis (index zero) for all indices of the second axis (index 1).

Of course, often you want to control a little bit more how the baseline will be corrected. This can be done by explicitly setting some parameters.

Suppose you want to perform a baseline correction with a polynomial of first order:

- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      order: 1


You can also change the area (in percent) used for fitting the baseline, and even specify different ranges left and right:

- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      fit_area: [5, 20]


Here, five percent from the left and 20 percent from the right are used.

Finally, suppose you have a 2D dataset and want to perform the baseline correction along the second axis (index one):

- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      axis: 1


Of course, you can combine the different options.
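As a rough illustration of what a polynomial baseline correction with a given order and fit_area involves, consider the following NumPy sketch for 1D data. It is a sketch under stated assumptions, not the ASpecD implementation: the helper `polynomial_baseline` and the way percentages are rounded to a number of points are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial


def polynomial_baseline(x, y, order=0, fit_area=(10, 10)):
    """Fit a polynomial to the outer parts of a 1D spectrum (hypothetical)."""
    n = len(x)
    # Assumption: percentages are rounded up to whole numbers of points
    n_left = int(np.ceil(n * fit_area[0] / 100))
    n_right = int(np.ceil(n * fit_area[1] / 100))
    # Only the points at both ends are treated as baseline
    x_fit = np.concatenate((x[:n_left], x[n - n_right:]))
    y_fit = np.concatenate((y[:n_left], y[n - n_right:]))
    coefficients = polynomial.polyfit(x_fit, y_fit, order)
    baseline = polynomial.polyval(x, coefficients)
    return baseline, coefficients


x = np.linspace(0.0, 10.0, 201)
signal = np.exp(-((x - 5.0) ** 2) / 0.1)  # narrow line in the centre
y = signal + 0.5 + 0.2 * x                # plus a linear baseline

baseline, coefficients = polynomial_baseline(x, y, order=1, fit_area=(5, 20))
corrected = y - baseline
```

Here, the fitted coefficients (lowest order first) recover the offset and slope of the artificial baseline, because the signal is negligible in the areas used for fitting.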

Changed in version 0.3: Coefficients are returned in unscaled data domain

Changed in version 0.6.3: Zero values in range properly handled

static applicable(dataset)

Check whether processing step is applicable to the given dataset.

Baseline correction is (currently) only applicable to datasets with one- and two-dimensional data.

Parameters

dataset (aspecd.dataset.Dataset) – dataset to check

Returns

applicable – True if the processing step is applicable to the dataset, False otherwise.

Return type

bool

class aspecd.processing.Averaging

Average data over given range along given axis.

While projection as performed by aspecd.processing.Projection can be considered a special case of averaging using the whole range of one axis, averaging is usually performed over part of an axis only.

Note

Currently, averaging works only for 2D datasets, not for higher-dimensional datasets. This may, however, change in the future.

Important

Indices for the range work slightly differently than in Python: While still zero-based, a range of [2, 3] will result in having the third and fourth column/row averaged. This seems more intuitive to the average scientist than sticking with Python (and having in this case only the third column/row returned).

You can use negative indices as well, as long as the resulting indices are still within the range of the corresponding data dimension.

parameters

All parameters necessary for this step.

axis : int

The axis to average along.

Default: 0

range : list

The range (start, end) to average over.

Default: []

unit : str

Unit used for specifying the range: either “axis” or “index”.

Default: “index”

Type

dict

Raises
• ValueError – Raised if range is out of range for given axis or empty. Raised if unit is neither “axis” nor “index”.

• IndexError – Raised if axis is out of bounds for given dataset

Changed in version 0.7: Sets dataset label to averaging range (in axes units)

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

In the simplest case, just invoke the averaging with a range only:

- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [2, 3]


In this case, you will get your dataset averaged along the first axis (index zero), and averaged over the indices 2 and 3 of the second axis.

If you would like to average over the second axis (index 1), just specify this axis:

- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [2, 3]
      axis: 1


And as it is sometimes more convenient to give ranges in axis values rather than indices, even this is possible. Suppose the axis you would like to average over runs from 340 to 350 and you would like to average from 342 to 344:

- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [342, 344]
      unit: axis


If you provide the range in axis units rather than indices, the axis values closest to the given values will be chosen automatically.
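The difference between the inclusive range described above and plain Python slicing can be sketched with NumPy as follows. The array and variable names are made up for illustration only.

```python
import numpy as np

# 2D data: 5 points along axis 0, 4 points along axis 1
data = np.arange(20.0).reshape(5, 4)

start, end = 2, 3
# The end index is included, hence "end + 1" in the Python slice:
# third and fourth row are averaged
averaged = data[start:end + 1, :].mean(axis=0)
# Plain Python slicing excludes the end index: third row only
python_style = data[start:end, :].mean(axis=0)
```

With the inclusive range, `averaged` is the mean of the rows with indices 2 and 3, whereas `python_style` simply returns the row with index 2.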

static applicable(dataset)

Check whether processing step is applicable to the given dataset.

Averaging is only applicable to datasets with two-dimensional data.

Parameters

dataset (aspecd.dataset.Dataset) – dataset to check

Returns

applicable – True if the processing step is applicable to the dataset, False otherwise.

Return type

bool

class aspecd.processing.ScalarAxisAlgebra

Perform scalar algebraic operation on the axis of a dataset.

Sometimes, changing the values of an axis can be quite useful, for example to apply corrections obtained by some analysis step. Usually, this requires scalar algebraic operations on the axis values.

parameters

All parameters necessary for this step.

kind : str

Kind of scalar algebra to use

Valid values: “plus”, “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, “+”, “-”, “*”, “/”, “power”, “pow”, “**”

axis : int

Axis to operate on

Default value: 0

value : float

Parameter of the scalar algebraic operation

Default value: None

Type

dict

Raises

ValueError – Raised if no or wrong kind is provided

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

In case you would like to add a fixed value of 42 to the first axis (index 0) of your dataset:

- kind: processing
  type: ScalarAxisAlgebra
  properties:
    parameters:
      kind: plus
      axis: 0
      value: 42


Similarly, you could use “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, and “power” as kind - resulting in the given algebraic operation.
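One way to picture how the different aliases for kind map to the same operation is a simple lookup table, as in the following sketch. The names `OPERATORS` and `scalar_axis_algebra` are hypothetical and not part of ASpecD; the sketch only mirrors the valid values listed above.

```python
import operator

import numpy as np

# Hypothetical lookup table mapping the documented aliases to operations
OPERATORS = {
    "plus": operator.add, "add": operator.add, "+": operator.add,
    "minus": operator.sub, "subtract": operator.sub, "-": operator.sub,
    "times": operator.mul, "multiply": operator.mul, "*": operator.mul,
    "by": operator.truediv, "divide": operator.truediv,
    "/": operator.truediv,
    "power": operator.pow, "pow": operator.pow, "**": operator.pow,
}


def scalar_axis_algebra(axis_values, kind, value):
    """Apply a scalar operation to axis values (hypothetical helper)."""
    if kind not in OPERATORS:
        raise ValueError(f"Unknown kind: {kind!r}")
    return OPERATORS[kind](axis_values, value)


axis_values = np.array([0.0, 1.0, 2.0])
shifted = scalar_axis_algebra(axis_values, "plus", 42)
squared = scalar_axis_algebra(axis_values, "**", 2)
```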

New in version 0.2.

class aspecd.processing.DatasetAlgebra

Perform scalar algebraic operation on two datasets.

To improve the signal-to-noise ratio, adding the data of two datasets can sometimes be useful. Alternatively, adding or subtracting the data of two datasets can help to interpret the signals.

Important

The data of the two datasets to perform the scalar algebraic operation on need to have the same dimension (that is checked for), and to obtain meaningful results, usually the axes values need to be identical as well. For this purpose, use the CommonRangeExtraction processing step.

parameters

All parameters necessary for this step.

kind : str

Kind of scalar algebra to use

Valid values: “plus”, “minus”, “add”, “subtract”, “+”, “-”

Note that in contrast to scalar algebra, multiply and divide are not implemented for operation on two datasets.

dataset : aspecd.dataset.Dataset

Dataset whose data to add or subtract

Type

dict

Raises

ValueError – Raised if no or wrong kind is provided. Raised if data of datasets have different shapes.

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

In case you would like to add the data of the dataset referred to by its label label_to_other_dataset to your dataset:

- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset


Similarly, you could use “minus”, “add”, “subtract” as kind - resulting in the given algebraic operation.

As mentioned already, the data of both datasets need to have identical shape, and comparison is only meaningful if the axes are compatible as well. Hence, you will usually want to perform a CommonRangeExtraction processing step before doing algebra with two datasets:

- kind: multiprocessing
  type: CommonRangeExtraction
  result:
    - label_to_dataset
    - label_to_other_dataset

- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset
  apply_to:
    - label_to_dataset


New in version 0.2.

class aspecd.processing.Interpolation

Interpolate data.

As soon as data of different datasets should be arithmetically combined, they need to have an identical grid. Often, this can only be achieved by interpolating one or both datasets.

Take care not to use interpolation to artificially smooth your data.

For an in-depth discussion of interpolating ND data, see the discussions on Stack Overflow, particularly the answers by Joe Kington, which provide both theoretical insight and Python code.

Important

Currently, interpolation works only for 1D and 2D datasets, not for higher-dimensional datasets. This may, however, change in the future.

Todo

• Make type of interpolation controllable

• Check for ways to make it work with ND, N>2

parameters

All parameters necessary for this step.

range : list

Range of the axis to interpolate for

Needs to be a list of lists in case of ND datasets with N>1, containing N two-element vectors as ranges for each of the axes.

npoints : list

Number of points to interpolate for

Needs to be a list in case of ND datasets with N>1, containing N elements, one for each of the axes.

unit : str

Unit the ranges are given in

Can be either “index” (default) or “axis”.

Type

dict

Raises
• ValueError – Raised if no range to interpolate for is provided. Raised if no number of points to interpolate for is provided. Raised if unit is unknown.

• IndexError – Raised if list of ranges does not fit data dimensions. Raised if list of npoints does not fit data dimensions. Raised if given range is out of range of data/axes

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

Generally, interpolating requires providing both a range and a number of points:

- kind: processing
  type: Interpolation
  properties:
    parameters:
      range: [10, 100]
      npoints: 901


This would interpolate your data between their indices 10 and 100 using 901 points. As it is sometimes (often) more convenient to work with axis units, you can tell the processing step to use axis values instead of indices:

- kind: processing
  type: Interpolation
  properties:
    parameters:
      range: [340, 350]
      npoints: 1001
      unit: axis


This would interpolate your (1D) data between the axis values 340 and 350 using 1001 points.
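For 1D data, the effect of the last recipe can be sketched with NumPy as follows. Note that linear interpolation via `numpy.interp` is an assumption chosen for simplicity; the kind of interpolation actually used by the processing step is not specified here, and the test data are made up.

```python
import numpy as np

axis_values = np.linspace(335.0, 355.0, 201)  # original axis, step 0.1
data = 2.0 * axis_values + 1.0                # simple linear test data

# range: [340, 350], npoints: 1001, unit: axis
new_axis = np.linspace(340.0, 350.0, 1001)
# Assumption: linear interpolation between neighbouring points
new_data = np.interp(new_axis, axis_values, data)
```

The new axis and the interpolated data replace the original ones, so both need to be stored together, just as with range extraction.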

New in version 0.2.

static applicable(dataset)

Check whether processing step is applicable to the given dataset.

Interpolation is currently only applicable to datasets with one- and two-dimensional data.

Parameters

dataset (aspecd.dataset.Dataset) – dataset to check

Returns

applicable – True if the processing step is applicable to the dataset, False otherwise.

Return type

bool

class aspecd.processing.Filtering

Filter data.

Generally, filtering is a large field of (digital) signal processing, and currently, this class only implements a very small subset of filters often applied in spectroscopy, namely low-pass filters that can be used for smoothing (“denoising”) data.

Filtering makes heavy use of the scipy.ndimage and scipy.signal modules of the SciPy package. For details, see there.

Filtering works with data with arbitrary dimensions, in this case applying the filter in each dimension.

parameters

All parameters necessary for this step.

type : str

Type of the filter to use

Currently, three types are supported: “uniform”, “gaussian”, “savitzky-golay”. For convenience, a list of aliases exists for each of these types, and if you use one of these aliases, it will be replaced by its generic name:

| Generic | Alias |
| --- | --- |
| ‘uniform’ | ‘box’, ‘boxcar’, ‘moving-average’, ‘car’ |
| ‘gaussian’ | ‘binom’, ‘binomial’ |
| ‘savitzky-golay’ | ‘savitzky_golay’, ‘savitzky golay’, ‘savgol’, ‘savitzky’ |

window_length : int

Length of the filter window

The window needs to be smaller than the actual data. If you provide a window length that exceeds the data range, an exception will be raised.

order : int

Polynomial order for the Savitzky-Golay filter

Only necessary for this type of filter. If no order is given for this filter, an exception will be raised.

Type

dict

Raises

ValueError – Raised if no or wrong filter type is provided. Raised if no filter window is provided. Raised if filter window exceeds data range. Raised in case of Savitzky-Golay filter when no order is provided.

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

Generally, filtering requires providing both a type of filter and a window length. Therefore, for uniform and Gaussian filters, this would be:

- kind: processing
  type: Filtering
  properties:
    parameters:
      type: uniform
      window_length: 10


Of course, at least uniform filtering (also known as boxcar or moving average) is strongly discouraged due to the artifacts introduced. Probably the best bet for applying a filter to smooth your data is the Savitzky-Golay filter:

- kind: processing
  type: Filtering
  properties:
    parameters:
      type: savitzky-golay
      window_length: 9
      order: 3


Note that for this filter, you need to provide the polynomial order as well. To get best results, you will need to experiment with the parameters a bit.
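To get a feeling for the two filters used in the recipes above, you can call the underlying SciPy functions directly on 1D test data, as in the following sketch. How the processing step maps its parameters onto these functions is an assumption here; the test data and variable names are made up.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import savgol_filter

rng = np.random.default_rng(seed=42)
x = np.linspace(-1.0, 1.0, 201)
clean = x**3 - 0.5 * x
noisy = clean + rng.normal(scale=0.05, size=x.size)

# Uniform (boxcar, moving average) filter with a window of 10 points
smoothed_uniform = uniform_filter(noisy, size=10)
# Savitzky-Golay filter with a window of 9 points and polynomial order 3
smoothed_savgol = savgol_filter(noisy, window_length=9, polyorder=3)

# A Savitzky-Golay filter of order 3 leaves a cubic polynomial unchanged
unchanged = savgol_filter(clean, window_length=9, polyorder=3)
```

The last line hints at why the Savitzky-Golay filter tends to distort signals less than a moving average: within each window, it fits a polynomial rather than replacing the points by their mean.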

New in version 0.2.

class aspecd.processing.CommonRangeExtraction

Extract the common range of data for multiple datasets using interpolation.

One prerequisite for adding up multiple datasets in a meaningful way is to have their data dimensions as well as their respective axes values agree. This usually requires interpolating the data to a common set of axes.

Important

Currently, extracting the common range works only for 1D and 2D datasets, not for higher-dimensional datasets, due to the underlying method of interpolation. See Interpolation for details. This may, however, change in the future.

Todo

• Make type of interpolation controllable

• Make number of points controllable (in absolute numbers as well as minimum and maximum points with respect to datasets)

parameters

All parameters necessary for this step.

ignore_units : bool

Whether to ignore the axes units when checking the datasets for applicability.

Usually, the axes units should be identical, but sometimes, they may be named differently or be compatible anyway. Use with care and only if you know exactly what you are doing.

Default: False

common_range : list

Common range of values for each axis as determined by the processing step.

For >1D datasets, this will be a list of lists.

npoints : list

Number of points used for the final grid the data are interpolated on.

The length is identical to the dimensions of the data of the datasets.

Type

dict

Raises
• ValueError – Raised if datasets have axes with different units or disjoint values. Raised if datasets have different dimensions.

• IndexError – Raised if axis is out of bounds for given dataset

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

In case you would like to bring all datasets currently loaded into your recipe to a common range (use with caution, however), things can be as simple as:

- kind: multiprocessing
  type: CommonRangeExtraction


Note that this will operate on all datasets currently available in your recipe, including results from other processing steps. Therefore, it is usually better to be explicit, using apply_to. Otherwise, you can use this processing step early on in your recipe.

Usually, however, you will want to restrict this to a subset using apply_to and provide labels for the results:

- kind: multiprocessing
  type: CommonRangeExtraction
  result:
    - dataset1_cut
    - dataset2_cut
  apply_to:
    - dataset1
    - dataset2


If you want to perform algebraic operations on datasets, the data of both datasets need to have identical shape, and comparison is only meaningful if the axes are compatible as well. Hence, you will usually want to perform a CommonRangeExtraction processing step before doing algebra with two datasets:

- kind: multiprocessing
  type: CommonRangeExtraction
  result:
    - label_to_dataset
    - label_to_other_dataset

- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset
  apply_to:
    - label_to_dataset


For details of the algebraic operations on datasets, see DatasetAlgebra.
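The idea of extracting a common range and interpolating onto a common grid can be sketched for two 1D datasets with NumPy as follows. This is an illustration only, not the ASpecD implementation: linear interpolation and the hand-picked number of points are assumptions, and all variable names are made up.

```python
import numpy as np

axis_1 = np.linspace(0.0, 10.0, 101)
axis_2 = np.linspace(5.0, 15.0, 51)
data_1 = axis_1**2
data_2 = 3.0 * axis_2

# The common range is the overlap of both axes
common_start = max(axis_1.min(), axis_2.min())
common_end = min(axis_1.max(), axis_2.max())
if common_start >= common_end:
    raise ValueError("Axes have disjoint values")

npoints = 51  # assumption: number of points of the common grid set by hand
common_axis = np.linspace(common_start, common_end, npoints)
# Assumption: linear interpolation onto the common grid
data_1_cut = np.interp(common_axis, axis_1, data_1)
data_2_cut = np.interp(common_axis, axis_2, data_2)

summed = data_1_cut + data_2_cut  # shapes and axes now agree
```

Only after this step do both data arrays share shape and axis values, which is what makes the subsequent addition meaningful.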

New in version 0.2.

Changed in version 0.6.3: Unit of last axis (i.e., intensity) gets ignored when checking for same units

static applicable(dataset)

Check whether processing step is applicable to the given dataset.

Extracting a common range is currently only applicable to datasets with one- and two-dimensional data, due to the underlying interpolation.

Parameters

dataset (aspecd.dataset.Dataset) – dataset to check

Returns

applicable – True if the processing step is applicable to the dataset, False otherwise.

Return type

bool

class aspecd.processing.Noise

Add (coloured) noise to data.

Particularly for testing algorithms and hence creating test data, adding noise to these test data is crucial. Furthermore, the naive approach of adding white (Gaussian, normally distributed) noise often does not reflect the physical reality, as “real” noise often has a different power spectral density (PSD).

Probably the kind of noise most often encountered in spectroscopy is 1/f noise or pink noise, with the PSD decreasing by 3 dB per octave or 10 dB per decade. For more details on the different kinds of noise, the following sources may be a good starting point:

Different strategies exist to create coloured noise, and the implementation used here follows basically the ideas published by Timmer and König:

• J. Timmer and M. König: On generating power law noise. Astronomy and Astrophysics 300, 707–710 (1995)

In short: In the Fourier space, normally distributed random numbers are drawn for power and phase of each frequency component, and the power scaled by the appropriate power law. Afterwards, the resulting frequency spectrum is back transformed using iFFT and ensuring real data.
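The procedure just outlined can be sketched with NumPy as follows. This is a minimal sketch following the general idea of Timmer and König, not the code used in ASpecD: the function name `coloured_noise` is hypothetical, and drawing the real and imaginary part of each frequency component from a normal distribution is the variant assumed here.

```python
import numpy as np


def coloured_noise(npoints, exponent=-1.0, seed=None):
    """Generate noise with PSD proportional to f**exponent (hypothetical)."""
    rng = np.random.default_rng(seed)
    frequencies = np.fft.rfftfreq(npoints)
    # Normally distributed real and imaginary part for each frequency
    spectrum = (rng.normal(size=frequencies.size)
                + 1j * rng.normal(size=frequencies.size))
    # Scale amplitudes by f**(exponent/2), i.e. the power by f**exponent;
    # the zero-frequency component is set to zero, giving a mean of zero
    scaling = np.zeros_like(frequencies)
    scaling[1:] = frequencies[1:] ** (exponent / 2)
    spectrum *= scaling
    # irfft assumes Hermitian symmetry and hence returns real data
    return np.fft.irfft(spectrum, n=npoints)


pink = coloured_noise(1024, exponent=-1.0, seed=1)
white = coloured_noise(1024, exponent=0.0, seed=1)
```

Using the real-valued inverse FFT is one simple way of ensuring real data; amplitude normalisation is deliberately left out of this sketch.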

Further inspiration came from the following two sources:

Note: The first is based on a MATLAB(R) code by Max Little and contains a number of errors in its Python translation that are not present in the original code.

The added noise always has a mean of (close to) zero.

parameters

All parameters necessary for this step.

exponent : float

The exponent used for scaling the power of the frequency components

• 0 – white (Gaussian) noise

• -1 – pink (1/f) noise

• -2 – Brownian (1/f**2) noise

Default: -1 (pink noise)

normalise : bool

Whether to normalise the noise amplitude prior to adding to the data.

In this case, the amplitude is normalised to 1.

amplitude : float

Amplitude of the noise

This is often useful to explicitly control the noise level and removes the need to first normalise and scale the data that the noise should be added to.

Type

dict

Note

The exponent for the noise is not restricted to integer values, nor to negative values. While for spectroscopic data, pink (1/f) noise usually prevails (exponent = -1), the opposite effect with high frequencies dominating can occur as well. A prominent example of naturally occurring “blue noise” with the density proportional to f is the Cherenkov radiation.

Note

In case of ND data, the coloured noise is calculated along the first dimension only, all other dimensions will exhibit (close to) white (Gaussian) noise. Generally, this should not be a problem in spectroscopy, as usually, data are recorded over time in one dimension only, and only in this (implicit) time dimension coloured noise will be relevant.

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

Generally, adding noise to a dataset can be quite simple. Without explicitly providing any parameter, 1/f or pink noise will be added to the data:

- kind: processing
  type: Noise


Of course, you can control in much more detail the kind of noise and its amplitude. To add Gaussian (white) noise to a dataset:

- kind: processing
  type: Noise
  properties:
    parameters:
      exponent: 0


Similarly, you could add Brownian (1/f**2) noise (with an exponent of -2), but you can give positive exponents as well. While this type of noise is less relevant in spectroscopy, it is relevant in other areas.

To control the noise amplitude, there are two different strategies: normalising the amplitude to one, and providing an explicit amplitude. Normalising works as follows:

- kind: processing
  type: Noise
  properties:
    parameters:
      normalise: true


Providing an explicit amplitude can be quite helpful in case you want to control the signal-to-noise ratio and know the amplitude of your signal prior to adding noise. Adding noise with a noise amplitude of 0.01 would be done as follows:

- kind: processing
  type: Noise
  properties:
    parameters:
      amplitude: 0.01


Note that in case you do not provide an exponent, its default value will be used, resulting in pink (1/f) noise, as this is spectroscopically the most relevant.

New in version 0.3.

Changed in version 0.4: Added reference to references

Changed in version 0.6: Added parameter amplitude

class aspecd.processing.ChangeAxesValues

Change values of individual axes.

What sounds pretty much like data manipulation is sometimes a necessity due to the shortcomings of vendor file formats. Let’s face it: sometimes values read from raw data are simply wrong, due to wrong readout or wrong processing of these parameters within the device. Therefore, it seems much better to transparently change the respective axis values rather than having to modify raw data by hand. Using a processing step has two crucial advantages: (i) it allows for full reproducibility and traceability, and (ii) it can be done in the context of recipe-driven data analysis, i.e. without requiring any programming skills.

Note

A real-world example: angular-dependent measurements recorded wrong angles in the raw data file, while the actual positions were correct. Assuming measurements from 0° to 180° in 10° steps, it is pretty straightforward to fix this problem: Assign equidistant values from 0° to 180° and use the information about the actual axis length.

parameters

All parameters necessary for this step.

range : list

The range of the axis, i.e. start and end value

axes : list

The axes to set the new values for

Can be an integer in case of a single axis, otherwise a list of integers. If omitted, all axes with values will be assumed (i.e., one per data dimension).

Type

dict

Raises

IndexError – Raised if index is out of range for axes or given number of axes and ranges is incompatible

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below, showing how to make use of this class. Each example focuses on a single aspect.

In case you would like to change the axis range of a 1D dataset, things are as simple as:

- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range: [35, 42]


This would take the first axis (index 0) and set the range to linearly spaced data ranging from 35 to 42, of course with the same number of values as before.

If you want to change both axes of a 2D dataset, proceed in the same way:

- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range:
        - [35, 42]
        - [17.5, 21]


This would set the range of the first axis (index 0) to the interval [35, 42], and the range of the second axis (index 1) to the interval [17.5, 21].

More often, you may have a 2D dataset where you intend to change the values of only one axis. Suppose the example from above with angular-dependent measurements and the angles in the second dimension:

- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range: [0, 180]
      axes: 1


Here, the second axis (index 1) will be set accordingly.
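In terms of plain NumPy, replacing the values of an axis by linearly spaced values while keeping the number of points amounts to the following sketch. The wrong angle values are made up for illustration.

```python
import numpy as np

# Hypothetical wrong angles as read from the raw data file (19 positions)
wrong_angles = np.arange(19) * 9.5

# range: [0, 180], keeping the actual axis length
corrected_angles = np.linspace(0, 180, len(wrong_angles))
```

As the number of points is taken from the existing axis, only the start and end value need to be provided.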

New in version 0.3.