You're reading an old version of this documentation. For up-to-date information, please have a look at v0.3.
aspecd.processing module
Data processing functionality.
Key to reproducible science is automatic documentation of each processing step applied to the data of a dataset. Each such processing step is self-contained, meaning it contains all the information necessary to perform the processing task on a given dataset.
Processing steps, in contrast to analysis steps (see aspecd.analysis for details), not only operate on the data of an aspecd.dataset.Dataset, but change its data. The information necessary to reproduce each processing step gets added to the aspecd.dataset.Dataset.history attribute of a dataset.
Generally, two types of processing steps can be distinguished:
Processing steps for handling single datasets
Shall be derived from aspecd.processing.SingleProcessingStep.
Processing steps for handling multiple datasets
Shall be derived from aspecd.processing.MultiProcessingStep.
In the first case, the processing is usually handled using the process() method of the respective aspecd.dataset.Dataset object. Additionally, those processing steps always operate only on the data of a single dataset. Processing steps handling single datasets should always inherit from the aspecd.processing.SingleProcessingStep class.
In the second case, the processing step is handled using the process() method of the aspecd.processing.ProcessingStep object, and the datasets are stored as a list within the processing step, as these processing steps span several datasets. Processing steps handling multiple datasets should always inherit from the aspecd.processing.MultiProcessingStep class.
The module contains both base classes for processing steps (as detailed above) and a series of generally applicable processing steps for all kinds of spectroscopic data. The latter are an attempt to relieve the developers of packages derived from the ASpecD framework of the task of reinventing the wheel over and over again.
The next section gives an overview of the concrete processing steps implemented within the ASpecD framework. For details of how to implement your own processing steps, see the section below.
Concrete processing steps
Besides providing the basis for processing steps for the ASpecD framework, ensuring full reproducibility and traceability, hence reproducible science and good scientific practice, this module comes with a (growing) number of general-purpose processing steps useful for basically all kinds of spectroscopic data.
Here is a list as a first overview. For details, see the detailed documentation of each of the classes, readily accessible by the link.
Processing steps operating on individual datasets
The following processing steps each operate on individual datasets independently.
aspecd.processing.Normalisation
Normalise data.
Kinds of normalisation available: maximum, minimum, amplitude, area
aspecd.processing.Integration
Integrate data
aspecd.processing.Differentiation
Differentiate data, i.e., return discrete first derivative
aspecd.processing.ScalarAlgebra
Perform scalar algebraic operation on one dataset.
Operations available: add, subtract, multiply, divide (by given scalar)
aspecd.processing.ScalarAxisAlgebra
Perform scalar algebraic operation on axis values of a dataset.
Operations available: add, subtract, multiply, divide, power (by given scalar)
aspecd.processing.DatasetAlgebra
Perform scalar algebraic operation on two datasets.
Operations available: add, subtract
aspecd.processing.Projection
Project data, i.e. reduce dimensions along one axis.
aspecd.processing.SliceExtraction
Extract slice along one or more dimensions from dataset.
aspecd.processing.RangeExtraction
Extract range of data from a dataset.
aspecd.processing.BaselineCorrection
Correct baseline of dataset.
aspecd.processing.Averaging
Average data over given range along given axis.
aspecd.processing.Interpolation
Interpolate data.

Filter data.

Add (coloured) noise to data.
Processing steps operating on multiple datasets at once
The following processing steps each operate on more than one dataset at the same time, requiring at least two datasets as input.
aspecd.processing.CommonRangeExtraction
Extract the common range of data for multiple datasets using interpolation.
Useful (and often necessary) for performing algebraic operations on datasets.
Writing own processing steps
Each real processing step should inherit either from aspecd.processing.SingleProcessingStep, if it operates on a single dataset only, or from aspecd.processing.MultiProcessingStep, if it operates on several datasets at once. Furthermore, all processing steps should be contained in one module named “processing”. This allows for easy automation and replay of processing steps, particularly in the context of recipe-driven data analysis (for details, see the aspecd.tasks module).
General advice
A few hints on writing your own processing step classes:
Always inherit from aspecd.processing.SingleProcessingStep or aspecd.processing.MultiProcessingStep, depending on your needs.
Store all parameters, implicit and explicit, in the dict parameters of the aspecd.processing.ProcessingStep class, not in separate properties of the class. Only this way can you ensure full reproducibility and compatibility with recipe-driven data analysis (for details of the latter, see the aspecd.tasks module).
Always set the description property to a sensible value.
Always set the undoable property appropriately. In most cases, processing steps can be undone.
Implement the actual processing in the _perform_task method of the processing step. For sanitising parameters and checking general applicability of the processing step to the dataset(s) at hand, continue reading.
Make sure to implement the aspecd.processing.ProcessingStep.applicable() method according to your needs. Typical cases would be to check for the dimensionality of the underlying data, as some processing steps may work only for 1D data (or vice versa). Don’t forget to declare this as a static method, using the @staticmethod decorator.
With the _sanitise_parameters method, the input parameters are automatically checked and an appropriate exception can be thrown in order to describe the error source to the user.
Some more special cases are detailed below. For further advice, consult the source code of this module, and have a look at the concrete processing steps whose purpose is described below in more detail.
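The hints above can be sketched in a few lines of Python. To keep the example runnable without the framework installed, it uses a small stand-in base class (`StandInStep`) and a plain dict as toy dataset; both are hypothetical and only mimic the pattern described here, they are not the aspecd API. The `Offset` step and its `offset` parameter are made up for illustration as well.

```python
class StandInStep:
    """Hypothetical stand-in for a processing step base class (NOT aspecd)."""

    def __init__(self):
        self.parameters = {}
        self.description = "Abstract processing step"
        self.undoable = False
        self.dataset = None

    @staticmethod
    def applicable(dataset):
        return True

    def process(self, dataset):
        # Check applicability, sanitise parameters, then do the actual work
        if not self.applicable(dataset):
            raise ValueError("Processing step not applicable to dataset")
        self.dataset = dataset
        self._sanitise_parameters()
        self._perform_task()
        return dataset

    def _sanitise_parameters(self):
        pass

    def _perform_task(self):
        pass


class Offset(StandInStep):
    """Add a constant offset to 1D data (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.description = "Add constant offset to 1D data"
        self.undoable = True
        # All parameters live in the parameters dict, not in own attributes
        self.parameters["offset"] = 0.0

    @staticmethod
    def applicable(dataset):
        # Toy criterion: a flat list of numbers counts as 1D data
        return all(not isinstance(value, list) for value in dataset["data"])

    def _sanitise_parameters(self):
        if not isinstance(self.parameters["offset"], (int, float)):
            raise ValueError("offset needs to be a number")

    def _perform_task(self):
        offset = self.parameters["offset"]
        self.dataset["data"] = [v + offset for v in self.dataset["data"]]


step = Offset()
step.parameters["offset"] = 2.0
result = step.process({"data": [1.0, 2.0, 3.0]})
print(result["data"])
```

In a real package, the class would inherit from aspecd.processing.SingleProcessingStep instead, and the dataset would be an aspecd.dataset.Dataset object.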
Changing the dimensions of your data
If your processing step changes the dimensions of your data, it is your responsibility to ensure that the axes values are consistent with the data. Note that upon changing the dimension of your data, the axes values will be reset to indices along the data dimensions. Hence, you need to first make a (deep) copy of your axes, then change the dimension of your data, and afterwards restore the remaining values from the temporarily stored axes.
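The copy, change, restore sequence can be illustrated with a toy container. `ToyData` below is a hypothetical stand-in that resets its axes to indices whenever data of a different dimension are assigned; it holds one list of axis values per data dimension, which is a simplification and not how the real dataset classes are laid out.

```python
import copy


def shape_of(nested):
    """Return the shape of a regular nested list as a list of lengths."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return shape


class ToyData:
    """Hypothetical container mimicking the axis reset described above."""

    def __init__(self, data, axes):
        self._data = data
        self.axes = axes  # one list of axis values per data dimension

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        self._data = value
        new_shape = shape_of(value)
        if len(new_shape) != len(self.axes):
            # Dimension changed: axes values are reset to plain indices
            self.axes = [list(range(length)) for length in new_shape]


toy = ToyData(
    data=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    axes=[[340.0, 350.0], [0.1, 0.2, 0.3]],
)

old_axes = copy.deepcopy(toy.axes)  # 1. keep a deep copy of the axes
toy.data = toy.data[0]              # 2. change the dimension (2D -> 1D)
toy.axes[0] = old_axes[1]           # 3. restore the remaining axis values
print(toy.axes)
```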
Changing the length of your data
When changing the length of the data, always change the corresponding axes values first, and only afterwards the data, as changing the data will change the axes values and adjust their length to the length of the corresponding dimension of the data.
Adding parameters upon processing
Sometimes there is the need to persist values that are only obtained while processing the data. A typical example may be averaging 2D data along one dimension and wanting to store both the range of indices and the actual axis units. While in this case, typically the axis value of the centre of the averaging window will be stored as the new axis value, the other parameters should end up in the aspecd.processing.ProcessingStep.parameters dictionary. Thus, they are added to the dataset history and available for reports and the like.
Module documentation

class aspecd.processing.ProcessingStep
Bases: object
Base class for processing steps.
Each class actually performing a processing step should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).
Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.
When is a processing step undoable?
Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.
One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).
The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process(), which is called by the aspecd.dataset.Dataset.process() method of the dataset object.
Note
Usually, you will never use this class directly for actual processing tasks, but rather one of the child classes, namely aspecd.processing.SingleProcessingStep and aspecd.processing.MultiProcessingStep, depending on whether your processing step operates on a single dataset or requires multiple datasets.
parameters
Parameters required for performing the processing step
All parameters, implicit and explicit.
Type
dict
Raises
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on

process()
Perform the actual processing step.
The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set by calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters(), all prior to calling _perform_task().

static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Returns True by default and needs to be implemented in classes inheriting from SingleProcessingStep according to their needs.
This is a static method that gets called automatically by each class inheriting from aspecd.processing.SingleProcessingStep. Hence, if you need to override it in your own class, make the method static as well. An example of an implementation testing for two-dimensional data is given below:
    @staticmethod
    def applicable(dataset):
        # Two data dimensions plus the intensity axis make three axes
        return len(dataset.data.axes) == 3
Parameters
dataset (aspecd.dataset.Dataset) – dataset to check
Returns
applicable – True if successful, False otherwise.
Return type
bool


class aspecd.processing.SingleProcessingStep
Bases: aspecd.processing.ProcessingStep
Base class for processing steps operating on single datasets.
Each class actually performing a processing step involving only a single dataset should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).
To perform the processing step, call the process() method of the dataset the processing should be applied to, and provide a reference to the actual processing_step object to it.
Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.
When is a processing step undoable?
Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.
One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).
The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process(), which is called by the aspecd.dataset.Dataset.process() method of the dataset object.
dataset
Dataset the processing step should be performed on
Raises
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on

process(dataset=None, from_dataset=False)
Perform the actual processing step on the given dataset.
If no dataset is provided at method call, but is set as property in the SingleProcessingStep object, the aspecd.dataset.Dataset.process() method of the dataset will be called and thus the history written.
If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.
The aspecd.dataset.Dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the aspecd.processing.SingleProcessingStep object is not necessary.
The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set by calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters(), all prior to calling _perform_task().
Parameters
dataset (aspecd.dataset.Dataset) – dataset to apply processing step to
from_dataset (boolean) – whether we are called from within a dataset. Defaults to “False” and shall never be set manually.
Returns
dataset – dataset the processing step has been applied to
Return type
aspecd.dataset.Dataset
Raises
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on

create_history_record()
Create history record to be added to the dataset.
Usually, this method gets called from within the aspecd.dataset.Dataset.process() method and ensures that the history of each processing step gets written properly.
Returns
history_record – history record for processing step
 Return type


class aspecd.processing.MultiProcessingStep
Bases: aspecd.processing.ProcessingStep
Base class for processing steps operating on multiple datasets.
Each class actually performing a processing step involving multiple datasets should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).
To perform the processing step, call the process() method directly. This will take care of writing the history to each individual dataset as well.
Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.
When is a processing step undoable?
Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.
One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).
The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process(), which is called by the aspecd.dataset.Dataset.process() method of the dataset object.
datasets
List of aspecd.dataset.Dataset objects the processing step should act on
Type
list
Raises
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
New in version 0.2.

process()
Perform the actual processing step.
The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set by calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters(), all prior to calling _perform_task().
Raises
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on

create_history_record()
Create history record to be added to the dataset.
Usually, this method gets called from within the aspecd.dataset.Dataset.process() method and ensures that the history of each processing step gets written properly.
Returns
history_record – history record for processing step
 Return type


class aspecd.processing.Normalisation
Bases: aspecd.processing.SingleProcessingStep
Normalise data.
There are different kinds of normalising data:
maximum
Data are divided by their maximum value
minimum
Data are divided by their minimum value
amplitude
Data are divided by the difference between their maximum and minimum
area
Data are divided by the sum of their absolute values
You can set these kinds using the attribute parameters["kind"].
Important
Before normalising your data, make sure they have a proper baseline, as otherwise, your normalisation will lead to strange results.
Note
Normalisation can be used for ND data as well. In this case, the data as a whole are normalised accordingly.
Todo
How to handle noisy data in case of area normalisation, as this would probably account for double the noise if simply taking the absolute?
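As a rough illustration of the four kinds listed above, the following pure-Python sketch computes the respective divisor for a flat list of values. The function name `normalise` is hypothetical, and matching the kind by the substrings “max”, “min” and “amp” is an assumption taken from the parameter description; range selection and noise handling are left out entirely.

```python
def normalise(values, kind="maximum"):
    """Divide values by a divisor chosen according to kind (sketch only)."""
    if "max" in kind:
        divisor = max(values)
    elif "min" in kind:
        divisor = min(values)
    elif "amp" in kind:
        divisor = max(values) - min(values)
    elif kind == "area":
        divisor = sum(abs(value) for value in values)
    else:
        raise ValueError(f"Unknown kind '{kind}'")
    return [value / divisor for value in values]


data = [-1.0, 0.0, 1.0, 3.0]
print(normalise(data, kind="maximum"))
print(normalise(data, kind="amplitude"))
print(normalise(data, kind="area"))
```

Note how the result depends on the baseline: adding a constant offset to `data` changes every divisor, which is why a proper baseline matters before normalising.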

parameters
All parameters necessary for this step.
kind : str
Kind of normalisation to use
Valid values: “maximum”, “minimum”, “amplitude”, “area”
Note that the first three can be abbreviated, everything containing “max”, “min”, “amp” will be understood respectively.
Defaults to “maximum”
range : list
Range of the data of the dataset to normalise for.
This can be quite useful if you want to normalise for a specific feature, e.g. an artifact that you’ve recorded separately and want to subtract from the data, or more generally to normalise to certain features of your data irrespective of other parts.
Ranges can be given as indices or in axis units, and for ND datasets, you need to provide as many ranges as dimensions of your data. Units default to indices, but can be specified using the parameter range_unit, see below.
As internally, RangeExtraction is used, see there for more details of how to provide ranges.
range_unit : str
Unit used for the range.
Can be either “index” (default) or “axis”.
noise_range : list
Data range to use for determining noise level
If provided, the normalisation will account for the noise in case of normalising to minimum, maximum, and amplitude.
In case of ND datasets with N>1, you need to provide as many ranges as dimensions of your data.
Numbers are interpreted by default as percentage.
Default: None
noise_range_unit : str
Unit for specifying noise range
Valid units are “index”, “axis”, “percentage”, with the latter being default. As internally, RangeExtraction gets used, see there for further details.
Default: percentage
Type
dict
Raises
ValueError – Raised if unknown kind is provided
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
In the simplest case, just invoke the normalisation with default values:
    - kind: processing
      type: Normalisation
This will normalise your data to their maximum.
Sometimes, normalising to maximum is not what you need, hence you can control in more detail the criterion using the appropriate parameter:
    - kind: processing
      type: Normalisation
      properties:
        parameters:
          kind: amplitude
In this case, you would normalise to the amplitude, meaning setting the difference between minimum and maximum to one. For other kinds, see above.
If you want to normalise not over the entire range of the dataset, but only over a dedicated range, simply provide the necessary parameters:
    - kind: processing
      type: Normalisation
      properties:
        parameters:
          range: [50, 150]
In this case, we assume a 1D dataset and use indices, requiring the data to span at least 150 points. Of course, it is often more convenient to provide axis units. Here you go:
    - kind: processing
      type: Normalisation
      properties:
        parameters:
          range: [340, 350]
          range_unit: axis
And in case of ND datasets with N>1, make sure to provide as many ranges as dimensions of your dataset, here for a 2D dataset:
    - kind: processing
      type: Normalisation
      properties:
        parameters:
          range:
            - [50, 150]
            - [30, 40]
Here as well, the range can be given in indices or axis units, but defaults to indices if no unit is explicitly given.
Note
A note for developers: If you inherit from this class and plan to implement further kinds of normalisation, first test for your specific kind of normalisation, and in the else block add a call to super()._perform_task(). This way, you ensure the ValueError will still be raised in case of an unknown kind.
New in version 0.2: Normalising over range of data
New in version 0.2: Accounting for noise for ND data with N>1
Changed in version 0.2: noise_range changed to list from integer

class aspecd.processing.Integration
Bases: aspecd.processing.SingleProcessingStep
Integrate data
Currently, the data are integrated using the numpy.cumsum() function. This may change in the future, and you may be able to choose between different algorithms. A potential candidate would be using FFT/IFFT and performing the operation in Fourier space.
Note
ND arrays can be integrated as well. In this case, np.cumsum() will operate on the last axis.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
As currently, there are no parameters you can set, integrating is as simple as this:
    - kind: processing
      type: Integration
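To make clear what integration by cumulative summation means, here is a small pure-Python sketch of the idea. The helper names are hypothetical; `itertools.accumulate` stands in for numpy.cumsum(), and the 2D variant assumes that each row is summed separately, i.e. summation along the last axis.

```python
from itertools import accumulate


def cumulative_sum(values):
    """Running sum of a 1D sequence, analogous to a cumulative sum in 1D."""
    return list(accumulate(values))


def cumulative_sum_last_axis(rows):
    """Apply the running sum to each row of 2D data (last axis)."""
    return [cumulative_sum(row) for row in rows]


print(cumulative_sum([1, 2, 3, 4]))
print(cumulative_sum_last_axis([[1, 2, 3], [10, 20, 30]]))
```

The result has the same length as the input, and the axis spacing is not taken into account in this sketch.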

class aspecd.processing.Differentiation
Bases: aspecd.processing.SingleProcessingStep
Differentiate data, i.e., return discrete first derivative
Currently, the data are differentiated using the numpy.gradient() function. This may change in the future, and you may be able to choose between different algorithms. A potential candidate would be using FFT/IFFT and performing the operation in Fourier space.
Note
ND arrays can be differentiated as well. In this case, differentiation will operate on the last axis.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
As currently, there are no parameters you can set, differentiating is as simple as this:
    - kind: processing
      type: Differentiation
Changed in version 0.3: Method changed from numpy.diff() to numpy.gradient()
class aspecd.processing.ScalarAlgebra
Bases: aspecd.processing.SingleProcessingStep
Perform scalar algebraic operation on one dataset.
To compare datasets (by eye), it might be useful to adapt their intensity by algebraic operations. Adding, subtracting, multiplying and dividing are implemented here.
parameters
All parameters necessary for this step.
kind : str
Kind of scalar algebra to use
Valid values: “plus”, “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, “+”, “-“, “*”, “/”
value : float
Parameter of the scalar algebraic operation
Default value: 1.0
Type
dict
Raises
ValueError – Raised if no or wrong kind is provided
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
In case you would like to add a fixed value of 42 to your dataset:
    - kind: processing
      type: ScalarAlgebra
      properties:
        parameters:
          kind: add
          value: 42
Similarly, you could use “minus”, “times”, “by”, “add”, “subtract”, “multiply”, or “divide” as kind, resulting in the given algebraic operation.
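One way to picture how the different values of kind map to the same four operations is a simple lookup table. The following sketch is an assumption about a possible implementation, not the code of the class; the names `OPERATORS` and `scalar_algebra` are hypothetical.

```python
import operator

# Several spellings map to the same arithmetic operation
OPERATORS = {
    "plus": operator.add, "add": operator.add, "+": operator.add,
    "minus": operator.sub, "subtract": operator.sub, "-": operator.sub,
    "times": operator.mul, "multiply": operator.mul, "*": operator.mul,
    "by": operator.truediv, "divide": operator.truediv,
    "/": operator.truediv,
}


def scalar_algebra(values, kind, value=1.0):
    """Apply the operation named by kind to each value (sketch only)."""
    try:
        operation = OPERATORS[kind.lower()]
    except (KeyError, AttributeError):
        # Missing kind (None) or unknown kind
        raise ValueError(f"No or wrong kind provided: {kind!r}") from None
    return [operation(item, value) for item in values]


print(scalar_algebra([1.0, 2.0], kind="add", value=42))
print(scalar_algebra([1.0, 2.0], kind="by", value=2))
```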


class aspecd.processing.Projection
Bases: aspecd.processing.SingleProcessingStep
Project data, i.e. reduce dimensions along one axis.
There are many reasons to project along one axis, if nothing else increasing the signal-to-noise ratio if multiple scans have been recorded as a 2D dataset.
While projection can be considered a special case of averaging as performed by aspecd.processing.Averaging and using the whole range of one axis, averaging is usually performed over part of an axis only. Hence projection is semantically different and therefore implemented as a separate processing step.
parameters
All parameters necessary for this step.
axis : int
Axis to average along
Default value: 0
Type
dict
Raises
aspecd.exceptions.NotApplicableToDatasetError – Raised if dataset has not enough dimensions
IndexError – Raised if axis is out of bounds for given dataset
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
In the simplest case, just invoke the projection with default values:
    - kind: processing
      type: Projection
This will project the data along the first axis (index 0), yielding a 1D dataset.
If you would like to project along the second axis (index 1), simply set the appropriate parameter:
    - kind: processing
      type: Projection
      properties:
        parameters:
          axis: 1
This will project the data along the second axis (index 1), yielding a 1D dataset.
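For 2D data stored as a list of rows, projecting along an axis can be sketched as averaging over that axis. The function `project` is a hypothetical helper; it assumes that projection means taking the arithmetic mean, in line with the description of the axis parameter, and it ignores the axes of the dataset.

```python
def project(rows, axis=0):
    """Average 2D data (list of equally long rows) along the given axis."""
    if not rows or not isinstance(rows[0], list):
        raise ValueError("Projection needs at least 2D data")
    if axis == 0:
        # Average over rows: one value per column
        return [sum(column) / len(rows) for column in zip(*rows)]
    if axis == 1:
        # Average over columns: one value per row
        return [sum(row) / len(row) for row in rows]
    raise IndexError(f"Axis {axis} out of bounds for 2D data")


scans = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(project(scans, axis=0))
print(project(scans, axis=1))
```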

static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Projection is only applicable to datasets with data of at least two dimensions.
Parameters
dataset (aspecd.dataset.Dataset) – dataset to check
Returns
applicable – True if successful, False otherwise.
Return type
bool


class aspecd.processing.SliceExtraction
Bases: aspecd.processing.SingleProcessingStep
Extract slice along one or more dimensions from dataset.
With multidimensional datasets, there are use cases where you would like to operate only on a slice along a particular axis. One example may be to compare the first and last trace of a 2D dataset.
Note that “slice” can be anything from a 1D to an ND array with at least one dimension less than the original array. If you want to extract a 1D slice from an ND dataset with N>2, you need to provide N-1 values for position and axis. Make sure to always provide as many values for position as you provide for axis.
You can either provide indices or axis values for position. For the latter, set the parameter “unit” accordingly. For details, see below.
parameters
All parameters necessary for this step.
axis
Index of the axis or list of indices of the axes to take the position from to extract the slice
If you provide a list of axes, you need to provide as many positions as axes.
If an invalid axis is provided, an IndexError is raised.
Default: 0
position
Position(s) of the slice to extract
Positions can be given as axis indices (default) or axis values, if the parameter “unit” is set accordingly. For details, see below.
If you provide a list of positions, you need to provide as many axes as positions.
If no position is provided or the given position is out of bounds for the given axis, a ValueError is raised.
unit : str
Unit used for specifying the position: either “axis” or “index”.
If an invalid value is provided, a ValueError is raised.
Default: “index”
Type
dict
Raises
aspecd.exceptions.NotApplicableToDatasetError – Raised if dataset has not enough dimensions (i.e., 1D dataset).
ValueError – Raised if index is out of bounds for given axis. Raised if wrong unit is given. Raised if too many values for axis are given. Raised if number of values for position and axis differ.
IndexError – Raised if axis is out of bounds for given dataset.
Changed in version 0.2: Parameter “index” renamed to “position” to reflect values to be either indices or axis values
New in version 0.2: Slice positions can be given both as axis indices and as axis values
New in version 0.2: Works for ND datasets with N>1
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
In the simplest case, just invoke the slice extraction with an index only:
    - kind: processing
      type: SliceExtraction
      properties:
        parameters:
          position: 5
This will extract the sixth slice (index five) along the first axis (index zero).
If you would like to extract a slice along the second axis (with index one), simply provide both parameters, position and axis:
    - kind: processing
      type: SliceExtraction
      properties:
        parameters:
          position: 5
          axis: 1
This will extract the sixth slice along the second axis.
And as it is sometimes more convenient to give positions in axis values rather than indices, even this is possible. Suppose the axis you would like to extract a slice from runs from 340 to 350 and you would like to extract the slice corresponding to 343:
    - kind: processing
      type: SliceExtraction
      properties:
        parameters:
          position: 343
          unit: axis
If you provide the position in axis units rather than indices, the axis value closest to the given value will be chosen automatically.
For ND datasets with N>2, you can either extract a 1D or ND slice, with N always at least one dimension less than the original data. To extract a 2D slice from a 3D dataset, simply proceed as above, providing one value each for position and axis. If, however, you want to extract a 1D slice from a 3D dataset, you need to provide two values each for position and axis:
    - kind: processing
      type: SliceExtraction
      properties:
        parameters:
          position: [21, 42]
          axis: [0, 2]
This particular case would be equivalent to data[21, :, 42], assuming data to contain the numeric data, except, of course, that the processing step takes care of removing the axes as well.
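The lookup of the closest axis value and the extraction itself can be sketched for 2D data as follows. Both helpers, `nearest_index` and `extract_slice`, are hypothetical and operate on plain lists; the exceptions follow the description above, but the details are assumptions.

```python
def nearest_index(axis_values, position):
    """Index of the axis value closest to the given position."""
    return min(
        range(len(axis_values)),
        key=lambda i: abs(axis_values[i] - position),
    )


def extract_slice(rows, position, axis=0, unit="index", axis_values=None):
    """Extract a 1D slice from 2D data given as list of rows."""
    if unit not in ("index", "axis"):
        raise ValueError(f"Wrong unit '{unit}'")
    if axis not in (0, 1):
        raise IndexError(f"Axis {axis} out of bounds for 2D data")
    if unit == "axis":
        index = nearest_index(axis_values, position)
    else:
        index = position
    size = len(rows) if axis == 0 else len(rows[0])
    if not 0 <= index < size:
        raise ValueError(f"Position {position} out of bounds")
    if axis == 0:
        return list(rows[index])          # corresponds to data[index, :]
    return [row[index] for row in rows]   # corresponds to data[:, index]


field = [340 + 0.5 * i for i in range(21)]  # axis running from 340 to 350
data = [[i, 10 * i] for i in range(21)]     # 21 x 2 toy data
print(extract_slice(data, 343.2, unit="axis", axis_values=field))
```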
static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Slice extraction is only applicable to datasets with data of at least two dimensions.
Parameters
dataset (aspecd.dataset.Dataset) – dataset to check
Returns
applicable – True if successful, False otherwise.
Return type
bool


class aspecd.processing.RangeExtraction
Bases: aspecd.processing.SingleProcessingStep
Extract range of data from dataset.
There are many reasons to look only at a certain range of data of a given dataset. For an ND array, one would use slicing, but for a dataset, one needs to have the axes adjusted as well, hence this processing step.
parameters
All parameters necessary for this step.
range : list
List of lists with indices for the slicing
For each dimension of the data of the dataset, one list of indices needs to be provided that are used for start, stop [, step] of slice.
unit : str
Unit used for specifying the range: “axis”, “index”, “percentage”.
If an invalid value is provided, a ValueError is raised.
Default: “index”
Type
dict
Raises
ValueError – Raised if index is out of bounds for given axis. Raised if wrong unit is given.
IndexError – Raised if no range is provided. Raised if number of ranges does not fit data dimensions.
New in version 0.2.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the range extraction with one range only, assuming a 1D dataset:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10]
This will extract the range data[5:10] from your data (and adjust the axis accordingly). In case of 2D data, it would be fairly similar, except that you now provide two ranges:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range:
        - [5, 10]
        - [3, 6]
Additionally, you can provide step sizes, just as you can do when slicing in Python:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10, 2]
This is equivalent to data[5:10:2] or data[(slice(5, 10, 2))], accordingly.
Sometimes, it is more convenient to give ranges in axis values rather than indices. This can be achieved by setting the parameter unit to “axis”:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10]
      unit: axis
Note that in this case, setting a step is meaningless and will be silently ignored. Furthermore, the nearest axis values will be used for the range.
In some cases you may want to extract a range by providing percentages instead of indices or axis values. Even this can be done:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [0, 10]
      unit: percentage
Here, the first ten percent of the data of the 1D dataset will be extracted, or more exactly the indices falling within the first ten percent. Note that in this case, setting a step is meaningless and will be silently ignored. Furthermore, the nearest axis values will be used for the range.
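For ranges given as indices, the relation to Python slicing mentioned above can be sketched as follows. The helper ranges_to_slices is hypothetical and only illustrates how start, stop [, step] lists map onto slice objects; it does not adjust any axes and does not cover the “axis” or “percentage” units.

```python
import numpy as np


def ranges_to_slices(ranges):
    """Turn [start, stop[, step]] lists into a tuple of slice objects."""
    # A flat list such as [5, 10] is treated as a single range (1D data)
    if ranges and not isinstance(ranges[0], (list, tuple)):
        ranges = [ranges]
    return tuple(slice(*single_range) for single_range in ranges)


data_1d = np.arange(20)
data_2d = np.arange(20 * 10).reshape(20, 10)

# Same result as data_1d[5:10:2]
extracted_1d = data_1d[ranges_to_slices([5, 10, 2])]
# Same result as data_2d[5:10, 3:6]
extracted_2d = data_2d[ranges_to_slices([[5, 10], [3, 6]])]
print(extracted_1d, extracted_2d.shape)
```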


class aspecd.processing.BaselineCorrection
Bases: aspecd.processing.SingleProcessingStep
Subtract baseline from dataset.
Currently, only polynomial baseline corrections are supported.
The coefficients used to calculate the baseline will be written to the parameters dictionary upon processing.
If no order is explicitly given, a polynomial baseline of zeroth order will be used.
Important
Baseline correction works only for 1D and 2D datasets, not for higher-dimensional datasets.

parameters
All parameters necessary for this step.
 kind
str
The kind of baseline correction to be performed.
Default: polynomial
 order
int
The order for the baseline correction if no coefficients are given.
Default: 0
 fit_area :
Parts of the spectrum to be considered as baseline, can be given as list or single number. If one number is given, it takes that percentage from both sides, respectively, i.e. 10 means 10% left and 10% right. If a list of two numbers is provided, the corresponding percentages are taken from each side of the spectrum, i.e. [5, 20] takes 5% from the left side and 20% from the right.
Default: [10, 10]
 coefficients:
Coefficients used to calculate the baseline.
 axis
int
Axis along which to perform the baseline correction.
Only necessary in case of 2D data.
Default: 0
 Type
 kind
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the baseline correction with default values:
- kind: processing
  type: BaselineCorrection
In this case, a zeroth-order polynomial baseline will be subtracted from your dataset using ten percent to the left and right, and in case of a 2D dataset, the baseline correction will be performed along the first axis (index zero) for all indices of the second axis (index 1).
Of course, often you want to control a little bit more how the baseline will be corrected. This can be done by explicitly setting some parameters.
Suppose you want to perform a baseline correction with a polynomial of first order:
- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      order: 1
If you want to change the (percentage) area used for fitting the baseline, and even specify different ranges left and right:
- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      fit_area: [5, 20]
Here, five percent from the left and 20 percent from the right are used.
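How order and fit_area interact can be illustrated with a small NumPy sketch for 1D data. This is an assumption-laden illustration, not the implementation of the class: the function name, the rounding of the percentages to whole points, and the test data are all made up.

```python
import numpy as np


def subtract_polynomial_baseline(x, y, order=0, fit_area=(10, 10)):
    """Fit a polynomial to the outer parts of y and subtract it."""
    # Number of points on each side regarded as baseline (rounding is a guess)
    n_left = int(np.ceil(len(y) * fit_area[0] / 100))
    n_right = int(np.ceil(len(y) * fit_area[1] / 100))
    idx = np.r_[0:n_left, len(y) - n_right:len(y)]
    coefficients = np.polyfit(x[idx], y[idx], order)
    return y - np.polyval(coefficients, x), coefficients


x = np.linspace(340, 350, 101)
signal = np.exp(-((x - 345) ** 2) / 0.1)   # narrow peak in the centre
y = signal + 0.5 * x - 170                 # peak on top of a linear baseline

corrected, coefficients = subtract_polynomial_baseline(
    x, y, order=1, fit_area=(5, 20)
)
print(coefficients)
```

Because the peak is negligible in the outer five and 20 percent of the data, the fitted coefficients recover the linear baseline and the corrected data resemble the bare peak.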
Finally, suppose you have a 2D dataset and want to perform the baseline correction along the second axis (index one):
- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      axis: 1
Of course, you can combine the different options.
Changed in version 0.3: Coefficients are returned in unscaled data domain

static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Baseline correction is (currently) only applicable to datasets with one- and two-dimensional data.
 Parameters
dataset (aspecd.dataset.Dataset) – dataset to check
 Returns
applicable – True if successful, False otherwise.
 Return type
bool


class aspecd.processing.Averaging
Bases: aspecd.processing.SingleProcessingStep
Average data over given range along given axis.
While projection as performed by aspecd.processing.Projection can be considered a special case of averaging using the whole range of one axis, averaging is usually performed over part of an axis only.
Note
Currently, averaging works only for 2D datasets, not for higher-dimensional datasets. This may, however, change in the future.
Important
Indices for the range work slightly differently than in Python: While still zero-based, a range of [2, 3] will result in having the third and fourth column/row averaged. This seems more intuitive to the average scientist than sticking with Python (and having in this case only the third column/row returned).
You can use negative indices as well, as long as the resulting indices are still within the range of the corresponding data dimension.
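The difference to plain Python slicing can be seen in a short NumPy sketch. It assumes a small 2D array and only handles non-negative indices; negative indices would need to be translated first, which is left out here.

```python
import numpy as np

data = np.arange(20).reshape(4, 5)   # hypothetical 2D data, shape (4, 5)

start, end = 2, 3                    # inclusive range, as given in a recipe
axis = 0

# Python slices exclude the end point, hence end + 1 to include index 3
index = [slice(None)] * data.ndim
index[axis] = slice(start, end + 1)
averaged = data[tuple(index)].mean(axis=axis)

# Plain Python slicing with [2:3] would have used the third row only
python_style = data[2:3].mean(axis=0)
print(averaged, python_style)
```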

parameters
All parameters necessary for this step.
 axis
int
The axis to average along.
Default: 0
 range
list
The range (start, end) to average over.
Default: []
 unit
str
Unit used for specifying the range: either “axis” or “index”.
Default: “index”
 Type
 axis
 Raises
ValueError – Raised if range is out of range for given axis or empty. Raised if unit is not either “axis” or “index”.
IndexError – Raised if axis is out of bounds for given dataset
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the averaging with a range only:
- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [2, 3]
In this case, you will get your dataset averaged along the first axis (index zero), and averaged over the indices 2 and 3 of the second axis.
If you would like to average over the second axis (index 1), just specify this axis:
- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [2, 3]
      axis: 1
And as it is sometimes more convenient to give ranges in axis values rather than indices, even this is possible. Suppose the axis you would like to average over runs from 340 to 350 and you would like to average from 342 to 344:
- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [342, 344]
      unit: axis
If you provide the range in axis units rather than indices, the axis values closest to the given values will be chosen automatically.

static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Averaging is only applicable to datasets with two-dimensional data.
 Parameters
dataset (aspecd.dataset.Dataset) – dataset to check
 Returns
applicable – True if successful, False otherwise.
 Return type
bool


class aspecd.processing.ScalarAxisAlgebra
Bases: aspecd.processing.SingleProcessingStep
Perform scalar algebraic operation on the axis of a dataset.
Sometimes, changing the values of an axis can be quite useful, for example to apply corrections obtained by some analysis step. Usually, this requires scalar algebraic operations on the axis values.

parameters
All parameters necessary for this step.
 kind
str
Kind of scalar algebra to use
Valid values: “plus”, “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, “+”, “-”, “*”, “/”, “power”, “pow”, “**”
 axis
int
Axis to operate on
Default value: 0
 value
float
Parameter of the scalar algebraic operation
Default value: None
 Type
 kind
 Raises
ValueError – Raised if no or wrong kind is provided
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to add a fixed value of 42 to the first axis (index 0) of your dataset:
- kind: processing
  type: ScalarAxisAlgebra
  properties:
    parameters:
      kind: plus
      axis: 0
      value: 42
Similarly, you could use “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, and “power” as kind, resulting in the given algebraic operation.
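One way to picture how the different values for kind map onto operations is a simple lookup table. The following sketch uses the aliases listed above, but the dictionary and the function scalar_axis_algebra are assumptions for illustration, not the internals of the class.

```python
import operator

# Aliases as listed for the "kind" parameter; the mapping is an assumption
OPERATIONS = {
    "plus": operator.add, "add": operator.add, "+": operator.add,
    "minus": operator.sub, "subtract": operator.sub, "-": operator.sub,
    "times": operator.mul, "multiply": operator.mul, "*": operator.mul,
    "by": operator.truediv, "divide": operator.truediv,
    "/": operator.truediv,
    "power": operator.pow, "pow": operator.pow, "**": operator.pow,
}


def scalar_axis_algebra(axis_values, kind, value):
    """Apply the scalar operation named by ``kind`` to each axis value."""
    if kind not in OPERATIONS:
        raise ValueError(f"Unknown kind of scalar algebra: {kind!r}")
    return [OPERATIONS[kind](element, value) for element in axis_values]


shifted = scalar_axis_algebra([0, 1, 2], "plus", 42)
scaled = scalar_axis_algebra([2, 4], "by", 2)
print(shifted, scaled)
```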
New in version 0.2.


class aspecd.processing.DatasetAlgebra
Bases: aspecd.processing.SingleProcessingStep
Perform scalar algebraic operation on two datasets.
To improve the signal-to-noise ratio, adding the data of two datasets can sometimes be useful. Alternatively, adding or subtracting the data of two datasets can help to interpret the signals.
Important
The data of the two datasets to perform the scalar algebraic operation on need to have the same dimension (that is checked for), and to obtain meaningful results, usually the axes values need to be identical as well. For this purpose, use the
CommonRangeExtraction
processing step.
parameters
All parameters necessary for this step.
 kind
str
Kind of scalar algebra to use
Valid values: “plus”, “minus”, “add”, “subtract”, “+”, “-”
Note that in contrast to scalar algebra, multiply and divide are not implemented for operation on two datasets.
 dataset
aspecd.dataset.Dataset
Dataset whose data to add or subtract
 Type
 kind
 Raises
ValueError – Raised if no or wrong kind is provided. Raised if data of datasets have different shapes.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to add the data of the dataset referred to by its label label_to_other_dataset to your dataset:
- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset
Similarly, you could use “minus”, “add”, “subtract” as kind, resulting in the given algebraic operation.
As mentioned already, the data of both datasets need to have identical shape, and comparison is only meaningful if the axes are compatible as well. Hence, you will usually want to perform a CommonRangeExtraction processing step before doing algebra with two datasets:
- kind: multiprocessing
  type: CommonRangeExtraction
  results:
    - label_to_dataset
    - label_to_other_dataset
- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset
  apply_to:
    - label_to_dataset
New in version 0.2.


class aspecd.processing.Interpolation
Bases: aspecd.processing.SingleProcessingStep
Interpolate data.
As soon as data of different datasets should be arithmetically combined, they need to have an identical grid. Often, this can only be achieved by interpolating one or both datasets.
Take care not to use interpolation to artificially smooth your data.
For an in-depth discussion of interpolating ND data, see the discussions on Stack Overflow, particularly the answers by Joe Kington, which provide both theoretical insight and Python code.
Important
Currently, interpolation works only for 1D and 2D datasets, not for higher-dimensional datasets. This may, however, change in the future.
Todo
Make type of interpolation controllable
Check for ways to make it work with ND, N>2

parameters
All parameters necessary for this step.
 range
list
Range of the axis to interpolate for
Needs to be a list of lists in case of ND datasets with N>1, containing N two-element vectors as ranges for each of the axes.
 npoints
list
Number of points to interpolate for
Needs to be a list in case of ND datasets with N>1, containing N elements, one for each of the axes.
 unit
str
Unit the ranges are given in
Can be either “index” (default) or “axis”.
 Type
 range
 Raises
ValueError – Raised if no range to interpolate for is provided. Raised if no number of points to interpolate for is provided. Raised if unit is unknown.
IndexError – Raised if list of ranges does not fit data dimensions. Raised if list of npoints does not fit data dimensions. Raised if given range is out of range of data/axes
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
Generally, interpolation requires both a range and a number of points:
- kind: processing
  type: Interpolation
  properties:
    parameters:
      range: [10, 100]
      npoints: 901
This would interpolate your data between their indices 10 and 100 using 901 points. As it is sometimes (often) more convenient to work with axis units, you can tell the processing step to use axis values instead of indices:
- kind: processing
  type: Interpolation
  properties:
    parameters:
      range: [340, 350]
      npoints: 1001
      unit: axis
This would interpolate your (1D) data between the axis values 340 and 350 using 1001 points.
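For 1D data, the two variants can be mimicked with NumPy's linear interpolation. This is a sketch under assumptions: the data and axis values are invented, and linear interpolation via np.interp is used here only because it is simple; which kind of interpolation the class applies is not stated in this documentation.

```python
import numpy as np

data = 2.0 * np.arange(200) + 1.0          # hypothetical 1D data, 200 points
axis = np.linspace(300, 400, data.size)    # hypothetical axis values

# unit "index": range [10, 100] with 901 points, i.e. steps of 0.1 index
new_indices = np.linspace(10, 100, 901)
by_index = np.interp(new_indices, np.arange(data.size), data)

# unit "axis": range [340, 350] with 1001 points, i.e. steps of 0.01
new_axis = np.linspace(340, 350, 1001)
by_axis = np.interp(new_axis, axis, data)
print(by_index.shape, by_axis.shape)
```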
New in version 0.2.

static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Interpolation is currently only applicable to datasets with one- and two-dimensional data.
 Parameters
dataset (aspecd.dataset.Dataset) – dataset to check
 Returns
applicable – True if successful, False otherwise.
 Return type
bool

class aspecd.processing.Filtering
Bases: aspecd.processing.SingleProcessingStep
Filter data.
Generally, filtering is a large field of (digital) signal processing, and currently, this class only implements a very small subset of filters often applied in spectroscopy, namely low-pass filters that can be used for smoothing (“denoising”) data.
Filtering makes heavy use of the scipy.ndimage and scipy.signal modules of the SciPy package. For details, see there.
Filtering works with data of arbitrary dimensions, in this case applying the filter in each dimension.

parameters
All parameters necessary for this step.
 type
str
Type of the filter to use
Currently, three types are supported: “uniform”, “gaussian”, “savitzky-golay”. For convenience, a list of aliases exists for each of these types, and if you use one of these aliases, it will be replaced by its generic name:

Generic            Alias
‘uniform’          ‘box’, ‘boxcar’, ‘moving-average’, ‘car’
‘gaussian’         ‘binom’, ‘binomial’
‘savitzky-golay’   ‘savitzky_golay’, ‘savitzky golay’, ‘savgol’, ‘savitzky’
 window_length
int
Length of the filter window
The window needs to be smaller than the actual data. If you provide a window length that exceeds the data range, an exception will be raised.
 order
int
Polynomial order for the Savitzky-Golay filter
Only necessary for this type of filter. If no order is given for this filter, an exception will be raised.
 Type
 type
 Raises
ValueError – Raised if no or wrong filter type is provided. Raised if no filter window is provided. Raised if filter window exceeds data range. Raised in case of Savitzky-Golay filter when no order is provided.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
Generally, filtering requires both a type of filter and a window length. Therefore, for uniform and Gaussian filters, this would be:
- kind: processing
  type: Filtering
  properties:
    parameters:
      type: uniform
      window_length: 10
Of course, at least uniform filtering (also known as boxcar or moving average) is strongly discouraged due to the artifacts introduced. Probably the best bet for applying a filter to smooth your data is the Savitzky-Golay filter:
- kind: processing
  type: Filtering
  properties:
    parameters:
      type: savitzky-golay
      window_length: 10
      order: 3
Note that for this filter, you need to provide the polynomial order as well. To get best results, you will need to experiment with the parameters a bit.
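To get a feeling for the three filter types outside of a recipe, the corresponding SciPy functions can be called directly on 1D test data. Which SciPy functions the class calls internally is an assumption here, as are the test data, the odd window length, and the choice of sigma for the Gaussian filter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, uniform_filter1d
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 201)
clean = np.exp(-x ** 2)
noisy = clean + 0.05 * rng.normal(size=x.size)

window_length = 11   # odd on purpose; older SciPy versions require this

uniform = uniform_filter1d(noisy, size=window_length)
# The relation between window length and sigma is chosen arbitrarily here
gaussian = gaussian_filter1d(noisy, sigma=window_length / 4)
savgol = savgol_filter(noisy, window_length, polyorder=3)

for name, result in [("uniform", uniform), ("gaussian", gaussian),
                     ("savitzky-golay", savgol)]:
    print(name, np.std(result - clean))
```

Comparing the residuals with respect to the noise-free data is a convenient way to experiment with window length and order.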
New in version 0.2.


class aspecd.processing.CommonRangeExtraction
Bases: aspecd.processing.MultiProcessingStep
Extract the common range of data for multiple datasets using interpolation.
One prerequisite for adding up multiple datasets in a meaningful way is to have their data dimensions as well as their respective axes values agree. This usually requires interpolating the data to a common set of axes.
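The idea of a common range can be sketched for two 1D datasets with NumPy. The function common_range, the test data, the number of points, and the use of linear interpolation are assumptions for illustration; the class itself determines the grid on its own.

```python
import numpy as np


def common_range(axes):
    """Return the overlap of several 1D axes as (start, end)."""
    start = max(axis.min() for axis in axes)
    end = min(axis.max() for axis in axes)
    if start >= end:
        raise ValueError("Axes have disjoint values")
    return start, end


# Two hypothetical datasets with overlapping, but different axes
x1 = np.linspace(340, 350, 101)
y1 = np.sin(x1)
x2 = np.linspace(345, 355, 51)
y2 = np.cos(x2)

start, end = common_range([x1, x2])
npoints = 51                       # chosen by hand for this sketch
grid = np.linspace(start, end, npoints)

# Interpolate both datasets onto the common grid, then combine them
total = np.interp(grid, x1, y1) + np.interp(grid, x2, y2)
print(start, end, total.shape)
```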
Important
Currently, extracting the common range works only for 1D and 2D datasets, not for higher-dimensional datasets, due to the underlying method of interpolation. See Interpolation for details. This may, however, change in the future.
Todo
Make type of interpolation controllable
Make number of points controllable (in absolute numbers as well as minimum and maximum points with respect to datasets)

parameters
All parameters necessary for this step.
 ignore_units
bool
Whether to ignore the axes units when checking the datasets for applicability.
Usually, the axes units should be identical, but sometimes, they may be named differently or be compatible anyway. Use with care and only if you know exactly what you are doing.
Default: False
 common_range
list
Common range of values for each axis as determined by the processing step.
For >1D datasets, this will be a list of lists.
 npoints
list
Number of points used for the final grid the data are interpolated on.
The length is identical to the dimensions of the data of the datasets.
 Type
 ignore_units
 Raises
ValueError – Raised if datasets have axes with different units or disjoint values. Raised if datasets have different dimensions.
IndexError – Raised if axis is out of bounds for given dataset
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to bring all datasets currently loaded into your recipe to a common range (use with caution, however), things can be as simple as:
- kind: multiprocessing
  type: CommonRangeExtraction
Note that this will operate on all datasets currently available in your recipe, including results from other processing steps. Therefore, it is usually better to be explicit, using apply_to. Otherwise, you can use this processing step early on in your recipe.
Usually, however, you will want to restrict this to a subset using apply_to and provide labels for the results:
- kind: multiprocessing
  type: CommonRangeExtraction
  results:
    - dataset1_cut
    - dataset2_cut
  apply_to:
    - dataset1
    - dataset2
If you want to perform algebraic operations on datasets, the data of both datasets need to have identical shape, and comparison is only meaningful if the axes are compatible as well. Hence, you will usually want to perform a CommonRangeExtraction processing step before doing algebra with two datasets:
- kind: multiprocessing
  type: CommonRangeExtraction
  results:
    - label_to_dataset
    - label_to_other_dataset
- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset
  apply_to:
    - label_to_dataset
For details of the algebraic operations on datasets, see DatasetAlgebra.
New in version 0.2.

static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Extracting a common range is currently only applicable to datasets with one- and two-dimensional data, due to the underlying interpolation.
 Parameters
dataset (aspecd.dataset.Dataset) – dataset to check
 Returns
applicable – True if successful, False otherwise.
 Return type
bool

class aspecd.processing.Noise
Bases: aspecd.processing.SingleProcessingStep
Add (coloured) noise to data.
Particularly for testing algorithms and hence creating test data, adding noise to these test data is crucial. Furthermore, the naive approach of adding white (Gaussian, normally distributed) noise often does not reflect the physical reality, as “real” noise often has a different power spectral density (PSD).
Probably the kind of noise most often encountered in spectroscopy is 1/f noise or pink noise, with the PSD decreasing by 3 dB per octave or 10 dB per decade. For more details on the different kinds of noise, the following sources may be a good starting point:
Different strategies exist to create coloured noise, and the implementation used here follows basically the ideas published by Timmer and König:
J. Timmer and M. König: On generating power law noise. Astronomy and Astrophysics 300, 707–710 (1995)
In short: In the Fourier space, normally distributed random numbers are drawn for power and phase of each frequency component, and the power scaled by the appropriate power law. Afterwards, the resulting frequency spectrum is back transformed using iFFT and ensuring real data.
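The procedure summarised above can be sketched in a few lines of NumPy. This is a simplified reading of the approach of Timmer and König, drawing Gaussian real and imaginary parts for each frequency component, and not the code of the class; the function name coloured_noise and its signature are made up for this example.

```python
import numpy as np


def coloured_noise(npoints, exponent=1.0, rng=None):
    """Noise with a power spectral density proportional to 1/f**exponent."""
    rng = np.random.default_rng() if rng is None else rng
    frequencies = np.fft.rfftfreq(npoints)
    # Normally distributed real and imaginary parts for each frequency
    spectrum = (rng.normal(size=frequencies.size)
                + 1j * rng.normal(size=frequencies.size))
    # Scale amplitudes by f**(-exponent/2), i.e. the power by f**(-exponent);
    # the zero-frequency component is set to zero, giving a zero mean
    scaling = np.zeros_like(frequencies)
    scaling[1:] = frequencies[1:] ** (-exponent / 2)
    spectrum *= scaling
    # irfft assumes Hermitian symmetry and therefore returns real data
    return np.fft.irfft(spectrum, n=npoints)


noise = coloured_noise(1024, exponent=1, rng=np.random.default_rng(42))
print(noise.shape, noise.mean())
```

An exponent of 0 leaves all components unscaled and hence yields white noise, while 1 and 2 give pink and Brownian noise, respectively.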
Further inspiration came from the following two sources:
Note: The first is based on a MATLAB(R) code by Max Little and contains a number of errors in its Python translation that are not present in the original code.
The added noise always has a mean of (close to) zero.

parameters
All parameters necessary for this step.
 exponent
float
The exponent used for scaling the power of the frequency components
0 – white (Gaussian) noise
1 – pink (1/f) noise
2 – Brownian (1/f**2) noise
Default: 1 (pink noise)
 normalise
bool
Whether to normalise the noise amplitude prior to adding to the data.
In this case, the amplitude is normalised to 1.
 Type
 exponent
Note
The exponent for the noise is restricted neither to integer nor to positive values. While for spectroscopic data, pink (1/f) noise usually prevails (exponent = 1), the opposite effect with high frequencies dominating can occur as well. A prominent example of naturally occurring “blue noise” with the density proportional to f is Cherenkov radiation.
Note
In case of ND data, the coloured noise is calculated along the first dimension only, all other dimensions will exhibit (close to) white (Gaussian) noise. Generally, this should not be a problem in spectroscopy, as usually, data are recorded over time in one dimension only, and only in this (implicit) time dimension coloured noise will be relevant.
New in version 0.3.

class aspecd.processing.ChangeAxesValues
Bases: aspecd.processing.SingleProcessingStep
Change values of individual axes.
What sounds pretty much like data manipulation is sometimes a necessity due to the shortcomings of vendor file formats. Let’s face it: sometimes values read from raw data are simply wrong, due to wrong readout or wrong processing of these parameters within the device. Therefore, it seems much better to transparently change the respective axis values rather than having to modify raw data by hand. Using a processing step has two crucial advantages: (i) it allows for full reproducibility and traceability, and (ii) it can be done in the context of recipe-driven data analysis, i.e. not requiring any programming skills.
Note
A real-world example: angular-dependent measurements recorded wrong angles in the raw data file, while the actual positions were correct. Assuming measurements from 0° to 180° in 10° steps, it is pretty straightforward how to fix this problem: Assign equidistant values from 0° to 180° and use the information about the actual axis length.

parameters
All parameters necessary for this step.
 range
list
The range of the axis, i.e. start and end value
 axes
list
The axes to set the new values for
Can be an integer in case of a single axis, otherwise a list of integers. If omitted, all axes with values will be assumed (i.e., one per data dimension).
 Type
 range
 Raises
IndexError – Raised if index is out of range for axes or given number of axes and ranges is incompatible
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to change the axis range of a 1D dataset, things are as simple as:
- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range: [35, 42]
This would take the first axis (index 0) and set the range to linearly spaced data ranging from 35 to 42, of course with the same number of values as before.
If you want to change both axes in a 2D dataset, proceed in the same way:
- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range:
        - [35, 42]
        - [17.5, 21]
This would set the range of the first axis (index 0) to the interval [35, 42], and the range of the second axis (index 1) to the interval [17.5, 21].
More often, you may have a 2D dataset where you intend to change the values of only one axis. Suppose the example from above with angular-dependent measurements and the angles in the second dimension:
- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range: [0, 180]
      axes: 1
Here, the second axis (index 1) will be set accordingly.
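In terms of plain NumPy, assigning equidistant values while keeping the axis length boils down to a call to np.linspace. The wrongly recorded angles below are invented for this sketch, which mirrors the angular-dependent example with 19 positions.

```python
import numpy as np

# Hypothetical, wrongly recorded angles for 19 actual positions
recorded_angles = np.zeros(19)

# Equidistant values over the given range, same number of points as before
start, end = 0, 180
new_angles = np.linspace(start, end, recorded_angles.size)
print(new_angles)
```

With 19 points between 0 and 180, the resulting step size is exactly the 10° of the original measurement scheme.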
New in version 0.3.
