aspecd.processing module
Data processing functionality.
Key to reproducible science is automatic documentation of each processing step applied to the data of a dataset. Each such processing step is self-contained, meaning it contains all the information necessary to perform the processing task on a given dataset.
Processing steps, in contrast to analysis steps (see aspecd.analysis for details), not only operate on the data of an aspecd.dataset.Dataset, but change its data. The information necessary to reproduce each processing step gets added to the aspecd.dataset.Dataset.history attribute of a dataset.
Generally, two types of processing steps can be distinguished:
Processing steps for handling single datasets
Shall be derived from aspecd.processing.SingleProcessingStep.
Processing steps for handling multiple datasets
Shall be derived from aspecd.processing.MultiProcessingStep.
In the first case, the processing is usually handled using the process() method of the respective aspecd.dataset.Dataset object. Additionally, those processing steps always operate on the data of a single dataset only. Processing steps handling single datasets should always inherit from the aspecd.processing.SingleProcessingStep class.
In the second case, the processing step is handled using the process() method of the aspecd.processing.ProcessingStep object, and the datasets are stored as a list within the processing step, as these processing steps span several datasets. Processing steps handling multiple datasets should always inherit from the aspecd.processing.MultiProcessingStep class.
The module contains both base classes for processing steps (as detailed above) and a series of generally applicable processing steps for all kinds of spectroscopic data. The latter are an attempt to spare the developers of packages derived from the ASpecD framework from reinventing the wheel over and over again.
The next section gives an overview of the concrete processing steps implemented within the ASpecD framework. For details of how to implement your own processing steps, see the section below.
Concrete processing steps
Besides providing the basis for processing steps for the ASpecD framework, ensuring full reproducibility and traceability, hence reproducible science and good scientific practice, this module comes with a (growing) number of general-purpose processing steps useful for basically all kinds of spectroscopic data.
Here is a list as a first overview. For details, see the detailed documentation of each of the classes, readily accessible by the link.
Processing steps operating on individual datasets
The following processing steps operate each on individual datasets independently.
aspecd.processing.Normalisation
Normalise data.
There are different kinds of normalising data: maximum, minimum, amplitude, area
aspecd.processing.Integration
Integrate data
aspecd.processing.Differentiation
Differentiate data, i.e., return discrete first derivative
aspecd.processing.ScalarAlgebra
Perform scalar algebraic operation on one dataset.
Operations available: add, subtract, multiply, divide (by given scalar)
aspecd.processing.ScalarAxisAlgebra
Perform scalar algebraic operation on axis values of a dataset.
Operations available: add, subtract, multiply, divide, power (by given scalar)
aspecd.processing.DatasetAlgebra
Perform scalar algebraic operation on two datasets.
Operations available: add, subtract
aspecd.processing.Projection
Project data, i.e. reduce dimensions along one axis.
aspecd.processing.SliceExtraction
Extract slice along one or more dimensions from dataset.
aspecd.processing.SliceRemoval
Remove slice along one or more dimensions from dataset.
aspecd.processing.SliceRearrangement
Rearrange slices of a dataset along one dimension.
aspecd.processing.RangeExtraction
Extract range of data from a dataset.
aspecd.processing.BaselineCorrection
Correct baseline of dataset.
aspecd.processing.Averaging
Average data over given range along given axis.
aspecd.processing.Interpolation
Interpolate data.
-
Filter data.
-
Add (coloured) noise to data.
aspecd.processing.Denoising1DSVD
Denoise 1D data using singular value decomposition (SVD).
aspecd.processing.ChangeAxesValues
Change values of individual axes.
aspecd.processing.RelativeAxis
Create relative axis, centred about a given value.
Processing steps operating on multiple datasets at once
The following processing steps operate each on more than one dataset at the same time, requiring at least two datasets as an input to work.
aspecd.processing.CommonRangeExtraction
Extract the common range of data for multiple datasets using interpolation.
Useful (and often necessary) for performing algebraic operations on datasets.
Writing own processing steps
Each real processing step should inherit from either aspecd.processing.SingleProcessingStep, in case of operating on a single dataset only, or aspecd.processing.MultiProcessingStep, in case of operating on several datasets at once. Furthermore, all processing steps should be contained in one module named “processing”. This allows for easy automation and replay of processing steps, particularly in the context of recipe-driven data analysis (for details, see the aspecd.tasks module).
General advice
A few hints on writing your own processing step classes:
- Always inherit from aspecd.processing.SingleProcessingStep or aspecd.processing.MultiProcessingStep, depending on your needs.
- Store all parameters, implicit and explicit, in the dict parameters of the aspecd.processing.ProcessingStep class, not in separate properties of the class. Only this way can you ensure full reproducibility and compatibility with recipe-driven data analysis (for details of the latter, see the aspecd.tasks module).
- Always set the description property to a sensible value.
- Always set the undoable property appropriately. In most cases, processing steps can be undone.
- Implement the actual processing in the _perform_task method of the processing step. For sanitising parameters and checking general applicability of the processing step to the dataset(s) at hand, continue reading.
- Make sure to implement the aspecd.processing.ProcessingStep.applicable() method according to your needs. Typical cases would be to check for the dimensionality of the underlying data, as some processing steps may work only for 1D data (or vice versa). Don’t forget to declare this as a static method, using the @staticmethod decorator.
- With the _sanitise_parameters method, the input parameters are automatically checked and an appropriate exception can be thrown in order to describe the error source to the user.
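The hints above can be sketched in a small, self-contained example. Note that StandInProcessingStep, Offset, and the dictionary used as dataset are hypothetical stand-ins that only mimic the structure described here, so that the snippet runs without ASpecD installed. In a real package you would inherit from aspecd.processing.SingleProcessingStep instead, and the actual base class does considerably more (history handling, default values, checks on real datasets).

```python
class StandInProcessingStep:
    """Hypothetical stand-in mimicking the hooks described above."""

    def __init__(self):
        self.parameters = {}
        self.description = "Abstract processing step"
        self.undoable = False
        self.dataset = None

    @staticmethod
    def applicable(dataset):
        return True

    def process(self, dataset):
        self.dataset = dataset
        if not self.applicable(dataset):
            raise ValueError("Processing step not applicable to dataset")
        self._sanitise_parameters()
        self._perform_task()
        return dataset

    def _sanitise_parameters(self):
        pass

    def _perform_task(self):
        pass


class Offset(StandInProcessingStep):
    """Hypothetical processing step adding a constant offset to 1D data."""

    def __init__(self):
        super().__init__()
        self.description = "Add constant offset to 1D data"
        self.undoable = True
        # All parameters live in the parameters dict, not in own attributes
        self.parameters["offset"] = 0.0

    @staticmethod
    def applicable(dataset):
        # Only 1D data, here: a flat list of numbers
        return all(isinstance(v, (int, float)) for v in dataset["data"])

    def _sanitise_parameters(self):
        if not isinstance(self.parameters["offset"], (int, float)):
            raise ValueError("Parameter 'offset' needs to be a number")

    def _perform_task(self):
        offset = self.parameters["offset"]
        self.dataset["data"] = [v + offset for v in self.dataset["data"]]


step = Offset()
step.parameters["offset"] = 2.5
result = step.process({"data": [1.0, 2.0, 3.0]})
print(result["data"])
```

The point of the design is that everything needed to replay the step (here only the offset) sits in the parameters dict, while the checks and the actual work are kept in separate methods.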
Some more special cases are detailed below. For further advice, consult the source code of this module, and have a look at the concrete processing steps whose purpose is described below in more detail.
Changing the dimensions of your data
If your processing step changes the dimensions of your data, it is your responsibility to ensure that the axes values remain consistent with the data. Note that upon changing the dimension of your data, the axes values will be reset to indices along the data dimensions. Hence, you need to first make a (deep) copy of your axes, then change the dimension of your data, and afterwards restore the remaining values from the temporarily stored axes.
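The sequence of copying the axes, changing the data, and restoring the axes can be sketched as follows. StandInData is a hypothetical class that only imitates the reset of the axes described above. It keeps one plain list of values per data dimension and ignores everything else a real dataset carries, so treat it as an illustration of the order of operations rather than of the ASpecD data model.

```python
import copy


class StandInData:
    """Hypothetical stand-in: data of a different dimension resets the axes."""

    def __init__(self, data, axes):
        self._data = data
        self.axes = axes  # one list of axis values per data dimension

    @staticmethod
    def _shape(data):
        shape = []
        while isinstance(data, list):
            shape.append(len(data))
            data = data[0]
        return shape

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        old_ndim = len(self._shape(self._data))
        new_shape = self._shape(value)
        self._data = value
        if len(new_shape) != old_ndim:
            # Imitates the reset described above: axes become plain indices
            self.axes = [list(range(n)) for n in new_shape]


dataset = StandInData(
    data=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    axes=[[10.0, 20.0], [340.0, 345.0, 350.0]],
)

# 1. Make a deep copy of the axes before touching the data
old_axes = copy.deepcopy(dataset.axes)

# 2. Change the dimension of the data: average along the first dimension
dataset.data = [sum(column) / len(column) for column in zip(*dataset.data)]

# 3. Restore the values of the remaining axis from the copy
dataset.axes[0] = old_axes[1]

print(dataset.data, dataset.axes)
```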
Changing the length of your data
When changing the length of the data, always change the corresponding axes values first, and only afterwards the data, as changing the data will change the axes values and adjust their length to the length of the corresponding dimension of the data.
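A minimal sketch of why the order matters, again using a hypothetical stand-in (StandInData1D) rather than the real ASpecD classes: the stand-in replaces the axis by indices whenever its length no longer fits the data, which imitates the adjustment described above.

```python
class StandInData1D:
    """Hypothetical stand-in: the axis is re-created if its length does not fit."""

    def __init__(self, data, axis):
        self._data = list(data)
        self.axis = list(axis)

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        self._data = list(value)
        if len(self.axis) != len(self._data):
            # Imitates the adjustment described above: original values are lost
            self.axis = list(range(len(self._data)))


def crop(dataset, start, stop):
    """Crop to [start, stop): axis values first, data second."""
    dataset.axis = dataset.axis[start:stop]
    dataset.data = dataset.data[start:stop]
    return dataset


cropped = crop(StandInData1D([5, 6, 7, 8, 9], [340, 342, 344, 346, 348]), 1, 4)
print(cropped.axis)

# Changing the data first loses the axis values
wrong_order = StandInData1D([5, 6, 7, 8, 9], [340, 342, 344, 346, 348])
wrong_order.data = wrong_order.data[1:4]
print(wrong_order.axis)
```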
Adding parameters upon processing
Sometimes there is the need to persist values that are only obtained while processing the data. A typical example may be averaging 2D data along one dimension and wanting to store both the range of indices and the corresponding range in axis units. While in this case, typically the axis value of the centre of the averaging window will be stored as new axis value, the other parameters should end up in the aspecd.processing.ProcessingStep.parameters dictionary. Thus, they are added to the dataset history and available for reports and the like.
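The averaging example can be sketched as follows. The function average_over_range and the key range_axis_units are hypothetical names chosen for illustration only, and the inclusive upper index is an assumption as well. What matters is that values obtained during processing are written to the parameters dictionary.

```python
def average_over_range(data, axis_values, index_range, parameters):
    """Average rows of 2D data over an inclusive index range (hypothetical)."""
    start, stop = index_range
    window = data[start:stop + 1]
    averaged = [sum(column) / len(column) for column in zip(*window)]
    # Persist values only known during processing in the parameters dict
    parameters["range"] = [start, stop]
    parameters["range_axis_units"] = [axis_values[start], axis_values[stop]]
    # Centre of the averaging window, to be stored as the new axis value
    centre = (axis_values[start] + axis_values[stop]) / 2
    return averaged, centre


parameters = {}
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
axis_values = [10.0, 20.0, 30.0, 40.0]
averaged, centre = average_over_range(data, axis_values, (1, 2), parameters)
print(averaged, centre, parameters)
```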
Module documentation
- class aspecd.processing.ProcessingStep
Bases:
ToDictMixin
Base class for processing steps.
Each class actually performing a processing step should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).
Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.
When is a processing step undoable?
Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.
One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).
The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process() which is called by the aspecd.dataset.Dataset.process() method of the dataset object.
Note
Usually, you will never create an instance of this class for actual processing tasks, but rather of one of the child classes, namely aspecd.processing.SingleProcessingStep and aspecd.processing.MultiProcessingStep, depending on whether your processing step operates on a single dataset or requires multiple datasets.
- parameters
Parameters required for performing the processing step
All parameters, implicit and explicit.
- Type:
- references
List of references with relevance for the implementation of the processing step.
Use appropriate record types from the bibrecord package.
- Type:
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
Changed in version 0.4: New attribute references
- process()
Perform the actual processing step.
The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters() prior to calling _perform_task().
- static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Returns True by default and needs to be implemented in classes inheriting from SingleProcessingStep according to their needs.
This is a static method that gets called automatically by each class inheriting from aspecd.processing.SingleProcessingStep. Hence, if you need to override it in your own class, make the method static as well. An example of an implementation testing for two-dimensional data is given below:
    @staticmethod
    def applicable(dataset):
        # Assumption: one axis per data dimension plus one for the values
        return len(dataset.data.axes) == 3
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if successful, False otherwise.
- Return type:
- class aspecd.processing.SingleProcessingStep
Bases:
ProcessingStep
Base class for processing steps operating on single datasets.
Each class actually performing a processing step involving only a single dataset should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).
To perform the processing step, call the process() method of the dataset the processing should be applied to, and provide a reference to the actual processing_step object to it.
Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.
When is a processing step undoable?
Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.
One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).
The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process() which is called by the aspecd.dataset.Dataset.process() method of the dataset object.
- dataset
Dataset the processing step should be performed on
- Type:
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
- to_dict(remove_empty=False)
Create dictionary containing public attributes of an object.
In this particular case, the key “dataset” from the top level of the resulting dictionary will be removed, but not keys with the same name on lower levels of the resulting dict.
- Parameters:
remove_empty (bool) – Whether to remove keys with empty values
Default: False
- Returns:
public_attributes – Ordered dictionary containing the public attributes of the object
The order of attribute definition is preserved
- Return type:
- process(dataset=None, from_dataset=False)
Perform the actual processing step on the given dataset.
If no dataset is provided at method call, but is set as property in the SingleProcessingStep object, the aspecd.dataset.Dataset.process() method of the dataset will be called and thus the history written.
If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.
The aspecd.dataset.Dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the aspecd.processing.SingleProcessingStep object is not necessary.
The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters() prior to calling _perform_task().
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to apply processing step to
from_dataset (boolean) – whether we are called from within a dataset
Defaults to “False” and shall never be set manually.
- Returns:
dataset – dataset the processing step has been applied to
- Return type:
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
- create_history_record()
Create history record to be added to the dataset.
Usually, this method gets called from within the aspecd.dataset.Dataset.process() method of the aspecd.dataset.Dataset class and ensures that the history of each processing step gets written properly.
- Returns:
history_record – history record for processing step
- Return type:
- class aspecd.processing.MultiProcessingStep
Bases:
ProcessingStep
Base class for processing steps operating on multiple datasets.
Each class actually performing a processing step involving multiple datasets should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).
To perform the processing step, call the process() method directly. This will take care of writing the history to each individual dataset as well.
Further things that need to be changed upon inheriting from this class are the string stored in description, being basically a one-liner, and the flag undoable if necessary.
When is a processing step undoable?
Sometimes, the question arises what distinguishes an undoable processing step from one that isn’t, particularly in light of having the original data stored in the dataset.
One simple case of a processing step that cannot easily be undone and redone afterwards (undo always needs to be thought of in light of an inverting redo) is adding data of two datasets together. From the point of view of the single dataset, the other dataset is not accessible. Therefore, such a step is not undoable (the same holds for subtracting two datasets, of course).
The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process() which is called by the aspecd.dataset.Dataset.process() method of the dataset object.
- datasets
List of aspecd.dataset.Dataset objects the processing step should act on
- Type:
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
New in version 0.2.
- process()
Perform the actual processing step.
The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset(s) will be checked automatically using the non-public method _check_applicability(), default parameter values will be set calling the non-public method _set_defaults(), and the parameters will be sanitised by calling the non-public method _sanitise_parameters() prior to calling _perform_task().
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised when processing step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
- create_history_record()
Create history record to be added to the dataset.
Usually, this method gets called from within the aspecd.dataset.Dataset.process() method of the aspecd.dataset.Dataset class and ensures that the history of each processing step gets written properly.
- Returns:
history_record – history record for processing step
- Return type:
- class aspecd.processing.Normalisation
Bases:
SingleProcessingStep
Normalise data.
There are different kinds of normalising data:
maximum
Data are divided by their maximum value
minimum
Data are divided by their minimum value
amplitude
Data are divided by the difference between their maximum and minimum
area
Data are divided by the sum of their absolute values, the number of points is also taken into account.
You can set these kinds using the attribute parameters["kind"].
Important
Before normalising your data, make sure they have a proper baseline, as otherwise, your normalisation will lead to strange results.
Note
Normalisation can be used for N-D data as well. In this case, the data as a whole are normalised accordingly.
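The arithmetic behind the four kinds can be sketched for a flat list of numbers. This is an illustration only and not the ASpecD implementation. In particular, the area branch divides by the plain sum of absolute values, as the exact way the number of points enters is not specified here, and matching the kind by substring is a simplification of the abbreviation rule.

```python
def normalise(data, kind="maximum"):
    """Normalise a flat list of numbers (sketch, not the ASpecD code)."""
    if "max" in kind:
        divisor = max(data)
    elif "min" in kind:
        divisor = min(data)
    elif "amp" in kind:
        divisor = max(data) - min(data)
    elif kind == "area":
        # Assumption: plain sum of absolute values, no point-count factor
        divisor = sum(abs(value) for value in data)
    else:
        raise ValueError(f"Unknown kind '{kind}'")
    return [value / divisor for value in data]


data = [-1.0, 0.0, 2.0, 4.0]
print(normalise(data, "maximum"))
print(normalise(data, "amplitude"))
print(normalise(data, "area"))
```

After normalising to amplitude, the difference between maximum and minimum of the data is one, which is a quick way to check the result.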
Todo
How to handle noisy data in case of area normalisation, as this would probably account for double the noise if simply taking the absolute?
- parameters
All parameters necessary for this step.
- kind
str
Kind of normalisation to use
Valid values: “maximum”, “minimum”, “amplitude”, “area”
Note that the first three can be abbreviated, everything containing “max”, “min”, “amp” will be understood respectively.
Defaults to “maximum”
- range
list
Range of the data of the dataset to normalise for.
This can be quite useful if you want to normalise for a specific feature, e.g. an artifact that you’ve recorded separately and want to subtract from the data, or more generally to normalise to certain features of your data irrespective of other parts.
Ranges can be given as indices or in axis units, and for ND datasets, you need to provide as many ranges as dimensions of your data. Units default to indices, but can be specified using the parameter range_unit, see below.
As internally, RangeExtraction is used, see there for more details of how to provide ranges.
- range_unit
str
Unit used for the range.
Can be either “index” (default) or “axis”.
- noise_range
list
Data range to use for determining noise level
If provided, the normalisation will account for the noise in case of normalising to minimum, maximum, and amplitude.
In case of ND datasets with N>1, you need to provide as many ranges as dimensions of your data.
Numbers are interpreted by default as percentage.
Default: None
- noise_range_unit
str
Unit for specifying noise range
Valid units are “index”, “axis”, “percentage”, with the latter being default. As internally, RangeExtraction gets used, see there for further details.
Default: percentage
- Type:
- kind
- Raises:
ValueError – Raised if unknown kind is provided
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the normalisation with default values:
    - kind: processing
      type: Normalisation
This will normalise your data to their maximum.
Sometimes, normalising to maximum is not what you need, hence you can control in more detail the criterion using the appropriate parameter:
    - kind: processing
      type: Normalisation
      properties:
        parameters:
          kind: amplitude
In this case, you would normalise to the amplitude, meaning setting the difference between minimum and maximum to one. For other kinds, see above.
If you want to normalise not over the entire range of the dataset, but only over a dedicated range, simply provide the necessary parameters:
    - kind: processing
      type: Normalisation
      properties:
        parameters:
          range: [50, 150]
In this case, we assume a 1D dataset and use indices, requiring the data to span at least over 150 points. Of course, it is often more convenient to provide axis units. Here you go:
    - kind: processing
      type: Normalisation
      properties:
        parameters:
          range: [340, 350]
          range_unit: axis
And in case of ND datasets with N>1, make sure to provide as many ranges as dimensions of your dataset, in case of a 2D dataset:
    - kind: processing
      type: Normalisation
      properties:
        parameters:
          range:
            - [50, 150]
            - [30, 40]
Here as well, the range can be given in indices or axis units, but defaults to indices if no unit is explicitly given.
Note
A note for developers: If you inherit from this class and plan to implement further kinds of normalisation, first test for your specific kind of normalisation, and in the else block add a call to super()._perform_task(). This way, you ensure the ValueError will still be raised in case of an unknown kind.
New in version 0.2: Normalising over range of data
New in version 0.2: Accounting for noise for ND data with N>1
Changed in version 0.2: noise_range changed to list from integer
- class aspecd.processing.Integration
Bases:
SingleProcessingStep
Integrate data
Currently, the data are integrated using the numpy.cumsum() function. This may change in the future, and you may be able to choose between different algorithms. A potential candidate would be using FFT/IFFT and performing the operation in Fourier space.
Note
N-D arrays can be integrated as well. In this case, numpy.cumsum() will operate on the last axis.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
As there are currently no parameters you can set, integrating is as simple as this:
    - kind: processing
      type: Integration
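To illustrate what integrating by cumulative sum means numerically, here is a minimal sketch in plain Python. It uses itertools.accumulate as a stand-in for the numpy.cumsum() function mentioned above, so that it runs without NumPy. Applying it row by row for 2D data is meant to mirror the operation along the last axis.

```python
from itertools import accumulate

# 1D data: each point becomes the running sum of all points up to it
data = [1.0, 2.0, 3.0, 4.0]
integrated = list(accumulate(data))
print(integrated)

# 2D data (list of rows): cumulative sum along the last axis, i.e. per row
data_2d = [[1.0, 1.0, 1.0], [2.0, 0.0, -2.0]]
integrated_2d = [list(accumulate(row)) for row in data_2d]
print(integrated_2d)
```

Note that a cumulative sum does not take the spacing of the axis values into account.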
- class aspecd.processing.Differentiation
Bases:
SingleProcessingStep
Differentiate data, i.e., return discrete first derivative
Currently, the data are differentiated using the numpy.gradient() function. This may change in the future, and you may be able to choose between different algorithms. A potential candidate would be using FFT/IFFT and performing the operation in Fourier space.
Note
N-D arrays can be differentiated as well. In this case, differentiation will operate on the last axis.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
As there are currently no parameters you can set, differentiating is as simple as this:
    - kind: processing
      type: Differentiation
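What a discrete first derivative looks like numerically can be sketched in plain Python. The function gradient below is a hand-written illustration for unit spacing, using central differences in the interior and one-sided differences at both ends, which is the default behaviour of numpy.gradient(). The function diff shows simple forward differences for comparison, as numpy.diff() computes them.

```python
def gradient(values):
    """Discrete first derivative with unit spacing, same length as input."""
    if len(values) < 2:
        raise ValueError("Need at least two points")
    result = [values[1] - values[0]]  # one-sided difference at the start
    for i in range(1, len(values) - 1):
        result.append((values[i + 1] - values[i - 1]) / 2)  # central
    result.append(values[-1] - values[-2])  # one-sided difference at the end
    return result


def diff(values):
    """Forward differences, one element shorter than the input."""
    return [b - a for a, b in zip(values, values[1:])]


data = [0.0, 1.0, 4.0, 9.0, 16.0]
print(gradient(data))
print(diff(data))
```

The practical difference is the length of the result: central differences keep the number of points, so the axis of the dataset does not need to be shortened.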
Changed in version 0.3: Method changed from numpy.diff() to numpy.gradient()
- class aspecd.processing.ScalarAlgebra
Bases:
SingleProcessingStep
Perform scalar algebraic operation on one dataset.
To compare datasets (by eye), it might be useful to adapt their intensity by algebraic operations. Adding, subtracting, multiplying and dividing are implemented here.
- parameters
All parameters necessary for this step.
- kind
str
Kind of scalar algebra to use
Valid values: “plus”, “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, “+”, “-”, “*”, “/”
- value
float
Parameter of the scalar algebraic operation
Default value: 1.0
- Type:
- kind
- Raises:
ValueError – Raised if no or wrong kind is provided
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to add a fixed value of 42 to your dataset:
    - kind: processing
      type: ScalarAlgebra
      properties:
        parameters:
          kind: add
          value: 42
Similarly, you could use “minus”, “times”, “by”, “add”, “subtract”, “multiply”, or “divide” as kind - resulting in the given algebraic operation.
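How the different spellings of kind could map to the same operation can be sketched as follows. The dictionary OPERATIONS and the function scalar_algebra are hypothetical names for illustration, working on a flat list of numbers. The actual class may resolve the kinds differently.

```python
import operator

# Hypothetical lookup table: every valid kind maps to one of four operations
OPERATIONS = {
    "plus": operator.add, "add": operator.add, "+": operator.add,
    "minus": operator.sub, "subtract": operator.sub, "-": operator.sub,
    "times": operator.mul, "multiply": operator.mul, "*": operator.mul,
    "by": operator.truediv, "divide": operator.truediv, "/": operator.truediv,
}


def scalar_algebra(data, kind, value=1.0):
    """Apply a scalar operation to each element of a flat list."""
    if kind not in OPERATIONS:
        raise ValueError(f"Unknown kind '{kind}'")
    return [OPERATIONS[kind](element, value) for element in data]


print(scalar_algebra([1.0, 2.0], "add", 42))
print(scalar_algebra([1.0, 2.0], "by", 2))
```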
- class aspecd.processing.Projection
Bases:
SingleProcessingStep
Project data, i.e. reduce dimensions along one axis.
There are many reasons to project along one axis, if nothing else to increase the signal-to-noise ratio if multiple scans have been recorded as a 2D dataset.
While projection can be considered a special case of averaging as performed by aspecd.processing.Averaging and using the whole range of one axis, averaging is usually performed over part of an axis only. Hence, projection is semantically different and therefore implemented as a separate processing step.
- parameters
All parameters necessary for this step.
- axis
int
Axis to average along
Default value: 0
- Type:
- axis
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised if dataset has not enough dimensions
IndexError – Raised if axis is out of bounds for given dataset
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the projection with default values:
    - kind: processing
      type: Projection
This will project the data along the first axis (index 0), yielding a 1D dataset.
If you would like to project along the second axis (index 1), simply set the appropriate parameter:
    - kind: processing
      type: Projection
      properties:
        parameters:
          axis: 1
This will project the data along the second axis (index 1), yielding a 1D dataset.
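Numerically, projecting 2D data amounts to averaging over all points along the chosen axis. The function project below is a plain-Python sketch for a list of rows, with hypothetical names and with standard exceptions standing in for those of the real class.

```python
def project(data, axis=0):
    """Average 2D data (list of rows) along the given axis (sketch)."""
    if not data or not isinstance(data[0], list):
        # Stand-in for the not-applicable error of the real class
        raise ValueError("Projection needs at least 2D data")
    if axis == 0:
        return [sum(column) / len(column) for column in zip(*data)]
    if axis == 1:
        return [sum(row) / len(row) for row in data]
    raise IndexError(f"Axis {axis} out of bounds for 2D data")


# Two scans (rows) of three points each
scans = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(project(scans, axis=0))
print(project(scans, axis=1))
```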
- static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Projection is only applicable to datasets with data of at least two dimensions.
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if successful, False otherwise.
- Return type:
- class aspecd.processing.SliceExtraction
Bases:
SingleProcessingStep
Extract slice along one or more dimensions from dataset.
With multidimensional datasets, there are use cases where you would like to operate only on a slice along a particular axis. One example may be to compare the first and last trace of a 2D dataset.
Note that “slice” can be anything from 1D to a ND array with at least one dimension less than the original array. If you want to extract a 1D slice from a ND dataset with N>2, you need to provide N-1 values for position and axis. Make sure to always provide as many values for position as you provide for axis.
You can either provide indices or axis values for position. For the latter, set the parameter “unit” accordingly. For details, see below.
- parameters
All parameters necessary for this step.
- axis :
Index of the axis or list of indices of the axes to take the position from to extract the slice
If you provide a list of axes, you need to provide as many positions as axes.
If an invalid axis is provided, an IndexError is raised.
Default: 0
- position :
Position(s) of the slice to extract
Positions can be given as axis indices (default) or axis values, if the parameter “unit” is set accordingly. For details, see below.
If you provide a list of positions, you need to provide as many axes as positions.
If no position is provided or the given position is out of bounds for the given axis, a ValueError is raised.
- unit
str
Unit used for specifying the range: either “axis” or “index”.
If an invalid value is provided, a ValueError is raised.
Default: “index”
- Type:
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised if dataset has not enough dimensions (i.e., 1D dataset).
ValueError – Raised if index is out of bounds for given axis. Raised if wrong unit is given. Raised if too many values for axis are given. Raised if number of values for position and axis differ.
IndexError – Raised if axis is out of bounds for given dataset.
Changed in version 0.2: Parameter “index” renamed to “position” to reflect values to be either indices or axis values
New in version 0.2: Slice positions can be given both as axis indices and as axis values
New in version 0.2: Works for ND datasets with N>1
Changed in version 0.7: Sets dataset label to slice position (in axes units)
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the slice extraction with an index only:
    - kind: processing
      type: SliceExtraction
      properties:
        parameters:
          position: 5
This will extract the sixth slice (index five) along the first axis (index zero).
If you would like to extract a slice along the second axis (with index one), simply provide both parameters, position and axis:
    - kind: processing
      type: SliceExtraction
      properties:
        parameters:
          position: 5
          axis: 1
This will extract the sixth slice along the second axis.
And as it is sometimes more convenient to give ranges in axis values rather than indices, even this is possible. Suppose the axis you would like to extract a slice from runs from 340 to 350, and you would like to extract the slice corresponding to 343:
    - kind: processing
      type: SliceExtraction
      properties:
        parameters:
          position: 343
          unit: axis
If you provide the position in axis units rather than indices, the axis value closest to the given value will be chosen automatically.
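Choosing the closest axis value can be sketched as a nearest-neighbour lookup. The function index_of_closest is a hypothetical helper for illustration, and the axis running from 340 to 350 in steps of 0.5 is an assumed example.

```python
def index_of_closest(axis_values, position):
    """Return the index of the axis value closest to position."""
    return min(
        range(len(axis_values)),
        key=lambda index: abs(axis_values[index] - position),
    )


# Assumed axis: 340 to 350 in steps of 0.5 (21 points)
axis_values = [340 + 0.5 * i for i in range(21)]

exact = index_of_closest(axis_values, 343)
between = index_of_closest(axis_values, 343.3)
print(exact, axis_values[exact])
print(between, axis_values[between])
```

A position that does not coincide with an axis value, such as 343.3 here, is thus mapped to the neighbouring point at 343.5.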
For ND datasets with N>2, you can either extract a 1D or ND slice, with N always at least one dimension less than the original data. To extract a 2D slice from a 3D dataset, simply proceed as above, providing one value each for position and axis. If, however, you want to extract a 1D slice from a 3D dataset, you need to provide two values each for position and axis:
    - kind: processing
      type: SliceExtraction
      properties:
        parameters:
          position: [21, 42]
          axis: [0, 2]
This particular case would be equivalent to
data[21, :, 42]
assumingdata
to contain the numeric data, besides, of course, that the processing step takes care of removing the axes as well.- static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Slice extraction is only applicable to datasets with at least two-dimensional data.
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if successful, False otherwise.
- Return type:
- class aspecd.processing.SliceRemoval
Bases:
SingleProcessingStep
Remove slice along one dimension from dataset.
With multidimensional datasets, there are use cases where you would like to remove a slice along a particular axis, mostly due to artifacts contained in this slice that impair downstream processing and analysis.
You can either provide indices or axis values for position. For the latter, set the parameter “unit” accordingly. For details, see below.
- parameters
All parameters necessary for this step.
- axis :
Index of the axis or list of indices of the axes to take the position from to remove the slice
If you provide a list of axes, you need to provide as many positions as axes.
If an invalid axis is provided, an IndexError is raised.
Default: 0
- position :
Position(s) of the slice to remove
Positions can be given as axis indices (default) or axis values, if the parameter “unit” is set accordingly. For details, see below.
If you provide a list of positions, you need to provide as many axes as positions.
If no position is provided or the given position is out of bounds for the given axis, a ValueError is raised.
- unit
str
Unit used for specifying the range: either “axis” or “index”.
If an invalid value is provided, a ValueError is raised.
Default: “index”
- Type:
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised if dataset has not enough dimensions (i.e., 1D dataset).
ValueError – Raised if index is out of bounds for given axis. Raised if wrong unit is given. Raised if too many values for axis are given. Raised if number of values for position and axis differ.
IndexError – Raised if axis is out of bounds for given dataset.
New in version 0.8.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the slice removal with an index only:
- kind: processing
  type: SliceRemoval
  properties:
    parameters:
      position: 5
This will remove the sixth slice (index five) along the first axis (index zero).
If you would like to remove a slice along the second axis (with index one), simply provide both parameters, position and axis:
- kind: processing
  type: SliceRemoval
  properties:
    parameters:
      position: 5
      axis: 1
This will remove the sixth slice along the second axis.
And as it is sometimes more convenient to give positions in axis values rather than indices, this is possible as well. Suppose the axis you would like to remove a slice from runs from 340 to 350, and you would like to remove the slice corresponding to 343:
- kind: processing
  type: SliceRemoval
  properties:
    parameters:
      position: 343
      unit: axis
If you provide the position in axis units rather than indices, the axis value closest to the given value will be chosen automatically.
For ND datasets with N>2, you can currently only remove a slice along one axis.
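In NumPy terms, removing a slice amounts to deleting one index from the data and from the matching axis. The following is a minimal sketch with made-up arrays, not the code used by the class:

```python
import numpy as np

# Hypothetical 2D data with 4 points along the first axis
data = np.arange(12).reshape(4, 3)
axis_values = np.array([340.0, 341.0, 342.0, 343.0])

# Remove the second slice (index 1) along the first axis (axis 0)
index = 1
reduced_data = np.delete(data, index, axis=0)

# The axis values need to be adjusted accordingly
reduced_axis = np.delete(axis_values, index)
```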
- static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Slice removal is only applicable to datasets with at least two-dimensional data.
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if successful, False otherwise.
- Return type:
- class aspecd.processing.RangeExtraction
Bases:
SingleProcessingStep
Extract range of data from dataset.
There are many reasons to look only at a certain range of data of a given dataset. For a ND array, one would use slicing, but for a dataset, one needs to have the axes adjusted as well, hence this processing step.
- parameters
All parameters necessary for this step.
- range
list
List of lists with indices for the slicing
For each dimension of the data of the dataset, one list of indices needs to be provided that are used for start, stop [, step] of slice.
- unit
str
Unit used for specifying the range: “axis”, “index”, “percentage”.
If an invalid value is provided, a ValueError is raised.
Default: “index”
- Type:
- range
- Raises:
ValueError – Raised if index is out of bounds for given axis. Raised if wrong unit is given.
IndexError – Raised if no range is provided. Raised if number of ranges does not fit data dimensions.
New in version 0.2.
Changed in version 0.9.2: Range extraction with axis values sets correct upper boundary
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the range extraction with one range only, assuming a 1D dataset:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10]
This will extract the range data[5:10] from your data (and adjust the axis accordingly). In case of 2D data, it would be fairly similar, except that you now provide two ranges:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range:
        - [5, 10]
        - [3, 6]
Additionally, you can provide step sizes, just as you can do when slicing in Python:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10, 2]
This is equivalent to data[5:10:2] or data[(slice(5, 10, 2))], accordingly. Note that in Python, ranges exclude the upper limit.
Sometimes, it is more convenient to give ranges in axis values rather than indices. This can be achieved by setting the parameter unit to “axis”:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [5, 10]
      unit: axis
Note that in this case, setting a step is meaningless and will be silently ignored. Furthermore, the nearest axis values will be used for the range, and for more intuitive use, the given range includes the upper limit, in contrast to using indices. Index ranges thus remain consistent with Python’s handling of ranges, while ranges in axis values follow the intuition of most scientists.
In some cases you may want to extract a range by providing percentages instead of indices or axis values. Even this can be done:
- kind: processing
  type: RangeExtraction
  properties:
    parameters:
      range: [0, 10]
      unit: percentage
Here, the first ten percent of the data of the 1D dataset will be extracted, or more exactly the indices falling within the first ten percent. Note that in this case, setting a step is meaningless and will be silently ignored. Furthermore, the nearest axis values will be used for the range.
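The difference between ranges given as indices (upper limit excluded) and ranges given as axis values (upper limit included) can be sketched with NumPy. This is an illustration under assumptions, not the code used by the class; `axis_values` and `data` are made up, and the percentage case is left out because its exact conversion is not described here.

```python
import numpy as np

# Hypothetical 1D data with axis values 0, 1, ..., 20
axis_values = np.linspace(0, 20, 21)
data = axis_values ** 2

# unit: index -> plain Python slicing, upper limit excluded
by_index = data[slice(5, 10)]
by_index_step = data[slice(5, 10, 2)]

# unit: axis -> nearest axis values are looked up, upper limit included
start = int(np.argmin(np.abs(axis_values - 5)))
stop = int(np.argmin(np.abs(axis_values - 10)))
by_axis = data[start:stop + 1]
```

With these values, the index-based range contains five points and the axis-based range six, the latter ending at the data point belonging to the axis value 10.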
- class aspecd.processing.BaselineCorrection
Bases:
SingleProcessingStep
Subtract baseline from dataset.
Currently, only polynomial baseline corrections are supported.
The coefficients used to calculate the baseline will be written to the parameters dictionary upon processing.
If no order is explicitly given, a polynomial baseline of zeroth order will be used.
Important
Baseline correction works only for 1D and 2D datasets, not for higher-dimensional datasets.
- parameters
All parameters necessary for this step.
- kind
str
The kind of baseline correction to be performed.
Default: polynomial
- order
int
The order for the baseline correction if no coefficients are given.
Default: 0
- fit_area :
Parts of the spectrum to be considered as baseline, can be given as list or single number. If one number is given, it takes that percentage from both sides, respectively, i.e. 10 means 10% left and 10% right. If a list of two numbers is provided, the corresponding percentages are taken from each side of the spectrum, i.e. [5, 20] takes 5% from the left side and 20% from the right.
Default: [10, 10]
- coefficients:
Coefficients used to calculate the baseline.
- axis
int
Axis along which to perform the baseline correction.
Only necessary in case of 2D data.
Default: 0
- Type:
- kind
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the baseline correction with default values:
- kind: processing
  type: BaselineCorrection
In this case, a zeroth-order polynomial baseline will be subtracted from your dataset using ten percent to the left and right, and in case of a 2D dataset, the baseline correction will be performed along the first axis (index zero) for all indices of the second axis (index 1).
Of course, often you want to control a little bit more how the baseline will be corrected. This can be done by explicitly setting some parameters.
Suppose you want to perform a baseline correction with a polynomial of first order:
- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      order: 1
If you want to change the (percentage) area used for fitting the baseline, and even specify different ranges left and right:
- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      fit_area: [5, 20]
Here, five percent from the left and 20 percent from the right are used.
Finally, suppose you have a 2D dataset and want to perform the baseline correction along the second axis (index one):
- kind: processing
  type: BaselineCorrection
  properties:
    parameters:
      axis: 1
Of course, you can combine the different options.
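The idea of a polynomial baseline correction using only the outer parts of the data can be sketched as follows. This is a minimal NumPy illustration, not the code used by the class; the synthetic data and the way percentages are converted to numbers of points (rounding up) are assumptions.

```python
import numpy as np

# Synthetic 1D data: a peak on top of a first-order baseline
x = np.linspace(0, 10, 101)
signal = np.exp(-(x - 5) ** 2)
data = signal + 0.5 + 0.1 * x

fit_area = [10, 10]  # percent taken from the left and right side
order = 1            # polynomial order of the baseline

# Assumption: percentages are converted to point counts by rounding up
n_left = int(np.ceil(len(x) * fit_area[0] / 100))
n_right = int(np.ceil(len(x) * fit_area[1] / 100))
x_fit = np.concatenate([x[:n_left], x[-n_right:]])
y_fit = np.concatenate([data[:n_left], data[-n_right:]])

# Fit the polynomial to the outer parts only; coefficients in ascending order
coefficients = np.polynomial.polynomial.polyfit(x_fit, y_fit, order)
baseline = np.polynomial.polynomial.polyval(x, coefficients)
corrected = data - baseline
```

The fitted coefficients come out close to 0.5 and 0.1, the offset and slope put into the synthetic data.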
Changed in version 0.3: Coefficients are returned in unscaled data domain
Changed in version 0.6.3: Zero values in range properly handled
- static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Baseline correction is (currently) only applicable to datasets with one- and two-dimensional data.
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if successful, False otherwise.
- Return type:
- class aspecd.processing.Averaging
Bases:
SingleProcessingStep
Average data over given range along given axis.
While projection as performed by aspecd.processing.Projection can be considered a special case of averaging using the whole range of one axis, averaging is usually performed over part of an axis only.
Note
Currently, averaging works only for 2D datasets, not for higher-dimensional datasets. This may, however, change in the future.
Important
Indices for the range work slightly differently from Python: While still zero-based, a range of [2, 3] will result in having the third and fourth column/row averaged. This seems more intuitive to the average scientist than sticking with Python (and having in this case only the third column/row returned).
You can use negative indices as well, as long as the resulting indices are still within the range of the corresponding data dimension.
- parameters
All parameters necessary for this step.
- axis
int
The axis to average along.
Default: 0
- range
list
The range (start, end) to average over.
Default: []
- unit
str
Unit used for specifying the range: either “axis” or “index”.
Default: “index”
- Type:
- axis
- Raises:
ValueError – Raised if range is out of range for given axis or empty. Raised if unit is not either “axis” or “index”.
IndexError – Raised if axis is out of bounds for given dataset
Changed in version 0.7: Sets dataset label to averaging range (in axes units)
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In the simplest case, just invoke the averaging with a range only:
- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [2, 3]
In this case, you will get your dataset averaged along the first axis (index zero), and averaged over the indices 2 and 3 of the second axis.
If you would like to average over the second axis (index 1), just specify this axis:
- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [2, 3]
      axis: 1
And as it is sometimes more convenient to give ranges in axis values rather than indices, this is possible as well. Suppose the axis you would like to average over runs from 340 to 350, and you would like to average from 342 to 344:
- kind: processing
  type: Averaging
  properties:
    parameters:
      range: [342, 344]
      unit: axis
If you provide the range in axis units rather than indices, the axis values closest to the given values will be chosen automatically.
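The inclusive handling of the index range can be illustrated with NumPy. This is a sketch under assumptions, not the code used by the class; the array `data` is made up.

```python
import numpy as np

# Hypothetical 2D data: 5 rows (first axis), 4 columns (second axis)
data = np.arange(20, dtype=float).reshape(5, 4)

range_ = [2, 3]  # inclusive: third and fourth row

# Average along the first axis; "+ 1" makes the upper limit inclusive
averaged = data[range_[0]:range_[1] + 1, :].mean(axis=0)
```

Plain Python slicing with `data[2:3]` would only return the third row, which is the difference pointed out above.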
- static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Averaging is only applicable to datasets with two-dimensional data.
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if successful, False otherwise.
- Return type:
- class aspecd.processing.ScalarAxisAlgebra
Bases:
SingleProcessingStep
Perform scalar algebraic operation on the axis of a dataset.
Sometimes, changing the values of an axis can be quite useful, for example to apply corrections obtained by some analysis step. Usually, this requires scalar algebraic operations on the axis values.
- parameters
All parameters necessary for this step.
- kind
str
Kind of scalar algebra to use
Valid values: “plus”, “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, “+”, “-”, “*”, “/”, “power”, “pow”, “**”
- axis
int
Axis to operate on
Default value: 0
- value
float
Parameter of the scalar algebraic operation
Default value: None
- Type:
- kind
- Raises:
ValueError – Raised if no or wrong kind is provided
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to add a fixed value of 42 to the first axis (index 0) of your dataset:
- kind: processing
  type: ScalarAxisAlgebra
  properties:
    parameters:
      kind: plus
      axis: 0
      value: 42
Similarly, you could use “minus”, “times”, “by”, “add”, “subtract”, “multiply”, “divide”, and “power” as kind - resulting in the given algebraic operation.
New in version 0.2.
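One way to think of the different values for “kind” is as aliases mapped onto a handful of operators. The following sketch shows such a mapping; the names `OPERATIONS` and `scalar_axis_algebra` are hypothetical and not part of the ASpecD API.

```python
import operator

import numpy as np

# Hypothetical mapping of the "kind" aliases listed above onto operators
OPERATIONS = {
    "plus": operator.add, "add": operator.add, "+": operator.add,
    "minus": operator.sub, "subtract": operator.sub, "-": operator.sub,
    "times": operator.mul, "multiply": operator.mul, "*": operator.mul,
    "by": operator.truediv, "divide": operator.truediv,
    "/": operator.truediv,
    "power": operator.pow, "pow": operator.pow, "**": operator.pow,
}


def scalar_axis_algebra(axis_values, kind, value):
    """Apply a scalar operation to all values of an axis."""
    if kind not in OPERATIONS:
        raise ValueError(f"Unknown kind: {kind!r}")
    return OPERATIONS[kind](axis_values, value)


shifted = scalar_axis_algebra(np.array([340.0, 350.0]), "plus", 42)
```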
- class aspecd.processing.DatasetAlgebra
Bases:
SingleProcessingStep
Perform scalar algebraic operation on two datasets.
To improve the signal-to-noise ratio, adding the data of two datasets can sometimes be useful. Alternatively, adding or subtracting the data of two datasets can be used to help to interpret the signals.
Important
The data of the two datasets to perform the scalar algebraic operation on need to have the same dimension (that is checked for), and to obtain meaningful results, usually the axes values need to be identical as well. For this purpose, use the CommonRangeExtraction processing step.
Note
Metadata of the dataset are not touched by this operation at all. This means that the metadata in the dataset are still those of the dataset the processing step operated on. This may, however, lead to confusion or misinterpretation if somewhere in the metadata the number of accumulations or measurements per point or similar is encoded.
- parameters
All parameters necessary for this step.
- kind
str
Kind of scalar algebra to use
Valid values: “plus”, “minus”, “add”, “subtract”, “+”, “-”
Note that in contrast to scalar algebra, multiply and divide are not implemented for operation on two datasets.
- dataset
aspecd.dataset.Dataset
Dataset whose data to add or subtract
- Type:
- kind
- Raises:
ValueError – Raised if no or wrong kind is provided. Raised if data of datasets have different shapes.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to add the data of the dataset referred to by its label label_to_other_dataset to your dataset:
- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset
Similarly, you could use “minus”, “add”, “subtract” as kind - resulting in the given algebraic operation.
As mentioned already, the data of both datasets need to have identical shape, and comparison is only meaningful if the axes are compatible as well. Hence, you will usually want to perform a CommonRangeExtraction processing step before doing algebra with two datasets:
- kind: multiprocessing
  type: CommonRangeExtraction
  result:
    - label_to_dataset
    - label_to_other_dataset
- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset
  apply_to:
    - label_to_dataset
Sometimes, you have recorded multiple datasets and want to add them all up. While technically speaking, this would be possible with consecutive steps, it is much more convenient to provide a list of datasets:
- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset:
        - label_to_other_dataset
        - label_to_yet_another_dataset
This will add the data of both datasets provided to the dataset operated upon. Of course, you can subtract the data in the same way.
New in version 0.2.
Changed in version 0.12: Handles a list of datasets in parameter “dataset”.
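The shape check and the summation over a list of datasets can be sketched on plain arrays. This is an illustration only; the function `add_datasets` is hypothetical and operates on NumPy arrays rather than on aspecd.dataset.Dataset objects.

```python
import numpy as np


def add_datasets(data, others):
    """Add the data of several arrays, checking shapes beforehand."""
    result = np.array(data, dtype=float)
    for other in others:
        other = np.asarray(other, dtype=float)
        if other.shape != result.shape:
            raise ValueError("Data of datasets have different shapes")
        result = result + other
    return result


total = add_datasets([1.0, 2.0, 3.0], [[10.0, 20.0, 30.0], [0.5, 0.5, 0.5]])
```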
- class aspecd.processing.Interpolation
Bases:
SingleProcessingStep
Interpolate data.
As soon as data of different datasets should be arithmetically combined, they need to have an identical grid. Often, this can only be achieved by interpolating one or both datasets.
Take care not to use interpolation to artificially smooth your data.
For an in-depth discussion of interpolating ND data, see the following discussions on Stack Overflow, particularly the answers by Joe Kington providing both, theoretical insight and Python code:
Todo
Make type of interpolation controllable
- parameters
All parameters necessary for this step.
- range
list
Range of the axis to interpolate for
Needs to be a list of lists in case of ND datasets with N>1, containing N two-element vectors as ranges for each of the axes.
- npoints
list
Number of points to interpolate for
Needs to be a list in case of ND datasets with N>1, containing N elements, one for each of the axes.
- unit
str
Unit the ranges are given in
Can be either “index” (default) or “axis”.
- Type:
- range
- Raises:
ValueError – Raised if no range to interpolate for is provided. Raised if no number of points to interpolate for is provided. Raised if unit is unknown.
IndexError – Raised if list of ranges does not fit data dimensions. Raised if list of npoints does not fit data dimensions. Raised if given range is out of range of data/axes
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
Generally, interpolating requires you to provide both a range and a number of points:
- kind: processing
  type: Interpolation
  properties:
    parameters:
      range: [10, 100]
      npoints: 901
This would interpolate your data between their indices 10 and 100 using 901 points. As it is sometimes (often) more convenient to work with axis units, you can tell the processing step to use axis values instead of indices:
- kind: processing
  type: Interpolation
  properties:
    parameters:
      range: [340, 350]
      npoints: 1001
      unit: axis
This would interpolate your (1D) data between the axis values 340 and 350 using 1001 points.
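For 1D data, the effect of the axis-based example can be reproduced with linear interpolation in NumPy. This is only a sketch: the class may use a different interpolation method, and the arrays are made up.

```python
import numpy as np

# Hypothetical 1D data on a coarse axis from 335 to 355
axis_values = np.linspace(335, 355, 21)
data = 2 * axis_values + 1  # linear, hence reproduced exactly

# range: [340, 350], npoints: 1001, unit: axis
new_axis = np.linspace(340, 350, 1001)
new_data = np.interp(new_axis, axis_values, data)
```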
New in version 0.2.
Changed in version 0.8.3: Interpolation for ND datasets with arbitrary dimension N
Changed in version 0.8.3: Change interpolation method for 2D data from deprecated scipy.interpolate.interp2d to scipy.interpolate.RegularGridInterpolator
- class aspecd.processing.Filtering
Bases:
SingleProcessingStep
Filter data.
Generally, filtering is a large field of (digital) signal processing, and currently, this class only implements a very small subset of filters often applied in spectroscopy, namely low-pass filters that can be used for smoothing (“denoising”) data.
Filtering makes heavy use of the scipy.ndimage and scipy.signal modules of the SciPy package. For details, see there.
Filtering works with data with arbitrary dimensions, in this case applying the filter in each dimension.
- parameters
All parameters necessary for this step.
- type
str
Type of the filter to use
Currently, three types are supported: “uniform”, “gaussian”, “savitzky-golay”. For convenience, a list of aliases exists for each of these types, and if you use one of these aliases, it will be replaced by its generic name:
Generic            Alias
‘uniform’          ‘box’, ‘boxcar’, ‘moving-average’, ‘car’
‘gaussian’         ‘binom’, ‘binomial’
‘savitzky-golay’   ‘savitzky_golay’, ‘savitzky golay’, ‘savgol’, ‘savitzky’
- window_length
int
Length of the filter window
The window needs to be smaller than the actual data. If you provide a window length that exceeds the data range, an exception will be raised.
- order
int
Polynomial order for the Savitzky-Golay filter
Only necessary for this type of filter. If no order is given for this filter, an exception will be raised.
- Type:
- type
- Raises:
ValueError – Raised if no or wrong filter type is provided. Raised if no filter window is provided. Raised if filter window exceeds data range. Raised in case of Savitzky-Golay filter when no order is provided.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
Generally, filtering requires you to provide both a type of filter and a window length. Therefore, for uniform and Gaussian filters, this would be:
- kind: processing
  type: Filtering
  properties:
    parameters:
      type: uniform
      window_length: 10
Of course, at least uniform filtering (also known as boxcar or moving average) is strongly discouraged due to the artifacts introduced. Probably the best bet for applying a filter to smooth your data is the Savitzky-Golay filter:
- kind: processing
  type: Filtering
  properties:
    parameters:
      type: savitzky-golay
      window_length: 9
      order: 3
Note that for this filter, you need to provide the polynomial order as well. To get best results, you will need to experiment with the parameters a bit.
New in version 0.2.
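To give an idea of what a Savitzky-Golay filter does, the following NumPy-only sketch fits a polynomial of the given order within each window and takes its value at the window centre. This is not the code used by the class, which relies on SciPy as stated above; in practice, scipy.signal.savgol_filter is the tool of choice. The test data are made up, and the edges are simply dropped here.

```python
import numpy as np

window_length = 9
order = 3

# Least-squares polynomial fit within the window, expressed as weights:
# the first row of the pseudo-inverse gives the fitted value at the centre
half = window_length // 2
offsets = np.arange(-half, half + 1)
vandermonde = np.vander(offsets, order + 1, increasing=True)
weights = np.linalg.pinv(vandermonde)[0]

# Hypothetical data: a cubic polynomial, which a third-order filter preserves
x = np.linspace(-1, 1, 50)
data = x ** 3 - 2 * x

# Apply the weights as a sliding window; edges are dropped ("valid")
smoothed = np.correlate(data, weights, mode="valid")
```

For noisy data, the same weights average out fluctuations while retaining features that are well described by a low-order polynomial within the window, which is why experimenting with window length and order is necessary.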
- class aspecd.processing.CommonRangeExtraction
Bases:
MultiProcessingStep
Extract the common range of data for multiple datasets using interpolation.
One prerequisite for adding up multiple datasets in a meaningful way is to have their data dimensions as well as their respective axes values agree. This usually requires interpolating the data to a common set of axes.
Todo
Make type of interpolation controllable
Make number of points controllable (in absolute numbers as well as minimum and maximum points with respect to datasets)
- parameters
All parameters necessary for this step.
- ignore_units
bool
Whether to ignore the axes units when checking the datasets for applicability.
Usually, the axes units should be identical, but sometimes, they may be named differently or be compatible anyways. Use with care and only in case you exactly know what you do
Default: False
- common_range
list
Common range of values for each axis as determined by the processing step.
For >1D datasets, this will be a list of lists.
- npoints
list
Number of points used for the final grid the data are interpolated on.
The length is identical to the dimensions of the data of the datasets.
- Type:
- ignore_units
- Raises:
ValueError – Raised if datasets have axes with different units or disjoint values. Raised if datasets have different dimensions.
IndexError – Raised if axis is out of bounds for given dataset
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to bring all datasets currently loaded into your recipe to a common range (use with caution, however), things can be as simple as:
- kind: multiprocessing
  type: CommonRangeExtraction
Note that this will operate on all datasets currently available in your recipe, including results from other processing steps. Therefore, it is usually better to be explicit, using apply_to. Otherwise, you can use this processing step early on in your recipe.
Usually, however, you will want to restrict this to a subset using apply_to and provide labels for the results:
- kind: multiprocessing
  type: CommonRangeExtraction
  result:
    - dataset1_cut
    - dataset2_cut
  apply_to:
    - dataset1
    - dataset2
If you want to perform algebraic operations on datasets, the data of both datasets need to have identical shape, and comparison is only meaningful if the axes are compatible as well. Hence, you will usually want to perform a CommonRangeExtraction processing step before doing algebra with two datasets:
- kind: multiprocessing
  type: CommonRangeExtraction
  result:
    - label_to_dataset
    - label_to_other_dataset
- kind: processing
  type: DatasetAlgebra
  properties:
    parameters:
      kind: plus
      dataset: label_to_other_dataset
  apply_to:
    - label_to_dataset
For details of the algebraic operations on datasets, see DatasetAlgebra.
New in version 0.2.
Changed in version 0.6.3: Unit of last axis (i.e., intensity) gets ignored when checking for same units
Changed in version 0.9: Works for ND datasets with arbitrary dimension N
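For two 1D datasets, determining the common range and interpolating both onto a shared grid can be sketched as follows. This is an illustration under assumptions, not the code used by the class: the arrays are made up, linear interpolation is used, and the number of points of the common grid is chosen arbitrarily.

```python
import numpy as np

# Two hypothetical 1D datasets with overlapping but different axes
axis1 = np.linspace(0, 10, 11)
axis2 = np.linspace(5, 15, 21)
data1 = 2 * axis1
data2 = 3 * axis2

# The common range is the overlap of both axes
common_range = [
    float(max(axis1.min(), axis2.min())),
    float(min(axis1.max(), axis2.max())),
]
if common_range[0] >= common_range[1]:
    raise ValueError("Datasets have disjoint axis values")

# Assumption: number of points of the common grid chosen by hand
npoints = 11
common_axis = np.linspace(common_range[0], common_range[1], npoints)
data1_cut = np.interp(common_axis, axis1, data1)
data2_cut = np.interp(common_axis, axis2, data2)

# Now both datasets share shape and axis, and can be added meaningfully
total = data1_cut + data2_cut
```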
- class aspecd.processing.Noise
Bases:
SingleProcessingStep
Add (coloured) noise to data.
Particularly for testing algorithms and hence creating test data, adding noise to these test data is crucial. Furthermore, the naive approach of adding white (Gaussian, normally distributed) noise often does not reflect the physical reality, as “real” noise often has a different power spectral density (PSD).
Probably the kind of noise most often encountered in spectroscopy is 1/f noise or pink noise, with the PSD decreasing by 3 dB per octave or 10 dB per decade. For more details on the different kinds of noise, the following sources may be a good starting point:
Different strategies exist to create coloured noise, and the implementation used here follows basically the ideas published by Timmer and König:
J. Timmer and M. König: On generating power law noise. Astronomy and Astrophysics 300, 707–710 (1995)
In short: In the Fourier space, normally distributed random numbers are drawn for power and phase of each frequency component, and the power scaled by the appropriate power law. Afterwards, the resulting frequency spectrum is back transformed using iFFT and ensuring real data.
Further inspiration came from the following two sources:
Note: The first is based on a MATLAB(R) code by Max Little and contains a number of errors in its Python translation that are not present in the original code.
The added noise has always a mean of (close to) zero.
- parameters
All parameters necessary for this step.
- exponent
float
The exponent used for scaling the power of the frequency components
0 – white (Gaussian) noise
-1 – pink (1/f) noise
-2 – Brownian (1/f**2) noise
Default: -1 (pink noise)
- normalise
bool
Whether to normalise the noise amplitude prior to adding to the data.
In this case, the amplitude is normalised to 1.
- amplitude
float
Amplitude of the noise
This is often useful to explicitly control the noise level and removes the need to first normalise and scale the data noise should be added to.
- Type:
- exponent
Note
The exponent for the noise is not restricted to integer values, nor to negative values. While for spectroscopic data, pink (1/f) noise usually prevails (exponent = -1), the opposite effect with high frequencies dominating can occur as well. A prominent example of naturally occurring “blue noise” with the density proportional to f is the Cherenkov radiation.
Note
In case of ND data, the coloured noise is calculated along the first dimension only, all other dimensions will exhibit (close to) white (Gaussian) noise. Generally, this should not be a problem in spectroscopy, as usually, data are recorded over time in one dimension only, and only in this (implicit) time dimension coloured noise will be relevant.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
Generally, adding noise to a dataset can be quite simple. Without explicitly providing any parameter, 1/f or pink noise will be added to the data:
- kind: processing
  type: Noise
Of course, you can control in much more detail the kind of noise and its amplitude. To add Gaussian (white) noise to a dataset:
- kind: processing
  type: Noise
  properties:
    parameters:
      exponent: 0
Similarly, you could add Brownian (1/f**2) noise (with an exponent of -2), but you can give positive exponents as well. While this type of noise is less relevant in spectroscopy, it is relevant in other areas.
To control the noise amplitude, there are two different strategies: normalising the amplitude to one, and providing an explicit amplitude. Normalising works as follows:
- kind: processing
  type: Noise
  properties:
    parameters:
      normalise: true
Providing an explicit amplitude can be quite helpful in case you want to control the signal-to-noise ratio and know the amplitude of your signal prior to adding noise. Adding noise with a noise amplitude of 0.01 would be done as follows:
- kind: processing
  type: Noise
  properties:
    parameters:
      amplitude: 0.01
Note that in case you do not provide an exponent, its default value will be used, resulting in pink (1/f) noise, as this is spectroscopically the most relevant.
New in version 0.3.
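The general idea of generating coloured noise in Fourier space can be sketched as follows. This is a simplified illustration loosely following the approach of Timmer and König, not the code used by the class: here, real and imaginary parts of each frequency component are drawn from a normal distribution and scaled by the square root of the power law. The function name `coloured_noise` and its `seed` argument are hypothetical.

```python
import numpy as np


def coloured_noise(npoints, exponent=-1.0, seed=None):
    """Create 1D noise with a power-law PSD proportional to f**exponent."""
    rng = np.random.default_rng(seed)
    frequencies = np.fft.rfftfreq(npoints)
    # Amplitudes scale with the square root of the power spectral density;
    # the zero-frequency component is set to zero (no offset)
    scaling = np.zeros_like(frequencies)
    scaling[1:] = frequencies[1:] ** (exponent / 2)
    spectrum = scaling * (
        rng.normal(size=frequencies.size)
        + 1j * rng.normal(size=frequencies.size)
    )
    # Back transform; irfft ensures real-valued output
    noise = np.fft.irfft(spectrum, n=npoints)
    return noise - noise.mean()


pink = coloured_noise(1024, exponent=-1, seed=42)
white = coloured_noise(1024, exponent=0, seed=42)
```

Scaling the result to a given amplitude before adding it to the data would correspond to the amplitude parameter described above.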
Changed in version 0.4: Added reference to references
Changed in version 0.6: Added parameter amplitude
- class aspecd.processing.ChangeAxesValues
Bases:
SingleProcessingStep
Change values of individual axes.
What sounds pretty much like data manipulation is sometimes a necessity due to the shortcomings of vendor file formats. Let’s face it: sometimes values read from raw data are simply wrong, due to wrong readout or wrong processing of these parameters within the device. Therefore, it seems much better to transparently change the respective axis values rather than having to modify raw data by hand. Using a processing step has two crucial advantages: (i) it allows for full reproducibility and traceability, and (ii) it can be done in the context of recipe-driven data analysis, i.e. without requiring any programming skills.
Note
A real-world example: angular-dependent measurements recorded wrong angles in the raw data file, while the actual positions were correct. Assuming measurements from 0° to 180° in 10° steps, it is pretty straight-forward how to fix this problem: Assign equidistant values from 0° to 180° and use the information about the actual axis length.
- parameters
All parameters necessary for this step.
- range
list
The range of the axis, i.e. start and end value
- axes
list
The axes to set the new values for
Can be an integer in case of a single axis, otherwise a list of integers. If omitted, all axes with values will be assumed (i.e., one per data dimension).
- Type:
- range
- Raises:
IndexError – Raised if index is out of range for axes or given number of axes and ranges is incompatible
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.
In case you would like to change the axis range of a 1D dataset, things are as simple as:
- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range: [35, 42]
This would take the first axis (index 0) and set the range to linearly spaced data ranging from 35 to 42, of course with the same number of values as before.
If you want to change both axes in a 2D dataset, the approach is the same, with one range per axis:
- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range:
        - [35, 42]
        - [17.5, 21]
This would set the range of the first axis (index 0) to the interval [35, 42], and the range of the second axis (index 1) to the interval [17.5, 21].
More often, you may have a 2D dataset where you intend to change the values of only one axis. Suppose the example from above with angular-dependent measurements and the angles in the second dimension:
- kind: singleprocessing
  type: ChangeAxesValues
  properties:
    parameters:
      range: [0, 180]
      axes: 1
Here, the second axis (index 1) will be set accordingly.
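The effect of the range parameter can be illustrated without ASpecD itself. The following is a minimal sketch, not the implementation of ChangeAxesValues: the helper name linear_axis is hypothetical, and it only assumes what is stated above, namely that the new values are linearly spaced between the two range limits and that the number of values stays the same.

```python
def linear_axis(old_values, value_range):
    """Return linearly spaced values with the same length as old_values.

    Hypothetical helper for illustration only; not part of ASpecD.
    """
    start, end = value_range
    n = len(old_values)
    if n == 1:
        return [float(start)]
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


# Angular-dependent measurement with 19 positions whose recorded angles
# are wrong; only the axis length (19) is taken from the old values.
wrong_angles = [0.0] * 19
angles = linear_axis(wrong_angles, [0, 180])
```

With 19 positions and a range of [0, 180], the new axis values are spaced by 10°, matching the real-world example from the note above.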
New in version 0.3.
- class aspecd.processing.RelativeAxis
Bases:
SingleProcessingStep
Create relative axis, centred about a given value.
Sometimes, absolute axis values are less relevant than relative values, particularly if you’re interested in differences in distances between several datasets, e.g. peak positions in spectroscopy.
Note
You can set an origin that is not within the range of the current axis values. In such a case, you will see a warning, but as this may be a perfectly valid use case, no exception is raised.
- parameters
All parameters necessary for this step.
- origin
float
The value the axis should be centred about
This value is subtracted from the original axis values
Default: centre value of the axis range
- axis
int
The index of the axis to be converted into a relative axis
Default: 0
- Type:
- origin
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. Each example focuses on a single aspect.
In case you would like to change the first axis to a relative axis and centre it about its central value, things are as simple as:
- kind: singleprocessing
  type: RelativeAxis
Of course, this is rarely a sensible use case, and you will usually want to provide a dedicated value for the origin of the new axis (i.e., the axis value the current axis should be centred about).
- kind: singleprocessing
  type: RelativeAxis
  properties:
    parameters:
      origin: 42
Nothing prevents you from operating on multidimensional datasets and hence converting an axis other than the first one into a relative axis. To make the second axis (with index 1) a relative axis, do something like this:
- kind: singleprocessing
  type: RelativeAxis
  properties:
    parameters:
      origin: 42
      axis: 1
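What the step does to the axis values can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the ASpecD implementation: the helper relative_axis is hypothetical, and the default origin is assumed here to be the midpoint between the smallest and largest axis value.

```python
import warnings


def relative_axis(values, origin=None):
    """Subtract the origin from all axis values.

    Hypothetical helper for illustration only; not part of ASpecD.
    """
    lower, upper = min(values), max(values)
    if origin is None:
        # Assumption: "centre value of the axis range" means the midpoint
        origin = (lower + upper) / 2
    if not lower <= origin <= upper:
        # An origin outside the axis range is allowed, hence only a warning
        warnings.warn("Origin lies outside the range of the axis values")
    return [value - origin for value in values]


field = [340.0, 345.0, 350.0, 355.0, 360.0]
centred = relative_axis(field)              # origin defaults to 350.0
shifted = relative_axis(field, origin=342)  # explicit origin
```

In both cases the spacing between the axis values is unchanged; only the zero point moves.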
New in version 0.8.
- class aspecd.processing.SliceRearrangement
Bases:
SingleProcessingStep
Rearrange slices of a dataset along one dimension.
With multidimensional datasets, there is sometimes the need to rearrange the slices. Suppose you have a dataset containing original data, individual fitted components and the sum of the fitted components. Tasks operating on such dataset typically expect the individual slices in a certain sequence. If, however, the datasets originate from an external source that has a different sorting, you can use this step to rearrange the slices accordingly.
You can either provide indices or axis values for positions. For the latter, set the parameter “unit” accordingly. For details, see below.
If you provide fewer positions than there are slices along this dimension, the remaining slices not covered by the “positions” parameter are appended to the positions list.
If you provide the same position for a slice several times, the slice will be repeatedly inserted into the dataset, resulting in a larger dataset along the given axis dimension than before.
- parameters
All parameters necessary for this step.
- axis
int
Index of the axis to take the position from to rearrange the slices
If an invalid axis is provided, an IndexError is raised.
Default: 0
- positions
list
Positions and intended order of the slices to rearrange
Positions can be given as axis indices (default) or axis values, if the parameter “unit” is set accordingly. For details, see below.
If no position is provided or the given position is out of bounds for the given axis, a ValueError is raised.
If fewer positions are provided than present in this dimension, the remaining positions are simply appended to the list.
If a position is given more than once, the corresponding slice is introduced multiple times and the dataset enlarged along the given dimension/axis.
- unit
str
Unit used for specifying the positions: either “axis” or “index”.
If an invalid value is provided, a ValueError is raised.
Default: “index”
- Type:
- axis
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised if dataset does not have enough dimensions (i.e., 1D dataset).
ValueError – Raised if index is out of bounds for given axis. Raised if wrong unit is given.
IndexError – Raised if axis is out of bounds for given dataset.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. Each example focuses on a single aspect.
In the simplest case, just invoke the slice rearrangement with a list of positions only:
- kind: processing
  type: SliceRearrangement
  properties:
    parameters:
      positions: [1, 0, 4, 2, 3]
This will rearrange the slices along the first axis (index zero) in the given sequence.
Typically, with 2D datasets, you will want to rearrange along the second axis:
- kind: processing
  type: SliceRearrangement
  properties:
    parameters:
      axis: 1
      positions: [1, 0, 4, 2, 3]
This will rearrange the slices along the second axis (index one) in the given sequence.
Suppose you have a dataset with ten slices along the second dimension, but you only care about the first three positions and want them to appear in reverse order:
- kind: processing
  type: SliceRearrangement
  properties:
    parameters:
      axis: 1
      positions: [2, 1, 0]
This will reverse the first three slices along the second dimension (with index one), but keep the overall shape of the dataset.
What happens if you provide one slice several times? The slice is inserted several times into your dataset, thus expanding the dataset along the given axis dimension:
- kind: processing
  type: SliceRearrangement
  properties:
    parameters:
      axis: 1
      positions: [1, 0, 1]
This will add the second slice of the original dataset at the first and third position (indices 0 and 2, respectively), thus expanding your dataset by one slice along the given dimension.
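The handling of incomplete and repeated positions described above can be sketched in plain Python, with a list standing in for the slices along the chosen axis. The helpers complete_positions and rearrange are hypothetical names used for illustration only, and the sketch covers index positions only (unit “index”); it is not the ASpecD implementation.

```python
def complete_positions(positions, n_slices):
    """Append all indices not mentioned in positions, in original order."""
    if not positions:
        raise ValueError("No positions provided")
    if any(not 0 <= position < n_slices for position in positions):
        raise ValueError("Position out of bounds for the given axis")
    remaining = [i for i in range(n_slices) if i not in positions]
    return list(positions) + remaining


def rearrange(slices, positions):
    """Return the slices in the order given by the completed positions."""
    order = complete_positions(positions, len(slices))
    return [slices[i] for i in order]


slices = ["data", "component 1", "component 2", "sum", "residual"]

# Fewer positions than slices: the remaining slices are appended
reordered = rearrange(slices, [2, 1, 0])

# A repeated position inserts the slice twice and enlarges the result
repeated = rearrange(slices, [1, 0, 1])
```

Here, reordered still has five entries, whereas repeated has six, as the second slice appears at both the first and the third position.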
New in version 0.12.
- static applicable(dataset)
Check whether processing step is applicable to the given dataset.
Slice rearrangement is only applicable to datasets with at least two-dimensional data.
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if successful, False otherwise.
- Return type:
- class aspecd.processing.Denoising1DSVD
Bases:
SingleProcessingStep
Denoise 1D data using singular value decomposition (SVD).
SVD has been shown to be a powerful method for denoising data and is used in several slightly different ways, mostly for image or more general 2D data denoising. To use SVD for denoising 1D data, one first needs to create a 2D matrix from the original data. One way is to create a (partial) circulant matrix or some variant thereof.
As SVD denoising is a non-parametric method, basically no assumptions on the shape of the actual signal are necessary. This is one of the big advantages over other methods such as filtering (see Filtering for details): accidental distortions of the signal are very unlikely.
To avoid ringing artifacts at the ends of the reconstructed signal, an adaptive intermediate detrending is performed as well.
The algorithm implemented here is based on:
X. C. Chen, Yu. A. Litvinov, M. Wang, Q. Wang, and Y. H. Zhang: Denoising scheme based on singular-value decomposition for one-dimensional spectra and its application in precision storage-ring mass spectrometry. Physical Review E 99, 063320 (2019)
X. Chen: A generic denoising method for 1D spectra based on singular value decomposition (v2.1). Zenodo (2019). https://doi.org/10.5281/zenodo.2603558
Hence, if using this code leads to a scientific publication, strongly consider citing the appropriate publication(s).
- parameters
All parameters necessary for this step.
- rank
int
Rank of the approximating matrix of the constructed partial circulant matrix from the sequence. The rank will automatically be determined by the algorithm. Hence, this parameter is read-only. For details of the algorithm, see the cited reference.
- fraction
float
Fraction of the data length used as rows of the constructed matrix.
Sensible values are in the interval [0.1, 0.4], corresponding to 0.1*n to 0.4*n rows, with n the size of the data vector.
Values larger than 0.4 are unnecessary, and smaller values generally speed up the process, as the matrix to be constructed is smaller. Furthermore, it seems that larger matrices do not necessarily result in better denoising. For details, see the cited reference.
Default: 0.2
- noise_threshold
float
Threshold below which the singular components are considered noise.
Noise components are detected using the normalized mean total variation of the left singular vectors as an indicator.
Default: 0.1
- Type:
- rank
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised if dataset is not 1D or has <=10 data points.
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. Each example focuses on a single aspect.
In the simplest case, just invoke the denoising without any further parameters:
- kind: processing
  type: Denoising1DSVD
If you ever want to change some of the (few) available parameters, e.g., the size of the constructed matrix in fractions of the signal length, this is of course possible as well:
- kind: processing
  type: Denoising1DSVD
  properties:
    parameters:
      fraction: 0.3
Note, however, that enlarging the size of the constructed partial circulant matrix does not necessarily provide better results and usually slows down processing.
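To make the role of the fraction parameter more tangible, the following sketch builds a partial circulant matrix from a 1D sequence in plain Python. It is one possible construction, shown under assumptions: the helper partial_circulant is hypothetical, the rows are taken to be cyclic left shifts of the signal, and the number of rows is taken to be the integer part of fraction times the signal length. The actual construction, the SVD, the rank selection and the detrending used by Denoising1DSVD follow the cited reference and are not reproduced here.

```python
def partial_circulant(signal, fraction=0.2):
    """Stack cyclically shifted copies of signal as rows of a matrix.

    Hypothetical helper for illustration only; not part of ASpecD.
    """
    n = len(signal)
    n_rows = max(1, int(fraction * n))
    # Row k is the signal rotated to the left by k samples
    return [signal[k:] + signal[:k] for k in range(n_rows)]


signal = [float(i) for i in range(12)]
matrix = partial_circulant(signal, fraction=0.25)

# 0.25 * 12 = 3 rows, each with the full signal length of 12
shape = (len(matrix), len(matrix[0]))
```

Doubling the fraction doubles the number of rows and hence the size of the matrix that has to be decomposed, which is why larger values slow down processing.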
New in version 0.12.
- static applicable(dataset)
Check whether processing step is applicable to the given dataset.
This method is only applicable to 1D datasets with >10 data points.
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if successful, False otherwise.
- Return type: