You're reading the documentation for a development version. For the latest released version, please have a look at v0.11.
aspecd.analysis module
Data analysis functionality.
Key to reproducible science is automatic documentation of each analysis step applied to the data of a dataset. Each such analysis step is self-contained, meaning it contains all information necessary to perform the analysis task on a given dataset.
Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained that is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset.
Generally, three types of analysis steps can be distinguished:
- Analysis steps for handling single datasets: shall be derived from aspecd.analysis.SingleAnalysisStep.
- Analysis steps for handling multiple datasets: shall be derived from aspecd.analysis.MultiAnalysisStep.
- Analysis steps aggregating the results of a SingleAnalysisStep for multiple datasets: handled by class aspecd.analysis.AggregatedAnalysisStep.
In the first case, the analysis is usually handled using the analyse() method of the respective aspecd.dataset.Dataset object. Additionally, such analysis steps only ever operate on the data of a single dataset. Analysis steps handling single datasets should always inherit from the aspecd.analysis.SingleAnalysisStep class.
In the second case, the analysis step is handled using the analyse() method of the aspecd.analysis.AnalysisStep object, and the datasets are stored as a list within the analysis step. As these analysis steps span several datasets, they should always inherit from the aspecd.analysis.MultiAnalysisStep class.
Performing a SingleAnalysisStep on multiple datasets and aggregating the results in an aspecd.dataset.CalculatedDataset is the realm of the third type of analysis steps, aspecd.analysis.AggregatedAnalysisStep.
The module contains both base classes for analysis steps (as detailed above) and a series of generally applicable analysis steps for all kinds of spectroscopic data. The latter are intended to relieve developers of packages derived from the ASpecD framework of the task of reinventing the wheel over and over again.
The next section gives an overview of the concrete analysis steps implemented within the ASpecD framework. For details of how to implement your own analysis steps, see the section below.
Concrete analysis steps
Besides providing the basis for analysis steps for the ASpecD framework, ensuring full reproducibility and traceability, hence reproducible science and good scientific practice, this module comes with a (growing) number of general-purpose analysis steps useful for basically all kinds of spectroscopic data.
Here is a list as a first overview. For details, see the detailed documentation of each of the classes, readily accessible by the link.
Analysis steps operating on individual datasets
The following analysis steps operate each on individual datasets independently.
- Extract basic characteristics of a dataset
- Extract basic statistical measures of a dataset
- Blind, i.e. parameter-free, estimation of the signal-to-noise ratio
- Find peaks in 1D datasets
- Calculate power density spectrum of 1D dataset, useful, e.g., for analysing the statistics of noise (i.e., its colour)
- Perform polynomial fit on 1D data
- LinearRegressionWithFixedIntercept: Perform linear regression without fitting the intercept on 1D data. Note that this is mathematically different from a polynomial fit of first order.
- Extract device data as separate dataset. Datasets may contain additional data as device data in aspecd.dataset.Dataset.device_data. For details, see aspecd.dataset.DeviceData. To further process and analyse these device data, the most general way is to extract them as an individual dataset and perform all further tasks on it.
- Calculate centre of mass for ND datasets.
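To illustrate why a linear regression with fixed intercept differs from a first-order polynomial fit, here is a minimal NumPy sketch. It is not the ASpecD implementation, and the helper `fit_fixed_intercept` is hypothetical. With the intercept b held fixed, least squares leaves only the slope free, which gives slope = sum(x * (y - b)) / sum(x**2). In contrast, `numpy.polyfit` with degree 1 optimises slope and intercept jointly.

```python
import numpy as np


def fit_fixed_intercept(x, y, intercept=0.0):
    """Least-squares slope for y = slope * x + intercept, intercept fixed.

    Hypothetical helper for illustration, not part of ASpecD.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Minimise sum((y - intercept - slope * x)**2) with respect to slope only
    return np.sum(x * (y - intercept)) / np.sum(x * x)


x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 4.0])

slope_fixed = fit_fixed_intercept(x, y)           # intercept forced to 0
slope_free, intercept_free = np.polyfit(x, y, 1)  # both parameters fitted

print(slope_fixed)                 # 17/14, approx. 1.214
print(slope_free, intercept_free)  # approx. 1.5 and -0.667
```

Both lines describe the same data but have different slopes, because fixing the intercept forces the line through the point (0, b).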
Writing own analysis steps
Each real analysis step should inherit from either aspecd.analysis.SingleAnalysisStep when operating on a single dataset only, or from aspecd.analysis.MultiAnalysisStep when operating on several datasets at once. Furthermore, all analysis steps should be contained in one module named “analysis”. This allows for easy automation and replay of analysis steps, particularly in the context of recipe-driven data analysis (for details, see the aspecd.tasks module).
General advice
A few hints on writing own analysis step classes:
- Always inherit from aspecd.analysis.SingleAnalysisStep or aspecd.analysis.MultiAnalysisStep, depending on your needs.
- Store all parameters, implicit and explicit, in the dict parameters of the aspecd.analysis.AnalysisStep class, not in separate properties of the class. This is the only way to ensure full reproducibility and compatibility with recipe-driven data analysis (for details of the latter, see the aspecd.tasks module). Additionally, if you return a (calculated) dataset, these parameters are automatically added to the metadata of the calculated dataset.
- Always set the description property to a sensible value. Be as concise as possible. The first line of the class docstring may be a good inspiration.
- Implement the actual analysis in the _perform_task method of the analysis step. For sanitising parameters and checking general applicability of the analysis step to the dataset(s) at hand, continue reading.
- Make sure to implement the aspecd.analysis.AnalysisStep.applicable() method according to your needs. Typical cases would be to check for the dimensionality of the underlying data, as some analysis steps may work only for 1D data (or vice versa). Don’t forget to declare this as a static method, using the @staticmethod decorator.
- Use the _sanitise_parameters method to check the input parameters. It is called automatically, and raising an appropriate exception there describes the source of the error to the user.
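The following self-contained sketch shows how these hints fit together. To keep it runnable without ASpecD installed, it uses a simplified stand-in base class, `SketchAnalysisStep`. That class only mimics the control flow described here: an applicability check, parameter sanitising, and then `_perform_task()`. It is not the actual `aspecd.analysis.SingleAnalysisStep`. The example step `RootMeanSquare`, its `centre` parameter, and the plain NumPy array used in place of a dataset are also hypothetical. In real code, inherit from `aspecd.analysis.SingleAnalysisStep` instead.

```python
import numpy as np


class SketchAnalysisStep:
    """Simplified stand-in for an ASpecD analysis step (illustration only)."""

    def __init__(self):
        self.description = "Abstract analysis step"
        self.parameters = {}  # all parameters, implicit and explicit
        self.result = None
        self.dataset = None

    def analyse(self, dataset):
        self.dataset = dataset
        if not self.applicable(dataset):
            raise ValueError(f"{type(self).__name__} not applicable to dataset")
        self._sanitise_parameters()
        self._perform_task()
        return dataset

    @staticmethod
    def applicable(dataset):
        return True

    def _sanitise_parameters(self):
        pass

    def _perform_task(self):
        pass


class RootMeanSquare(SketchAnalysisStep):
    """Root mean square of 1D data (hypothetical example step)."""

    def __init__(self):
        super().__init__()
        self.description = "Root mean square of 1D data"
        self.parameters["centre"] = False  # explicit parameter with default

    @staticmethod
    def applicable(dataset):
        # Only one-dimensional data are supported
        return np.ndim(dataset) == 1

    def _sanitise_parameters(self):
        if not isinstance(self.parameters["centre"], bool):
            raise ValueError("Parameter 'centre' must be True or False")

    def _perform_task(self):
        data = np.asarray(self.dataset, dtype=float)
        if self.parameters["centre"]:
            data = data - data.mean()
        # Implicit parameter obtained during analysis, persisted in parameters
        self.parameters["points"] = data.size
        self.result = float(np.sqrt(np.mean(data ** 2)))


step = RootMeanSquare()
step.analyse(np.array([3.0, -4.0]))
print(step.result, step.parameters)
```

Keeping the implicit value ("points") in `parameters` rather than in a separate attribute follows the advice above. It means the value would travel with the rest of the parameters into history records and calculated datasets.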
Some more special cases are detailed below. For further advice, consult the source code of this module, and have a look at the concrete analysis steps whose purpose is described below in more detail.
Adding parameters upon analysis
Sometimes there is the need to persist values that are only obtained during analysis of the data. These parameters should end up in the aspecd.analysis.AnalysisStep.parameters dictionary. This way, they are added to the dataset history and are available for reports and the like.
Changing the length of your data
When changing the length of the data, always change the corresponding axes values first, and only afterwards the data, as changing the data will change the axes values and adjust their length to the length of the corresponding dimension of the data.
Returning calculated datasets as result
The type of the attribute aspecd.analysis.AnalysisStep.result depends strongly on the specific analysis step. Sometimes, a calculated dataset will be returned. A typical example is aspecd.analysis.PeakFinding, where you can explicitly ask for a calculated dataset to be returned and use this result later to plot the original data and the detected peaks overlaid. To have the minimal metadata of the calculated dataset set correctly, use the method aspecd.analysis.AnalysisStep.create_dataset() to obtain the calculated dataset. This will set both the type of calculation (to the full class name of the analysis step) and the parameters. Of course, you are solely responsible for setting the data and axes values (and further metadata, if applicable).
Module documentation
- class aspecd.analysis.AnalysisStep
Bases: ToDictMixin
Base class for analysis steps.
Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.AnalysisStep.result attribute.
In case aspecd.analysis.AnalysisStep.result is a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset). The idea behind storing the result in the form of a dataset is to be able to plot and further process these results in a fully generic manner. To create such a calculated dataset, use the method create_dataset(), which will automatically set minimal metadata for you.
The actual implementation of the analysis step is done in the private method _perform_task(), which in turn gets called by analyse(), which is called by the aspecd.dataset.Dataset.analyse() method of the dataset object.
Note
Usually, you will never use an instance of this class for actual analysis tasks. Instead, you will use one of its child classes, namely aspecd.analysis.SingleAnalysisStep and aspecd.analysis.MultiAnalysisStep, depending on whether your analysis step operates on a single dataset or requires multiple datasets.
- parameters
Parameters required for performing the analysis step
All parameters, implicit and explicit.
- Type:
- result
Results of the analysis step
Can be either an aspecd.dataset.Dataset or some other class, e.g., aspecd.metadata.PhysicalQuantity.
In case of a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset).
- index
Label for each element in result
Should only be set if result is a scalar or list.
The index will be used, e.g., by AggregatedAnalysisStep and in tabular representations of the results.
Added in version 0.5.
- Type:
- dataset_type
Full class name of the dataset that should be created
In case of returning a calculated dataset, packages derived from the ASpecD framework may want to return their own instances of aspecd.dataset.CalculatedDataset.
Note that, as some metadata are assigned, the class specified here needs to conform to aspecd.dataset.CalculatedDataset.
Default: “aspecd.dataset.CalculatedDataset”
Added in version 0.7.
- Type:
- references
List of references with relevance for the implementation of the analysis step.
Use appropriate record types from the bibrecord package.
Added in version 0.4.
- Type:
- Raises:
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
- analyse()
Perform the actual analysis step on the given dataset.
The actual analysis step should be implemented within the non-public method _perform_task(). In addition, the applicability of the analysis step to the given dataset will be checked automatically, and the parameters will be sanitised by calling the non-public method _sanitise_parameters().
- analyze()
Perform the actual analysis step on the given dataset.
Same method as self.analyse(), but for those preferring American English (AE) over British English (BE)
- create_dataset()
Create calculated dataset containing minimal metadata.
The following metadata are set:

Metadata                  Value
calculation.type          full class name of the analysis step
calculation.parameters    parameters of the analysis step

- Returns:
dataset – (Calculated) dataset containing minimal metadata.
- Return type:
Added in version 0.2.
- static applicable(dataset)
Check whether analysis step is applicable to the given dataset.
Returns True by default and needs to be implemented in classes inheriting from SingleAnalysisStep according to their needs.
This is a static method that gets called automatically by each class inheriting from aspecd.analysis.AnalysisStep. Hence, if you need to override it in your own class, make the method static as well. An example of an implementation testing for two-dimensional data is given below:

    @staticmethod
    def applicable(dataset):
        # A 2D dataset has three axes: one per dimension plus the intensity axis
        return len(dataset.data.axes) == 3

- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to check
- Returns:
applicable – True if the analysis step is applicable, False otherwise.
- Return type:
- class aspecd.analysis.SingleAnalysisStep
Bases: AnalysisStep
Base class for analysis steps operating on single datasets.
Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.SingleAnalysisStep.result attribute.
In case aspecd.analysis.SingleAnalysisStep.result is a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset). The idea behind storing the result in the form of a dataset is to be able to plot and further process these results in a fully generic manner.
The actual implementation of the analysis step is done in the private method _perform_task(), which in turn gets called by analyse(), which is called by the aspecd.dataset.Dataset.analyse() method of the dataset object.
- dataset
Dataset the analysis step should be performed on
- Type:
- Raises:
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
- analyse(dataset=None, from_dataset=False)
Perform the actual analysis step on the given dataset.
If no dataset is provided at method call, but one is set as property in the SingleAnalysisStep object, the analyse method of the dataset will be called, and thus the analysis will be added to the list of analyses of the dataset.
If a dataset is provided neither at method call nor as property in the object, the method will raise a respective exception.
The aspecd.dataset.Dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the aspecd.analysis.SingleAnalysisStep object is not necessary.
The actual analysis step should be implemented within the non-public method _perform_task(). In addition, the applicability of the analysis step to the given dataset will be checked automatically, and the parameters will be sanitised by calling the non-public method _sanitise_parameters().
Additionally, each dataset will be automatically checked for applicability, using the aspecd.analysis.AnalysisStep.applicable() method. Make sure to override this method according to your needs.
- Parameters:
dataset (aspecd.dataset.Dataset) – dataset to perform analysis for
from_dataset (boolean) – whether we are called from within a dataset
Defaults to “False” and shall never be set manually.
- Returns:
dataset – dataset analysis has been performed for
- Return type:
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised when analysis step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on
- analyze(dataset=None, from_dataset=False)
Perform the actual analysis step on the given dataset.
Same method as self.analyse(), but for those preferring American English (AE) over British English (BE)
- add_preprocessing_step(processingstep=None)
Add a preprocessing step to the internal list.
Some analyses need some preprocessing of the data. These preprocessing steps are contained in the preprocessing attribute.
- Parameters:
processingstep (aspecd.processing.ProcessingStep) – processing step to be added to the list of preprocessing steps
- create_history_record()
Create history record to be added to the dataset.
Usually, this method gets called from within the aspecd.dataset.Dataset.analyse() method of the aspecd.dataset.Dataset class and ensures that the history of each analysis step is written properly.
- Returns:
history_record – history record for analysis step
- Return type:
- class aspecd.analysis.MultiAnalysisStep
Bases: AnalysisStep
Base class for analysis steps operating on multiple datasets.
Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.MultiAnalysisStep.result attribute.
The actual implementation of the analysis step is done in the private method _perform_task(), which in turn gets called by analyse().
- analyse()
Perform the actual analysis on the given list of datasets.
If no dataset has been added to the list of datasets of the object, the method will raise a respective exception.
The actual analysis step should be implemented within the non-public method _perform_task(). In addition, the parameters will be sanitised by calling the non-public method _sanitise_parameters().
Additionally, each dataset will be automatically checked for applicability, using the aspecd.analysis.AnalysisStep.applicable() method. Make sure to override this method according to your needs.
- Raises:
aspecd.exceptions.MissingDatasetError – Raised when no datasets exist to act on
aspecd.exceptions.NotApplicableToDatasetError – Raised when analysis step is not applicable to dataset
- class aspecd.analysis.AggregatedAnalysisStep
Bases: AnalysisStep
Perform a SingleAnalysisStep on multiple datasets and aggregate results.
Data analysis often involves performing one and the same analysis step on a series of datasets and aggregating the results in a single (calculated) dataset for further display, be it graphically or in tabular form.
- analysis_step
Name of the analysis step to perform on the datasets
Should be a class name of an analysis step. Can be either a full class name including package and module, or only the class name. In the latter case, it will be looked up in ‘aspecd.analysis’.
- Type:
- result
Result of the aggregated analysis
- Raises:
aspecd.exceptions.MissingDatasetError – Raised if no datasets are given
aspecd.exceptions.MissingAnalysisStepError – Raised if no analysis_step is given
ValueError – Raised if the actual AnalysisStep returns a dataset
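The lookup of analysis_step described above (full class name, or bare class name resolved in ‘aspecd.analysis’) can be sketched with the standard library's importlib. This is a hypothetical helper, `resolve_class`, not ASpecD's own code. The default module is a parameter here so that the example runs with standard-library classes and without ASpecD installed.

```python
import importlib


def resolve_class(name, default_module="aspecd.analysis"):
    """Return the class for a full or bare class name (illustrative sketch).

    A name containing dots is split into module and class name; a bare
    class name is looked up in ``default_module``.
    """
    if "." in name:
        module_name, class_name = name.rsplit(".", 1)
    else:
        module_name, class_name = default_module, name
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Standard-library classes are used so that the example runs anywhere
print(resolve_class("collections.OrderedDict"))
print(resolve_class("OrderedDict", default_module="collections"))
```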
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. Each example focuses on a single aspect.
Let’s assume that you want to extract the minima of a series of datasets:

    - kind: aggregatedanalysis
      type: BasicCharacteristics
      properties:
        parameters:
          kind: min
      apply_to:
        - dataset1
        - dataset2
      result: basic_characteristics

If the analysis step is from another package, the full class name needs to be provided:

    - kind: aggregatedanalysis
      type: package.module.AnalysisClass
      properties:
        parameters:
          kind: foo
      apply_to:
        - dataset1
        - dataset2
      result: my_analysis
Added in version 0.5.
- analyse()
Perform the given analysis step on the list of datasets.
The name of the analysis step to be performed on the list of datasets is provided in analysis_step. The analysis will result in an aspecd.dataset.CalculatedDataset with the metadata regarding the calculation (type and parameters) set according to analysis_step and parameters.
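Conceptually, the aggregation applies one analysis to each dataset and stacks the scalar or list results row by row, with the dataset labels along the other axis. The following NumPy sketch illustrates only this idea. Plain arrays stand in for datasets, a function stands in for the analysis step, and `aggregate` is a hypothetical name. Like AggregatedAnalysisStep, it rejects missing datasets and results that are neither scalar nor list, although this sketch signals both errors with ValueError rather than with ASpecD's own exceptions.

```python
import numpy as np


def aggregate(analysis, datasets):
    """Apply ``analysis`` to each dataset and collect the results.

    Returns the dataset labels and a 2D array with one row per dataset.
    """
    if not datasets:
        raise ValueError("No datasets given")
    labels, rows = [], []
    for label, data in datasets.items():
        result = analysis(data)
        # Only scalar or list results can be aggregated into a table
        if not (np.isscalar(result) or isinstance(result, list)):
            raise ValueError("Analysis must return a scalar or a list")
        labels.append(label)
        rows.append(np.atleast_1d(result))
    return labels, np.vstack(rows)


datasets = {
    "dataset1": np.array([3.0, 1.0, 2.0]),
    "dataset2": np.array([5.0, 4.0, 6.0]),
}
labels, table = aggregate(np.min, datasets)
print(labels)  # ['dataset1', 'dataset2']
print(table)   # 2x1 array holding the minimum of each dataset
```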
- class aspecd.analysis.BasicCharacteristics
Bases: SingleAnalysisStep
Extract basic characteristics of a dataset.
Extracting basic characteristics (minimum, maximum, area, amplitude) of a dataset is programmatically quite simple. This class provides a working solution from within the ASpecD framework.
- parameters
All parameters necessary for this step.
- kind
str
Kind of the characteristic to extract from the data
Valid values are “min”, “max”, “amplitude”, and “area”.
A special kind is “all”, returning all characteristics. In this case, output can only be “value” (the default).
- output
str
Kind of output: (intensity) value, axes value(s), or axes indices
Valid values are “value” (default), “axes”, and “indices”. For amplitude and area, only “value” is a valid output option, as these characteristics have no analogue on the axes.
Default: “value”
- Type:
- kind
- result
Characteristic(s) of the dataset.
The actual return type depends on the type of characteristics and output selected.
kind (characteristic)
output
return type
min, max, amplitude, area
value
min, max
axes, indices
all
value
The corresponding kind is set as index, hence it will be used by
AggregatedAnalysisStep
and included in the dataset output by this step - and hence in tabular output created byaspecd.table.Table
.
- index
Label for each element in result
Always reflects the kind of characteristic(s) asked for. When asking for values, it is a list of the kinds. When the output is set to axes or indices, it will be the kind followed by either the axis quantity or “index#” in parentheses.
Assuming a 2D dataset with axes quantities set to “wavelength” and “time” and asking for the minimum, the index will be ['min(wavelength)', 'min(time)'] in case of output set to axes, and ['min(index0)', 'min(index1)'] in case of output set to indices.
The index will be used, e.g., by AggregatedAnalysisStep and in tabular representations of the results.
- Type:
- Raises:
ValueError – Raised if no kind of characteristics is provided. Raised if kind of characteristics is unknown. Raised if output type is unknown. Raised if output type is not available for kind of characteristics.
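The interplay of kind, output, result, and index described above can be sketched in a few lines of NumPy. This is a simplified illustration under assumptions, not the ASpecD implementation. `basic_characteristic` is a hypothetical function, axes are passed as a list of value arrays with matching quantity names, and the kinds “area” and “all” are omitted.

```python
import numpy as np


def basic_characteristic(data, axes, quantities, kind="min", output="value"):
    """Return (result, index) for kind 'min', 'max' or 'amplitude'."""
    data = np.asarray(data, dtype=float)
    if kind == "amplitude":
        # Amplitude has no analogue on the axes, hence only 'value'
        if output != "value":
            raise ValueError("Output type not available for amplitude")
        return float(data.max() - data.min()), ["amplitude"]
    if kind not in ("min", "max"):
        raise ValueError(f"Unknown kind of characteristic: {kind}")
    flat_index = data.argmin() if kind == "min" else data.argmax()
    if output == "value":
        return float(data.flat[flat_index]), [kind]
    indices = np.unravel_index(flat_index, data.shape)
    if output == "indices":
        index = [f"{kind}(index{dim})" for dim in range(data.ndim)]
        return [int(i) for i in indices], index
    if output == "axes":
        values = [float(axis[i]) for axis, i in zip(axes, indices)]
        return values, [f"{kind}({quantity})" for quantity in quantities]
    raise ValueError(f"Unknown output type: {output}")


# 2D example with axes quantities "wavelength" and "time"
wavelength = np.array([400.0, 500.0, 600.0])
time = np.array([0.0, 1.0, 2.0, 3.0])
data = np.array([[5.0, 3.0, 7.0, 8.0],
                 [6.0, 1.0, 9.0, 2.0],
                 [4.0, 0.5, 10.0, 3.0]])
axes = [wavelength, time]
quantities = ["wavelength", "time"]

print(basic_characteristic(data, axes, quantities, "min", "axes"))
# ([600.0, 1.0], ['min(wavelength)', 'min(time)'])
print(basic_characteristic(data, axes, quantities, "min", "indices"))
# ([2, 1], ['min(index0)', 'min(index1)'])
```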
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. Each example focuses on a single aspect.
Extracting a characteristic of a dataset is quite simple:

    - kind: singleanalysis
      type: BasicCharacteristics
      properties:
        parameters:
          kind: min
      result: min_of_dataset

This would simply return the minimum (value) of a given dataset in the result assigned to the recipe-internal variable min_of_dataset. Similarly, you can extract “max”, “area”, and “amplitude” from your dataset. If you are interested in the axes values or indices, set the output parameter appropriately:

    - kind: singleanalysis
      type: BasicCharacteristics
      properties:
        parameters:
          kind: min
          output: axes
      result: min_of_dataset

In this particular case, this would return the axes values of the global minimum of your dataset in the result. Note that these other output types are only available for “min” and “max”, as “area” and “amplitude” have no analogue on the axes.
Sometimes, you are interested in getting the values of all characteristics at once in the form of a list, with the kind stored in index:

    - kind: singleanalysis
      type: BasicCharacteristics
      properties:
        parameters:
          kind: all
      result: characteristics_of_dataset
Make sure to understand the different types the result has depending on the characteristic and output type chosen. For details, see the table above.
Added in version 0.2.
Changed in version 0.5: result is either scalar or list in all cases, and index is set to kind, for use with AggregatedAnalysisStep and tabular output.
- class aspecd.analysis.BasicStatistics
Bases: SingleAnalysisStep
Extract basic statistical measures of a dataset.
Extracting basic statistical measures (mean, median, std, var) of a dataset is programmatically quite simple. This class provides a working solution from within the ASpecD framework.
- parameters
All parameters necessary for this step.
- kind
str
Kind of the statistical measure to extract from the data
Valid values are “mean”, “median”, “std”, and “var”.
- Type:
- kind
- Raises:
ValueError – Raised if no kind of statistical measure is provided. Raised if kind of statistical measure is unknown.
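In plain NumPy, the four measures map directly onto library functions. The sketch below uses a hypothetical `basic_statistics` function and mirrors the documented validation of kind. Note that `numpy.std` and `numpy.var` compute the population measures (ddof=0) by default. This sketch does not claim that ASpecD uses the same convention.

```python
import numpy as np

# Statistical measures supported in this sketch, mapped to NumPy functions
MEASURES = {
    "mean": np.mean,
    "median": np.median,
    "std": np.std,  # population standard deviation (ddof=0)
    "var": np.var,  # population variance (ddof=0)
}


def basic_statistics(data, kind=None):
    """Return the requested statistical measure of ``data`` as float."""
    if not kind:
        raise ValueError("No kind of statistical measure provided")
    if kind not in MEASURES:
        raise ValueError(f"Unknown kind of statistical measure: {kind}")
    return float(MEASURES[kind](np.asarray(data, dtype=float)))


data = [1.0, 2.0, 3.0, 4.0, 10.0]
for kind in MEASURES:
    print(kind, basic_statistics(data, kind))
```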
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. Each example focuses on a single aspect.
Extracting a statistical measure of a dataset is quite simple:

    - kind: singleanalysis
      type: BasicStatistics
      properties:
        parameters:
          kind: median
      result: median_of_dataset

This would simply return the median of the data of a given dataset in the result assigned to the recipe-internal variable median_of_dataset. Similarly, you can extract “mean”, “std” (standard deviation), and “var” (variance) from your dataset.
Added in version 0.2.
- class aspecd.analysis.BlindSNREstimation
Bases: SingleAnalysisStep
Blind, i.e. parameter-free, estimation of the signal-to-noise ratio.
In spectroscopy, the signal-to-noise ratio (SNR) is usually defined as the ratio of mean (of the signal) to standard deviation (of the noise) of a signal or measurement.
For accurate estimations, this requires being able to separate noise and signal, i.e. to define a part of the overall measurement that does not contain any signal. As this is not always possible, there are different ways to make a blind estimate of the SNR, i.e. without additional parameters.
The simplest possible approach of a blind estimate is the ratio of mean to standard deviation of the whole signal (method simple):

\[\mbox{SNR} = \frac{\mu}{\sigma}\]

An alternative version sometimes used is to take the square of both mean and standard deviation (method simple_squared):

\[\mbox{SNR} = \frac{\mu^2}{\sigma^2}\]

This is equivalent to the more common definition using the ratio of the (average) power of signal and noise.
Yet another algorithm, the “DER_SNR” algorithm proposed by Stoehr et al. for use with astronomical data (for details see Czesla et al., 2018, cited below), makes use of the median and a numeric second derivative (method der_snr):

\[
\begin{aligned}
\mbox{SNR} &= \mbox{med}_i(x_i) / \sigma\\
\sigma &= \frac{1.482602}{\sqrt{6}}\,\mbox{med}_i\left(\left|-x_{i-2} + 2x_i - x_{i+2}\right|\right)
\end{aligned}
\]

Other options would be to fit a polynomial to the data, subtract the fitted polynomial and estimate the noise this way. A Savitzky-Golay filter could be used for this.
An article on SNR estimation for (astronomical) spectral data that provides a lot of details is:
S. Czesla, T. Molle, and J. H. M. M. Schmitt: A posteriori noise estimation in variable data sets. With applications to spectra and light curves. Astronomy and Astrophysics 609(2018):A39. https://doi.org/10.1051/0004-6361/201730618
For more information, the following resources may as well be useful:
Important
While all methods currently implemented are “parameter-free”, the estimates are based on a number of assumptions, the most important being normally distributed noise. Furthermore, your data need to be sampled appropriately, with the highest frequency component of your signal being well resolved.
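The three estimators written out above translate almost literally into NumPy. The sketch below uses a hypothetical `blind_snr` function and is an illustration, not the ASpecD implementation. In particular, it assumes the population standard deviation (`numpy.std` with its default ddof=0) for the two simple methods.

```python
import numpy as np


def blind_snr(data, method="simple"):
    """Blind SNR estimate of 1D data using one of three methods."""
    x = np.asarray(data, dtype=float)
    if method == "simple":
        return x.mean() / x.std()
    if method == "simple_squared":
        return x.mean() ** 2 / x.std() ** 2
    if method == "der_snr":
        signal = np.median(x)
        # |-x[i-2] + 2 x[i] - x[i+2]| for all i with both neighbours present
        second_difference = np.abs(2.0 * x[2:-2] - x[:-4] - x[4:])
        noise = 1.482602 / np.sqrt(6.0) * np.median(second_difference)
        return signal / noise
    raise ValueError(f"Unknown method: {method}")


rng = np.random.default_rng(42)
# Constant signal of 10 with normally distributed noise (sigma = 0.5)
noisy = 10.0 + rng.normal(scale=0.5, size=1000)
for method in ("simple", "simple_squared", "der_snr"):
    print(method, blind_snr(noisy, method))
```

For this constant signal with white noise, both “simple” and “der_snr” should come out close to 10 / 0.5 = 20, and “simple_squared” close to 400. The factor 1.482602 / sqrt(6) scales the median absolute second difference so that it estimates the standard deviation of normally distributed noise.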
- parameters
All parameters necessary for this step.
- method
str
Method used to blindly estimate the SNR
Valid values are “simple”, “simple_squared”, “der_snr”.
Default: “simple”
- Type:
- method
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. Each example focuses on a single aspect.
Obtaining a blind estimate of the SNR of a dataset is quite simple:

    - kind: singleanalysis
      type: BlindSNREstimation
      result: SNR_of_dataset

This would simply return the SNR of the data of a given dataset in the result assigned to the recipe-internal variable SNR_of_dataset.
To have more control over the method used to blindly estimate the SNR, explicitly provide a method:

    - kind: singleanalysis
      type: BlindSNREstimation
      properties:
        parameters:
          method: der_snr
      result: SNR_of_dataset

This would use the DER_SNR method as described above.
Added in version 0.2.
Changed in version 0.3: Added methods: “simple_squared”, “der_snr”
- class aspecd.analysis.PeakFinding
Bases: SingleAnalysisStep
Peak finding in one dimension.
Finding peaks is a use case often encountered in analysing spectroscopic data, but it is far from trivial and usually requires a careful choice of parameters to yield sensible results.
The peak finding relies on the scipy.signal.find_peaks() function, hence you can set most of the parameters this function understands. For details of the parameters, see the SciPy documentation as well.
Important
Peak finding can only be applied to 1D datasets, due to the underlying algorithm.
- parameters
All parameters necessary for this step.
- negative_peaks
bool
Whether to include negative peaks in peak finding as well.
Negative peaks are searched for by inverting the sign of the signal, and the list of peak positions is sorted.
Default: False
- return_properties
bool
Whether to return properties together with the peak positions.
If properties shall be returned as well, the attribute result will be a tuple containing the list of peak positions as first element and a dictionary with peak properties as second element.
Note: If negative peaks shall be returned as well, this option will be silently ignored and only the peak positions returned.
Default: False
- return_dataset
bool
Whether to return a calculated dataset as result.
In this case, the result will be an object of class aspecd.dataset.CalculatedDataset, with the data containing the peak intensities and the corresponding axis values the peak positions. Thus, this can be used to plot the peaks on top of the original data.
Default: False
- return_intensities
bool
Whether to return both peak positions and intensities.
In this case, the result will be a 2D numpy array with the peak positions in the first and the peak intensities in the second column. This can be used, e.g., in annotations to mark the peak positions.
Default: False
Added in version 0.11.
- height
number or ndarray or sequence
Required height of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required height.
Default: None
- threshold
number or ndarray or sequence
Required threshold of peaks, the vertical distance to its neighboring samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required threshold.
Default: None
- distance
number
Required minimal horizontal distance (>= 1) in samples between neighbouring peaks. Smaller peaks are removed first until the condition is fulfilled for all remaining peaks.
Default: None
- prominence
number or ndarray or sequence
Required prominence of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required prominence.
Default: None
- width
number or ndarray or sequence
Required width of peaks in samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required width.
Default: None
- Type:
- negative_peaks
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see
aspecd.tasks
) is given below for how to make use of this class. The examples focus each on a single aspect.Finding the peak positions of a basically noise-free dataset is quite simple:
- kind: singleanalysis type: PeakFinding result: peaks
This would simply return the peak positions of the data of a given dataset in the result assigned to the recipe-internal variable
peaks
.To have more control over the method used to find peaks, you can set a number of parameters. To get the negative peaks as well (normally, only positive peaks will be looked for):
- kind: singleanalysis type: PeakFinding properties: parameters: negative_peaks: True result: peaks
Sometimes it is convenient to have the peaks returned as a dataset, to plot the data and highlight the peaks found:
- kind: singleanalysis type: PeakFinding properties: parameters: return_dataset: True result: peaks
From the options that can be set for the function
scipy.signal.find_peaks()
, you can set “height”, “threshold”, “distance”, “prominence”, and “width”. For details, see the SciPy documentation.For noisy data, “prominence” can be a good option to only find “true” peaks:
- kind: singleanalysis type: PeakFinding properties: parameters: prominence: 0.2 result: peaks
If you supply one of these additional options, you might be interested not only in the peak positions, but in the properties of the peaks found as well.
- kind: singleanalysis
  type: PeakFinding
  properties:
    parameters:
      prominence: 0.2
      return_properties: True
  result: peaks
In this case, the result, here stored in the variable “peaks”, will be a tuple with the peak positions as first element and a dictionary with properties as the second element. Note that if you ask for negative peaks as well, this option will silently be ignored and only the peak positions returned.
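The options listed above are keyword arguments of scipy.signal.find_peaks(). To illustrate their effect outside a recipe, here is a minimal sketch using SciPy directly; it is not the implementation of PeakFinding. The helper find_peak_positions() is hypothetical, works on plain 1D NumPy arrays, reports positions as axis values (an assumption about what “peak positions” means here), and finds negative peaks by searching the inverted data.

```python
import numpy as np
from scipy.signal import find_peaks


def find_peak_positions(x, y, negative=False, **options):
    """Hypothetical helper: return peak positions (axis values) and properties.

    ``options`` are passed on to scipy.signal.find_peaks(), e.g. height,
    threshold, distance, prominence, or width.
    """
    # Negative peaks are the (positive) peaks of the inverted data
    values = -np.asarray(y) if negative else np.asarray(y)
    indices, properties = find_peaks(values, **options)
    # Convert sample indices into positions on the axis
    return np.asarray(x)[indices], properties


# Two periods of a sine: maxima at pi/2 and 5pi/2, minima at 3pi/2 and 7pi/2
x = np.linspace(0, 4 * np.pi, 401)
y = np.sin(x)
maxima, _ = find_peak_positions(x, y)
minima, _ = find_peak_positions(x, y, negative=True)

# With noise, many spurious local maxima appear; "prominence" filters them
rng = np.random.default_rng(0)
noisy = y + rng.normal(scale=0.02, size=x.size)
all_maxima, _ = find_peak_positions(x, noisy)
true_maxima, props = find_peak_positions(x, noisy, prominence=0.5)
```

Without a prominence criterion, the noisy trace yields many local maxima; with prominence=0.5, only the two maxima of the sine remain, and the returned properties contain the prominences of the peaks found.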
Added in version 0.2.
- static applicable(dataset)
Check whether analysis step is applicable to the given dataset.
Peak finding can only be applied to 1D datasets.
- Parameters:
dataset (aspecd.dataset.Dataset) – Dataset to check
- Returns:
applicable – Whether dataset is applicable
- class aspecd.analysis.PowerDensitySpectrum
Bases: SingleAnalysisStep
Calculate power density spectrum of given 1D dataset.
The power density spectrum is the log10 of the power of each frequency component as a function of the log10 of the frequency. As the logarithm of zero is undefined, the power of the DC component (f = 0) is omitted.
The power density spectrum (sometimes called power spectral density, PSD) can be used to analyse the nature of noise, e.g. whether it is Gaussian (white, normally distributed) noise or “coloured” noise with the frequencies of the noise components differently weighted.
In spectroscopy, coloured noise (most frequently pink or 1/f noise) rather than white noise is often encountered. White noise is characterised by an equal distribution of all frequencies, corresponding to a constant power density spectrum. Pink or 1/f noise, in contrast, exhibits a linear damping of higher frequencies with a slope of -1 in the (double-logarithmic) power density spectrum. For more details regarding noise in spectroscopy, the interested reader is referred to the documentation of the aspecd.processing.Noise class.
- result
power density spectrum of the corresponding 1D dataset analysed
- parameters
All parameters necessary for this step.
- method
str
Method to use to calculate the power density spectrum
Possible methods must exist in the scipy.signal module. Currently, you can choose between “periodogram” and “welch”. For details, see their respective documentation, i.e. scipy.signal.periodogram() and scipy.signal.welch().
Default: periodogram
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised if applied to a ND dataset (with N>1)
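The definition given above (log10 of the power versus log10 of the frequency, omitting f = 0) can be sketched with the two scipy.signal functions available as methods. This is a minimal illustration, not the code of PowerDensitySpectrum: the helper log_power_density() and the synthetic pink-noise generator pink_noise() are hypothetical, a unit sampling frequency is assumed, and SciPy's default settings are used for both methods. Fitting a straight line to the result recovers the slopes discussed above for white and pink noise.

```python
import numpy as np
from scipy import signal


def log_power_density(y, method="periodogram"):
    """Hypothetical helper: log10(frequency) and log10(power) without DC."""
    # Look up the estimator by name, e.g. signal.periodogram or signal.welch
    estimator = getattr(signal, method)
    frequencies, power = estimator(y)  # unit sampling frequency assumed
    # Omit the DC component (f = 0), as log10(0) is undefined
    nonzero = frequencies > 0
    return np.log10(frequencies[nonzero]), np.log10(power[nonzero])


def pink_noise(n, rng):
    """Hypothetical generator: shape white noise so its power scales as 1/f."""
    spectrum = np.fft.rfft(rng.normal(size=n))
    frequencies = np.fft.rfftfreq(n)
    frequencies[0] = frequencies[1]  # avoid dividing by zero at DC
    # Amplitude ~ 1/sqrt(f) means power ~ 1/f
    return np.fft.irfft(spectrum / np.sqrt(frequencies), n)


rng = np.random.default_rng(42)
white = rng.normal(size=4096)
pink = pink_noise(4096, rng)

# Slope of a straight line fitted to the double-logarithmic spectrum
white_slope = np.polyfit(*log_power_density(white), 1)[0]  # close to 0
pink_slope = np.polyfit(*log_power_density(pink), 1)[0]  # close to -1
```

Passing method="welch" averages over overlapping segments, which yields fewer but less noisy frequency points than the periodogram.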
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
Computing the power density spectrum of a 1D dataset (let’s assume you use a trace of pure noise for this) is quite simple:
- kind: singleanalysis
  type: PowerDensitySpectrum
  result: power_density_spectrum
This would simply return the power density spectrum of the data of a given dataset in the result assigned to the recipe-internal variable power_density_spectrum. Note that the result is itself a calculated dataset, hence you can easily plot it for graphical representation and manual inspection:
- kind: singleplot
  type: SinglePlotter1D
  properties:
    filename: power_density_spectrum.pdf
  apply_to: power_density_spectrum
You may even want to calculate the power density spectrum, perform a polynomial fit (of first order), evaluate the polynomial for the coefficients obtained by the fit, and plot both together in one figure:
- kind: singleanalysis
  type: PowerDensitySpectrum
  result: power_density_spectrum
- kind: singleanalysis
  type: PolynomialFit
  result: coefficients
  apply_to: power_density_spectrum
- kind: model
  type: Polynomial
  properties:
    parameters:
      coefficients: coefficients
  from_dataset: power_density_spectrum
  result: linear_fit
- kind: multiplot
  type: MultiPlotter1D
  properties:
    filename: power_density_spectrum.pdf
  apply_to:
    - power_density_spectrum
    - linear_fit
To have more control over the method used to calculate the power density spectrum, you can explicitly provide a method name here:
- kind: singleanalysis
  type: PowerDensitySpectrum
  properties:
    parameters:
      method: welch
  result: power_density_spectrum
Note that the methods need to reside in the scipy.signal module.
Added in version 0.3.
- static applicable(dataset)
Check whether analysis step is applicable to the given dataset.
Power density spectrum calculation can only be applied to 1D datasets.
- Parameters:
dataset (aspecd.dataset.Dataset) – Dataset to check
- Returns:
applicable – Whether dataset is applicable
- class aspecd.analysis.PolynomialFit
Bases: SingleAnalysisStep
Perform polynomial fit on 1D data.
The coefficients obtained can be used to evaluate a model that can be plotted together with the data the polynomial has been fitted to originally. At the same time, you can tabulate the coefficients.
- result
coefficients of the fitted polynomial in increasing order
As the new numpy.polynomial package is used, particularly the numpy.polynomial.polynomial.Polynomial class, the coefficients are given in increasing order, with the first element corresponding to x**0. Furthermore, the coefficients are given in the unscaled data domain (using the numpy.polynomial.polynomial.Polynomial.convert() method).
- parameters
All parameters necessary for this step.
- order
int
Order (degree) of the polynomial to be fitted to the data
Default: 1
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised if applied to a ND dataset (with N>1)
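The difference between scaled and unscaled coefficients mentioned for the result can be seen directly with NumPy. A minimal sketch, assuming plain NumPy arrays rather than an aspecd dataset; the helper polynomial_coefficients() is hypothetical and only mirrors the behaviour described for the result attribute (increasing order, unscaled data domain):

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import polynomial as P


def polynomial_coefficients(x, y, order=1):
    """Hypothetical helper: fit a polynomial, return coefficients x**0 first."""
    fit = Polynomial.fit(x, y, deg=order)
    # fit.coef refers to the scaled window [-1, 1]; convert() maps the
    # coefficients back to the unscaled data domain
    return fit.convert().coef


x = np.linspace(0, 10, 50)
y = 2 + 3 * x - 0.5 * x**2

scaled = Polynomial.fit(x, y, deg=2).coef  # not directly usable with x
coefficients = polynomial_coefficients(x, y, order=2)  # approx. [2, 3, -0.5]

# Evaluate the polynomial on the original axis using increasing-order coefficients
model = P.polyval(x, coefficients)
```

Without the call to convert(), the coefficients would refer to the axis mapped onto [-1, 1] and could not be evaluated directly on the original axis values.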
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
Fitting a polynomial of first order to your 1D dataset is quite simple:
- kind: singleanalysis
  type: PolynomialFit
  result: polynomial_coefficients_1st_order
This would simply return the polynomial coefficients (in increasing order) as a list in the result assigned to the recipe-internal variable polynomial_coefficients_1st_order.
If you would like to fit a polynomial of different order, simply provide the desired order as an additional parameter:
- kind: singleanalysis
  type: PolynomialFit
  properties:
    parameters:
      order: 3
  result: polynomial_coefficients_3rd_order
In this case, the result will contain a list of four coefficients of the fitted polynomial of third order, again in increasing order.
Added in version 0.3.
- static applicable(dataset)
Check whether analysis step is applicable to the given dataset.
Polynomial fits can (currently) only be applied to 1D datasets.
- Parameters:
dataset (aspecd.dataset.Dataset) – Dataset to check
- Returns:
applicable – Whether dataset is applicable
- class aspecd.analysis.LinearRegressionWithFixedIntercept
Bases: SingleAnalysisStep
Perform linear regression with fixed intercept on 1D data.
In contrast to a regular polynomial fit of first order, where two parameters (slope and intercept) are fitted, there are mathematical models where the intercept is fixed, i.e. not to be fitted as well. In these cases, using a polynomial fit of first order is simply wrong. Of course, which of these approaches is valid depends on the underlying model and the physical reality to be modelled.
Note
A prime example of a linear model with only a slope and no intercept is Hooke’s law, stating that the force F needed to extend or compress a spring by some distance x scales linearly with that distance, i.e., F = kx. Here, k is the characteristic of the spring, sometimes called “spring constant”.
The approach taken here is to use linear algebra and solve the system of equations by calling numpy.linalg.lstsq(). In case of a vertical offset (i.e., intercept not zero), the offset is first subtracted from the function values and the regression is performed afterwards.
- result
slope of the linear regression
If you set the parameter polynomial_coefficients to True, a list with (fixed) intercept and (fitted) slope will be returned (see below).
- parameters
All parameters necessary for this step.
- offset
float
Vertical offset of the data, i.e. f(0)
Useful in cases where the model defines an intercept f(0) != 0.
Default: 0
- polynomial_coefficients
bool
Whether to return both intercept and slope, for compatibility with the polynomial model aspecd.model.Polynomial
Default: False
- Raises:
aspecd.exceptions.NotApplicableToDatasetError – Raised if applied to a ND dataset (with N>1)
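The procedure described above (subtract the fixed offset, then solve for the slope with numpy.linalg.lstsq()) can be sketched as follows. This is an illustration under assumptions, not the code of the class: the function regression_with_fixed_intercept() is hypothetical, the data are plain 1D NumPy arrays, and the returned list only mimics the order [intercept, slope] described for the polynomial_coefficients parameter. The example data follow Hooke’s law with an assumed spring constant of 2.5.

```python
import numpy as np


def regression_with_fixed_intercept(x, y, offset=0.0, polynomial_coefficients=False):
    """Hypothetical helper: fit y = offset + slope * x, with offset fixed."""
    # One column, as there is only one unknown (the slope)
    design_matrix = np.asarray(x, dtype=float).reshape(-1, 1)
    # Subtract the fixed intercept before performing the regression
    target = np.asarray(y, dtype=float) - offset
    solution, _residuals, _rank, _singular_values = np.linalg.lstsq(
        design_matrix, target, rcond=None
    )
    slope = solution[0]
    if polynomial_coefficients:
        # Increasing order, as expected for a polynomial: [x**0, x**1]
        return [offset, slope]
    return slope


# Hooke's law, F = k x, with an assumed spring constant k = 2.5 and some noise
rng = np.random.default_rng(1)
extension = np.linspace(0.1, 1.0, 20)
force = 2.5 * extension + rng.normal(scale=0.05, size=extension.size)

spring_constant = regression_with_fixed_intercept(extension, force)
# A first-order polynomial fit additionally fits an intercept
free_slope, free_intercept = np.polyfit(extension, force, 1)
```

With a single unknown, the least-squares solution reduces to the closed form slope = Σ xᵢ(yᵢ − offset) / Σ xᵢ², whereas the first-order polynomial fit returns a slope and a (generally non-zero) intercept that the physical model does not contain.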
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
Performing a linear regression without intercept on your 1D dataset is quite simple:
- kind: singleanalysis
  type: LinearRegressionWithFixedIntercept
  result: slope
Sometimes, you may have the situation that the (fixed) intercept is not zero, hence your data are “offset” by a scalar value. To account for that, provide a value for this offset, here 3.14:
- kind: singleanalysis
  type: LinearRegressionWithFixedIntercept
  properties:
    parameters:
      offset: 3.14
  result: slope
As you sometimes want to graphically display both the data and the resulting linear regression, you may use the aspecd.model.Polynomial model to do just that. However, for this to work, you need the (fixed) intercept returned as the first coefficient as well. Here you go:
- kind: singleanalysis
  type: LinearRegressionWithFixedIntercept
  properties:
    parameters:
      polynomial_coefficients: True
  result: regression_coefficients
The full story may look something like this, with “experimental_data” referring to the actual dataset to be analysed:
- kind: singleanalysis
  type: LinearRegressionWithFixedIntercept
  properties:
    parameters:
      polynomial_coefficients: True
  result: coefficients
  apply_to:
    - experimental_data
- kind: model
  type: Polynomial
  properties:
    parameters:
      coefficients: coefficients
  from_dataset: experimental_data
  result: linear_regression_without_intercept
- kind: multiplot
  type: MultiPlotter1D
  properties:
    filename: linear_regression_without_intercept.pdf
  apply_to:
    - experimental_data
    - linear_regression_without_intercept
With this, you should have your plot with data and linear regression together saved in the file linear_regression_without_intercept.pdf.
Added in version 0.3.
- static applicable(dataset)
Check whether analysis step is applicable to the given dataset.
Linear regression with fixed intercept can (currently) only be applied to 1D datasets.
- Parameters:
dataset (aspecd.dataset.Dataset) – Dataset to check
- Returns:
applicable – Whether dataset is applicable
- class aspecd.analysis.DeviceDataExtraction
Bases: SingleAnalysisStep
Extract device data as separate dataset.
Datasets may contain additional data as device data in aspecd.dataset.Dataset.device_data. For details, see aspecd.dataset.DeviceData. To further process and analyse these device data, the most general way is to extract them as an individual dataset and perform all further tasks on it.
A reference to the original dataset is stored in aspecd.dataset.Dataset.references.
Dataset containing the device data.
The device the data are extracted for is provided by the parameter device, see below.
- parameters
All parameters necessary for this step.
- device
str
Name of the device the data should be extracted for.
Raises a KeyError if the device does not exist.
Default: ‘’
- Raises:
KeyError – Raised if device is not present in aspecd.dataset.Dataset.device_data
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
Suppose you have a dataset that contains device data referenced with the key “timestamp”, and you want to extract those device data and make them accessible from within the recipe using the name “timestamp” as well:
- kind: singleanalysis
  type: DeviceDataExtraction
  properties:
    parameters:
      device: timestamp
  result: timestamp
Added in version 0.9.
- static applicable(dataset)
Check whether analysis step is applicable to the given dataset.
Device data extraction is only possible if device data are present.
- Parameters:
dataset (aspecd.dataset.Dataset) – Dataset to check
- Returns:
applicable – Whether dataset is applicable
- class aspecd.analysis.CentreOfMass
Bases: SingleAnalysisStep
Calculate centre of mass for ND datasets.
In physics, the centre of mass of a body is the mass-weighted average of the positions of its mass points. The concept can equally be applied to an ND dataset, where the mass corresponds to the intensity value at a given point.
In one dimension, the centre of mass, \(x_s\), can be calculated by:
\[x_s = \frac{1}{M} \cdot \sum_{i=1}^{n} x_{i} \cdot m_{i}\]
with the total mass \(M\), i.e. the sum of all point masses:
\[M = \sum_{i=1}^{n} m_{i}\]
This can be generalised to arbitrary dimensions, defining the centre of mass as the mass-weighted average of the position vectors \(\vec{r}_i\):
\[\vec{r}_s = \frac{1}{M} \sum_{i} m_{i} \cdot \vec{r}_i\]
Note that in contrast to scipy.ndimage.center_of_mass(), the actual axis values are used to calculate the centre of mass. Hence, the calculation works for non-uniform spacing of individual axes as well.
- result
Coordinates of the centre of mass of the data in axis coordinates.
- Type:
np.array
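The formulas above translate into a few lines of NumPy. A minimal sketch, assuming the data are given as an ND array together with one 1D array of axis values per dimension; the helper centre_of_mass() is hypothetical and not part of aspecd. Using the axis values rather than indices is what distinguishes it from scipy.ndimage.center_of_mass().

```python
import numpy as np


def centre_of_mass(data, axes):
    """Hypothetical helper: mass-weighted average of positions in axis units.

    data: ND array of intensities (the "masses" m_i)
    axes: sequence of N 1D arrays with the axis values of each dimension
    """
    data = np.asarray(data, dtype=float)
    total_mass = data.sum()  # M, the sum of all m_i
    # Coordinate grids in "ij" indexing match the layout of data
    grids = np.meshgrid(*axes, indexing="ij")
    # One component of r_s per dimension: sum(m_i * r_i) / M
    return np.array([np.sum(grid * data) / total_mass for grid in grids])


# 1D, non-uniform axis: (0*1 + 1*1 + 10*2) / 4 = 5.25
com_1d = centre_of_mass([1, 1, 2], [np.array([0.0, 1.0, 10.0])])

# 1D, axis equal to the indices: (0*1 + 1*1 + 2*2) / 4 = 1.25
com_indices = centre_of_mass([1, 1, 2], [np.arange(3.0)])

# 2D: a single non-zero point at row 1, column 2
image = np.zeros((3, 4))
image[1, 2] = 1.0
com_2d = centre_of_mass(image, [np.array([0.0, 10.0, 20.0]), np.arange(4.0)])
```

For the same masses, an index-based calculation such as scipy.ndimage.center_of_mass() gives 1.25, which coincides with the axis-based result only if the axis values equal the indices.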
Examples
For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples each focus on a single aspect.
Obtaining the centre of mass of a given dataset is fairly straightforward:
- kind: singleanalysis
  type: CentreOfMass
  result: centre_of_mass
However, usually you would like to graphically display the result in some way. Assuming a 1D dataset, you may plot a vertical line using aspecd.annotation.VerticalLine, with the result of the analysis as the x coordinate of the annotation:
- kind: singleanalysis
  type: CentreOfMass
  result: centre_of_mass
- kind: singleplot
  type: SinglePlotter1D
  properties:
    filename: plot.pdf
  result: plot
- kind: plotannotation
  type: VerticalLine
  properties:
    parameters:
      positions: centre_of_mass
  plotter: plot
Added in version 0.11.