
aspecd.analysis module

Data analysis functionality.

Key to reproducible science is automatic documentation of each analysis step applied to the data of a dataset. Each such analysis step is self-contained, meaning it contains all the information necessary to perform the analysis task on a given dataset.

Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained that is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset.

Generally, three types of analysis steps can be distinguished:

Analysis steps for handling single datasets

Shall be derived from aspecd.analysis.SingleAnalysisStep.
Analysis steps for handling multiple datasets

Shall be derived from aspecd.analysis.MultiAnalysisStep.
Analysis steps aggregating the results of a SingleAnalysisStep for multiple datasets

Handled by class aspecd.analysis.AggregatedAnalysisStep.

In the first case, the analysis is usually handled using the analyse() method of the respective aspecd.dataset.Dataset object. Additionally, those analysis steps always operate only on the data of a single dataset. Analysis steps handling single datasets should always inherit from the aspecd.analysis.SingleAnalysisStep class.

In the second case, the analysis step is handled using the analyse() method of the aspecd.analysis.AnalysisStep object, and the datasets are stored as a list within the analysis step. As these analysis steps span several datasets, they should always inherit from the aspecd.analysis.MultiAnalysisStep class.

Performing a SingleAnalysisStep on multiple datasets and aggregating the results in an aspecd.dataset.CalculatedDataset is the realm of the third type of analysis steps, aspecd.analysis.AggregatedAnalysisStep.

The module contains both base classes for analysis steps (as detailed above) and a series of generally applicable analysis steps for all kinds of spectroscopic data. The latter are an attempt to relieve the developers of packages derived from the ASpecD framework of the task of reinventing the wheel over and over again.

The next section gives an overview of the concrete analysis steps implemented within the ASpecD framework. For details of how to implement your own analysis steps, see the section below.

Concrete analysis steps

Besides providing the basis for analysis steps for the ASpecD framework, ensuring full reproducibility and traceability, hence reproducible science and good scientific practice, this module comes with a (growing) number of general-purpose analysis steps useful for basically all kinds of spectroscopic data.

Here is a list as a first overview. For details, see the detailed documentation of each of the classes, readily accessible by the link.

Analysis steps operating on individual datasets

The following analysis steps operate each on individual datasets independently.

BasicCharacteristics

Extract basic characteristics of a dataset
BasicStatistics

Extract basic statistical measures of a dataset
BlindSNREstimation

Blind, i.e. parameter-free, estimation of the signal-to-noise ratio
PeakFinding

Find peaks in 1D datasets
PowerDensitySpectrum

Calculate the power density spectrum of a 1D dataset, useful, e.g., for analysing the statistics of noise (i.e., its colour)
PolynomialFit

Perform polynomial fit on 1D data
LinearRegressionWithFixedIntercept

Perform linear regression without fitting the intercept on 1D data. Note that this is mathematically different from a polynomial fit of first order.
DeviceDataExtraction

Extract device data as separate dataset.

Datasets may contain additional data as device data in aspecd.dataset.Dataset.device_data. For details, see aspecd.dataset.DeviceData. To further process and analyse these device data, the most general way is to extract them as an individual dataset and perform all further tasks on it.
CentreOfMass

Calculate centre of mass for ND datasets.
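
The remark on LinearRegressionWithFixedIntercept above, that it differs mathematically from a first-order polynomial fit, can be made concrete with a small sketch. It uses plain Python and the textbook least-squares formulas only; the function names are hypothetical and this is not the ASpecD implementation.

```python
def fit_with_fixed_intercept(x, y, intercept=0.0):
    """Least-squares slope for y = slope * x + intercept, intercept held fixed."""
    numerator = sum(xi * (yi - intercept) for xi, yi in zip(x, y))
    denominator = sum(xi * xi for xi in x)
    return numerator / denominator


def fit_first_order_polynomial(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, both fitted."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 5.0]
slope_fixed = fit_with_fixed_intercept(x, y)  # intercept forced to zero
slope_free, intercept_free = fit_first_order_polynomial(x, y)
```

For these three points, the two approaches yield different slopes, because fixing the intercept removes one degree of freedom from the fit.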

Writing your own analysis steps

Each real analysis step should inherit from either aspecd.analysis.SingleAnalysisStep in case of operating on a single dataset only or from aspecd.analysis.MultiAnalysisStep in case of operating on several datasets at once. Furthermore, all analysis steps should be contained in one module named “analysis”. This allows for easy automation and replay of analysis steps, particularly in the context of recipe-driven data analysis (for details, see the aspecd.tasks module).

General advice

A few hints on writing your own analysis step classes:

Always inherit from aspecd.analysis.SingleAnalysisStep or aspecd.analysis.MultiAnalysisStep, depending on your needs.
Store all parameters, implicit and explicit, in the dict parameters of the aspecd.analysis.AnalysisStep class, not in separate properties of the class. Only this way, you can ensure full reproducibility and compatibility of recipe-driven data analysis (for details of the latter, see the aspecd.tasks module). Additionally, this way, if you return a (calculated) dataset, these parameters get automatically added to the metadata of the calculated dataset.
Always set the description property to a sensible value. Be as concise as possible. The first line of the class docstring may be a good inspiration.
Implement the actual analysis in the _perform_task method of the analysis step. For sanitising parameters and checking general applicability of the analysis step to the dataset(s) at hand, continue reading.
Make sure to implement the aspecd.analysis.AnalysisStep.applicable() method according to your needs. Typical cases would be to check for the dimensionality of the underlying data, as some analysis steps may work only for 1D data (or vice versa). Don’t forget to declare this as a static method, using the @staticmethod decorator.
With the _sanitise_parameters method, the input parameters are automatically checked and an appropriate exception can be thrown in order to describe the error source to the user.
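
The hints above can be summarised in a minimal sketch. To keep it runnable without ASpecD installed, it uses a hypothetical stand-in base class (StandInSingleAnalysisStep) that only mimics the call order described in this documentation, and a plain list stands in for a dataset. In real code, you would inherit from aspecd.analysis.SingleAnalysisStep instead; the class ThresholdCount and its parameter are invented for illustration.

```python
class StandInSingleAnalysisStep:
    """Hypothetical stand-in, NOT the real aspecd.analysis.SingleAnalysisStep."""

    def __init__(self):
        self.parameters = {}
        self.description = "Abstract analysis step"
        self.result = None
        self.dataset = None

    @staticmethod
    def applicable(dataset):
        return True

    def analyse(self, dataset=None):
        if dataset is not None:
            self.dataset = dataset
        if self.dataset is None:
            raise ValueError("No dataset to act on")
        if not self.applicable(self.dataset):
            raise ValueError("Analysis step not applicable to dataset")
        self._sanitise_parameters()
        self._perform_task()
        return self.dataset

    def _sanitise_parameters(self):
        pass

    def _perform_task(self):
        pass


class ThresholdCount(StandInSingleAnalysisStep):
    """Count data points above a threshold (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.description = "Count data points above a threshold"
        # All parameters live in the parameters dict, not in own attributes
        self.parameters["threshold"] = None

    @staticmethod
    def applicable(dataset):
        # In this sketch, "1D data" means a flat list of numbers
        return all(isinstance(value, (int, float)) for value in dataset)

    def _sanitise_parameters(self):
        if self.parameters["threshold"] is None:
            raise ValueError("No threshold provided")

    def _perform_task(self):
        threshold = self.parameters["threshold"]
        self.result = sum(1 for value in self.dataset if value > threshold)


step = ThresholdCount()
step.parameters["threshold"] = 2
step.analyse([1, 3, 5, 2])
```

Keeping the check for applicability, the sanitising of parameters, and the actual task in separate methods means that a derived class only overrides what it needs.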

Some more special cases are detailed below. For further advice, consult the source code of this module, and have a look at the concrete analysis steps whose purpose is described in more detail in the module documentation.

Adding parameters upon analysis

Sometimes there is the need to persist values that are only obtained during analysis of the data. These parameters should end up in the aspecd.analysis.AnalysisStep.parameters dictionary. Thus, they are added to the dataset history and available for reports and the like.
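
A minimal sketch of this idea is shown below. The class LeadingMean, its parameters, and the way data are passed in are hypothetical and chosen for brevity; the only point is that a value determined during the analysis is written to the parameters dict next to the explicitly set parameters.

```python
class LeadingMean:
    """Hypothetical step: mean of the leading fraction of the data points."""

    def __init__(self):
        self.parameters = {"fraction": 0.25}  # explicit, user-supplied
        self.result = None

    def _perform_task(self, data):
        n_points = max(1, int(len(data) * self.parameters["fraction"]))
        # Value only known at analysis time: persist it with the parameters
        self.parameters["n_points"] = n_points
        self.result = sum(data[:n_points]) / n_points


step = LeadingMean()
step._perform_task([1, 2, 3, 4, 5, 6, 7, 8])
```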

Changing the length of your data

When changing the length of the data, always change the corresponding axes values first, and only afterwards the data, as changing the data will change the axes values and adjust their length to the length of the corresponding dimension of the data.

Returning calculated datasets as result

The type of the attribute aspecd.analysis.AnalysisStep.result depends strongly on the specific analysis step. Sometimes, a calculated dataset will be returned. A typical example is aspecd.analysis.PeakFinding, where you can explicitly ask for a calculated dataset to be returned and use this result later to plot the original data with the detected peaks overlaid. To have the minimal metadata of the calculated dataset set correctly, use the method aspecd.analysis.AnalysisStep.create_dataset() to obtain the calculated dataset. This will set both the type of calculation (to the full class name of the analysis step) and the parameters. Of course, you are solely responsible for setting the data and axes values (and further metadata, if applicable).

Module documentation

class aspecd.analysis.AnalysisStep

Bases: ToDictMixin

Base class for analysis steps.

Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.AnalysisStep.result attribute.

In case aspecd.analysis.AnalysisStep.result is a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset), and the idea behind storing the result in form of a dataset is to be able to plot and further process these results in a fully generic manner. To create such a calculated dataset, use the method create_dataset() that will automatically set minimal metadata for you.

The actual implementation of the analysis step is done in the private method _perform_task() that in turn gets called by analyse() which is called by the aspecd.dataset.Dataset.analyse() method of the dataset object.

Note

You will usually not use this class directly for actual analysis tasks, but rather one of its child classes, either aspecd.analysis.SingleAnalysisStep or aspecd.analysis.MultiAnalysisStep, depending on whether your analysis step operates on a single dataset or requires multiple datasets.

name

Name of the analysis step.

Defaults to the lower-case class name. Do not change it!

Type:: str

parameters

Parameters required for performing the analysis step

All parameters, implicit and explicit.

Type:: dict

result

Results of the analysis step

Can be either an aspecd.dataset.Dataset or some other class, e.g., aspecd.metadata.PhysicalQuantity.

In case of a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset)

index

Label for each element in result

Should only be set if result is a scalar or list.

The index will be used, e.g., by AggregatedAnalysisStep and in tabular representations of the results.

New in version 0.5.

Type:: list

dataset_type

Full class name of the dataset that should be created

In case of returning a calculated dataset, packages derived from the ASpecD framework may want to return their own instances of aspecd.dataset.CalculatedDataset.

Note that due to assigning some metadata, the class specified here needs to conform to aspecd.dataset.CalculatedDataset.

Default: “aspecd.dataset.CalculatedDataset”

New in version 0.7.

Type:: str

description

Short description, to be set in class definition

Type:: str

comment

User-supplied comment describing intent, purpose, reason, …

Type:: str

references

List of references with relevance for the implementation of the analysis step.

Use appropriate record types from the bibrecord package.

New in version 0.4.

Type:: list

Raises:: aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on

analyse()

Perform the actual analysis step on the given dataset.

The actual analysis step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the analysis step to the given dataset will be checked automatically and the parameters will be sanitised by calling the non-public method _sanitise_parameters().

analyze()

Perform the actual analysis step on the given dataset.

Same method as self.analyse, but for those preferring American English (AE) over British English (BE)

create_dataset()

Create calculated dataset containing minimal metadata.

The following metadata are set:

Metadata	Value
calculation.type	`name`
calculation.parameters	`parameters`

Returns:: dataset – (Calculated) dataset containing minimal metadata.
Return type:: aspecd.dataset.CalculatedDataset

New in version 0.2.

static applicable(dataset)

Check whether analysis step is applicable to the given dataset.

Returns True by default and needs to be implemented in classes inheriting from SingleAnalysisStep according to their needs.

This is a static method that gets called automatically by each class inheriting from aspecd.analysis.AnalysisStep. Hence, if you need to override it in your own class, make the method static as well. An example of an implementation testing for two-dimensional data is given below:

@staticmethod
def applicable(dataset):
    # 2D data carry three axes: one per data dimension plus the intensity axis
    return len(dataset.data.axes) == 3

Parameters:: dataset (aspecd.dataset.Dataset) – dataset to check
Returns:: applicable – True if the analysis step is applicable to the dataset, False otherwise.
Return type:: bool
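
The example above compares the number of axes with 3 to test for two-dimensional data, as a dataset has one axis per data dimension plus one for the intensity. A minimal sketch of such a check is given below. The objects created by make_dataset are hypothetical stand-ins built from types.SimpleNamespace, not real aspecd.dataset.Dataset objects, and the class Needs2D is invented for illustration.

```python
from types import SimpleNamespace


def make_dataset(n_dimensions):
    """Hypothetical stand-in for a dataset with n_dimensions + 1 axes."""
    axes = [SimpleNamespace(quantity="") for _ in range(n_dimensions + 1)]
    return SimpleNamespace(data=SimpleNamespace(axes=axes))


class Needs2D:
    """Illustrative analysis step that only accepts 2D data."""

    @staticmethod
    def applicable(dataset):
        # Three axes: two data dimensions plus the intensity axis
        return len(dataset.data.axes) == 3


is_2d_applicable = Needs2D.applicable(make_dataset(2))
is_1d_applicable = Needs2D.applicable(make_dataset(1))
```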

class aspecd.analysis.SingleAnalysisStep

Bases: AnalysisStep

Base class for analysis steps operating on single datasets.

Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.SingleAnalysisStep.result attribute.

In case aspecd.analysis.SingleAnalysisStep.result is a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset), and the idea behind storing the result in form of a dataset is to be able to plot and further process these results in a fully generic manner.

The actual implementation of the analysis step is done in the private method _perform_task() that in turn gets called by analyse() which is called by the aspecd.dataset.Dataset.analyse() method of the dataset object.

preprocessing

List of necessary preprocessing steps to perform the analysis.

Type:: list

description

Short description, to be set in class definition

Type:: str

dataset

Dataset the analysis step should be performed on

Type:: aspecd.dataset.Dataset

Raises:: aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on

analyse(dataset=None, from_dataset=False)

Perform the actual analysis step on the given dataset.

If no dataset is provided at method call, but is set as property in the SingleAnalysisStep object, the analyse method of the dataset will be called and thus the analysis added to the list of analyses of the dataset.

If a dataset is provided neither at method call nor as property of the object, the method will raise an aspecd.exceptions.MissingDatasetError.

The aspecd.dataset.Dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the aspecd.analysis.SingleAnalysisStep object is not necessary.

The actual analysis step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the analysis step to the given dataset will be checked automatically and the parameters will be sanitised by calling the non-public method _sanitise_parameters().

Additionally, each dataset will be automatically checked for applicability, using the aspecd.analysis.AnalysisStep.applicable() method. Make sure to override this method according to your needs.

Parameters:

dataset (aspecd.dataset.Dataset) – dataset to perform analysis for
from_dataset (boolean) –
whether we are called from within a dataset

Defaults to “False” and shall never be set manually.

Returns:

dataset – dataset analysis has been performed for

Return type:

aspecd.dataset.Dataset

Raises:

aspecd.exceptions.NotApplicableToDatasetError – Raised when analysis step is not applicable to dataset
aspecd.exceptions.MissingDatasetError – Raised when no dataset exists to act on

analyze(dataset=None, from_dataset=False)

Perform the actual analysis step on the given dataset.

Same method as self.analyse, but for those preferring American English (AE) over British English (BE)

add_preprocessing_step(processingstep=None)

Add a preprocessing step to the internal list.

Some analyses need some preprocessing of the data. These preprocessing steps are contained in the preprocessing attribute.

Parameters:: processingstep (aspecd.processing.ProcessingStep) – processing step to be added to the list of preprocessing steps

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the aspecd.dataset.Dataset.analyse() method and ensures that the history of each analysis step gets written properly.

Returns:: history_record – history record for analysis step
Return type:: aspecd.history.AnalysisHistoryRecord

class aspecd.analysis.MultiAnalysisStep

Bases: AnalysisStep

Base class for analysis steps operating on multiple datasets.

Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.MultiAnalysisStep.result attribute.

The actual implementation of the analysis step is done in the private method _perform_task() that in turn gets called by analyse().

datasets

List of datasets the analysis step should be performed for

Type:: list

analyse()

Perform the actual analysis on the given list of datasets.

If no dataset is added to the list of datasets of the object, the method will raise an aspecd.exceptions.MissingDatasetError.

The actual analysis step should be implemented within the non-public method _perform_task(). Besides that, the parameters will be sanitised by calling the non-public method _sanitise_parameters().

Additionally, each dataset will be automatically checked for applicability, using the aspecd.analysis.AnalysisStep.applicable() method. Make sure to override this method according to your needs.

Raises:

aspecd.exceptions.MissingDatasetError – Raised when no datasets exist to act on
aspecd.exceptions.NotApplicableToDatasetError – Raised when analysis step is not applicable to dataset

class aspecd.analysis.AggregatedAnalysisStep

Bases: AnalysisStep

Perform a SingleAnalysisStep on multiple datasets and aggregate results.

Data analysis often involves performing one and the same analysis step on a series of datasets and aggregating the results in a single (calculated) dataset for further display, be it graphically or in tabular form.

datasets

List of datasets the analysis step should be performed for

Type:: list

analysis_step

Name of the analysis step to perform on the datasets

Should be a class name of an analysis step. Can be either a full class name including package and module, or only the class name. In the latter case, it will be looked up in ‘aspecd.analysis’.

Type:: str

result

Result of the aggregated analysis

Type:: aspecd.dataset.CalculatedDataset

Raises:

aspecd.exceptions.MissingDatasetError – Raised if no datasets are given
aspecd.exceptions.MissingAnalysisStepError – Raised if no analysis_step is given
ValueError – Raised if the actual AnalysisStep returns a dataset

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Let’s assume that you want to extract the minima of a series of datasets:

- kind: aggregatedanalysis
  type: BasicCharacteristics
  properties:
    parameters:
      kind: min
  apply_to:
    - dataset1
    - dataset2
  result: basic_characteristics

If the analysis step is from another package, the full class name needs to be provided:

- kind: aggregatedanalysis
  type: package.module.AnalysisClass
  properties:
    parameters:
      kind: foo
  apply_to:
    - dataset1
    - dataset2
  result: my_analysis

New in version 0.5.

analyse()

Perform the given analysis step on the list of datasets.

The name of the analysis step to be performed on the list of datasets is provided in analysis_step. The analysis will result in an aspecd.dataset.CalculatedDataset with the metadata regarding the calculation (type and parameters) set according to analysis_step and parameters.
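
Conceptually, the aggregation described above boils down to applying one analysis to each dataset and collecting the results row by row. The following sketch illustrates only this idea: plain lists stand in for datasets, a plain function stands in for the analysis step, and a dict stands in for the calculated dataset. The function name aggregate and the keys of the returned dict are hypothetical.

```python
def aggregate(analysis, datasets, index):
    """Apply `analysis` to each dataset and collect the results row by row."""
    rows = []
    for dataset in datasets:
        result = analysis(dataset)
        # Scalars become single-element rows, so every row is a list
        if not isinstance(result, list):
            result = [result]
        rows.append(result)
    return {"index": index, "data": rows}


table = aggregate(min, [[3, 1, 2], [5, 4]], index=["min"])
```

Each row of the result belongs to one dataset, and the index labels the columns, which is what makes a tabular representation straightforward.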

class aspecd.analysis.BasicCharacteristics

Bases: SingleAnalysisStep

Extract basic characteristics of a dataset.

Extracting basic characteristics (minimum, maximum, area, amplitude) of a dataset is programmatically quite simple. This class provides a working solution from within the ASpecD framework.

parameters

All parameters necessary for this step.

kind : str

Kind of the characteristic to extract from the data

Valid values are “min”, “max”, “amplitude”, and “area”.

A special kind is “all”, returning all characteristics. In this case, output can only be “value” (the default).

output : str

Kind of output: (intensity) value, axes value(s), or axes indices

Valid values are “value” (default), “axes”, and “indices”. For amplitude and area, as these characteristics have no counterpart on the axes, only “value” is a valid output option.

Default: “value”

Type:: dict

result

Characteristic(s) of the dataset.

The actual return type depends on the type of characteristics and output selected.

kind (characteristic)	output	return type
min, max, amplitude, area	value	`float`
min, max	axes, indices	`list`
all	value	`list`

The corresponding kind is set as index, hence it will be used by AggregatedAnalysisStep and included in the dataset output by this step - and hence in tabular output created by aspecd.table.Table.

Type:: float | list

index

Label for each element in result

Always reflecting the kind of characteristic(s) asked for. In case of asking for values, it is a list of the kinds. In case of the output set to axes or indices, it will be the kind followed by either the axis quantity or “index#”.

Assuming a 2D dataset with axes quantities set to “wavelength” and “time” and asking for the minimum, the index will be ['min(wavelength)', 'min(time)'] in case of output set to axes, and ['min(index0)', 'min(index1)'] in case of output set to indices.
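
The naming scheme of the index labels just described can be sketched as follows. The function index_labels is hypothetical and only reproduces the labels given in the example above; it is not taken from the ASpecD source code.

```python
def index_labels(kind, output, axis_quantities):
    """Build index labels for a single kind of characteristic."""
    if output == "value":
        return [kind]
    if output == "axes":
        return [f"{kind}({quantity})" for quantity in axis_quantities]
    if output == "indices":
        return [
            f"{kind}(index{number})" for number in range(len(axis_quantities))
        ]
    raise ValueError(f"Unknown output type: {output}")


labels_axes = index_labels("min", "axes", ["wavelength", "time"])
labels_indices = index_labels("min", "indices", ["wavelength", "time"])
```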

The index will be used, e.g., by AggregatedAnalysisStep and in tabular representations of the results.

Type:: list

Raises:: ValueError – Raised if no kind of characteristics is provided. Raised if kind of characteristics is unknown. Raised if output type is unknown. Raised if output type is not available for kind of characteristics.

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Extracting the characteristic of a dataset is quite simple:

- kind: singleanalysis
  type: BasicCharacteristics
  properties:
    parameters:
      kind: min
  result: min_of_dataset

This would simply return the minimum (value) of a given dataset in the result assigned to the recipe-internal variable min_of_dataset. Similarly, you can extract “max”, “area”, and “amplitude” from your dataset. In case you are interested in the axes values or indices, set the output parameter appropriately:

- kind: singleanalysis
  type: BasicCharacteristics
  properties:
    parameters:
      kind: min
      output: axes
  result: min_of_dataset

In this particular case, this would return the axes values of the global minimum of your dataset in the result. Note that those other output types are only available for “min” and “max”, as “area” and “amplitude” have no counterpart on the axes.

Sometimes, you are interested in getting the values of all characteristics at once in form of a list, with the kind stored in index:

- kind: singleanalysis
  type: BasicCharacteristics
  properties:
    parameters:
      kind: all
  result: characteristics_of_dataset

Make sure to understand the different types the result has depending on the characteristic and output type chosen. For details, see the table above.

New in version 0.2.

Changed in version 0.5: result is either scalar or list in all cases, and index is set to kind, for use with AggregatedAnalysisStep and tabular output.

class aspecd.analysis.BasicStatistics

Bases: SingleAnalysisStep

Extract basic statistical measures of a dataset.

Extracting basic statistical measures (mean, median, std, var) of a dataset is programmatically quite simple. This class provides a working solution from within the ASpecD framework.

parameters

All parameters necessary for this step.

kind : str

Kind of the statistical measure to extract from the data

Valid values are “mean”, “median”, “std”, and “var”.

Type:: dict

Raises:: ValueError – Raised if no kind of statistical measure is provided. Raised if kind of statistical measure is unknown.

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Extracting the statistical measure of a dataset is quite simple:

- kind: singleanalysis
  type: BasicStatistics
  properties:
    parameters:
      kind: median
  result: median_of_dataset

This would simply return the median of the data of a given dataset in the result assigned to the recipe-internal variable median_of_dataset. Similarly, you can extract “mean”, “std” (standard deviation), and “var” (variance) from your dataset.
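
What the different kinds compute can be sketched with the Python standard library. The function basic_statistic is hypothetical, and the use of the population variants of standard deviation and variance is an assumption; the actual implementation may differ in this detail.

```python
import statistics

# Assumption: population (not sample) standard deviation and variance
_MEASURES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "std": statistics.pstdev,
    "var": statistics.pvariance,
}


def basic_statistic(data, kind=None):
    """Return the statistical measure `kind` of a sequence of numbers."""
    if not kind:
        raise ValueError("No kind of statistical measure provided")
    if kind not in _MEASURES:
        raise ValueError(f"Unknown kind of statistical measure: {kind}")
    return _MEASURES[kind](data)


median_of_data = basic_statistic([1, 2, 3, 4], "median")
```

The two error cases mirror those listed for this class: a missing kind and an unknown kind.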

New in version 0.2.

class aspecd.analysis.BlindSNREstimation

Bases: SingleAnalysisStep

Blind, i.e. parameter-free, estimation of the signal-to-noise ratio.

In spectroscopy, the signal-to-noise ratio (SNR) is usually defined as the ratio of mean (of the signal) to standard deviation (of the noise) of a signal or measurement.

For accurate estimates, this requires being able to separate noise and signal, hence to define a part of the overall measurement that does not include signal. As this is not always possible, there are different ways to make a blind estimate of the SNR, i.e. without additional parameters.

The simplest possible approach of a blind estimate is the ratio of mean to standard deviation of the whole signal (method simple):

\[\mbox{SNR} = \frac{\mu}{\sigma}\]

An alternative version sometimes used is to take the square of both mean and standard deviation (method simple_squared):

\[\mbox{SNR} = \frac{\mu^2}{\sigma^2}\]

This is equivalent to the more common definition using the ratio of the (average) power of signal and noise.

Yet another algorithm, the “DER_SNR” algorithm proposed by Stoehr et al. for use in astronomic data (for details see Czesla et al., 2018, details below) makes use of the median and a numeric second derivative (method der_snr):

\[\begin{aligned}\mbox{SNR} &= \mbox{med} / \sigma \\ \sigma &= \frac{1.482602}{\sqrt{6}} \, \mbox{med}_i(|-x_{i-2} + 2x_i - x_{i+2}|)\end{aligned}\]
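
The three estimates given above can be written down directly using the Python standard library. This is a sketch following the formulas, not the ASpecD implementation: the function names are hypothetical, the standard deviation is assumed to be the population standard deviation, and the data are a plain list of numbers with at least five points.

```python
import math
import statistics


def snr_simple(data):
    """Ratio of mean to standard deviation of the whole signal."""
    return statistics.mean(data) / statistics.pstdev(data)


def snr_simple_squared(data):
    """Ratio of squared mean to variance of the whole signal."""
    return statistics.mean(data) ** 2 / statistics.pvariance(data)


def snr_der_snr(data):
    """Median over noise estimated from a numeric second derivative."""
    differences = [
        abs(-data[i - 2] + 2 * data[i] - data[i + 2])
        for i in range(2, len(data) - 2)
    ]
    sigma = 1.482602 / math.sqrt(6) * statistics.median(differences)
    return statistics.median(data) / sigma


data = [1.0, 2.0, 4.0, 2.0, 1.0, 3.0, 5.0]
estimates = {
    "simple": snr_simple(data),
    "simple_squared": snr_simple_squared(data),
    "der_snr": snr_der_snr(data),
}
```

Note that the DER_SNR estimate only uses points with two neighbours on either side, and that it is undefined if the median of the second differences is zero.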

Other options would be to fit a polynomial to the data, subtract the fitted polynomial and estimate the noise this way. A Savitzky-Golay filter could be used for this.

An article dealing with SNR estimation for (astronomic) spectral data that provides a lot of details is:

S. Czesla, T. Molle, and J. H. M. M. Schmitt: A posteriori noise estimation in variable data sets. With applications to spectra and light curves. Astronomy and Astrophysics 609(2018):A39. https://doi.org/10.1051/0004-6361/201730618

For more information, the following resources may as well be useful:

Important

While all methods currently implemented are “parameter-free”, the estimates are based on a number of assumptions, the most important being normally distributed noise. Furthermore, your data need to be sampled appropriately, with the highest frequency component of your signal being well resolved.

parameters

All parameters necessary for this step.

method : str

Method used to blindly estimate the SNR

Valid values are “simple”, “simple_squared”, “der_snr”.

Default: “simple”

Type:: dict

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Obtaining a blind estimate of the SNR of a dataset is quite simple:

- kind: singleanalysis
  type: BlindSNREstimation
  result: SNR_of_dataset

This would simply return the SNR of the data of a given dataset in the result assigned to the recipe-internal variable SNR_of_dataset.

To have more control over the method used to blindly estimate the SNR, explicitly provide a method:

- kind: singleanalysis
  type: BlindSNREstimation
  properties:
    parameters:
      method: der_snr
  result: SNR_of_dataset

This would use the DER_SNR method as described above.

New in version 0.2.

Changed in version 0.3: Added methods: “simple_squared”, “der_snr”

class aspecd.analysis.PeakFinding

Bases: SingleAnalysisStep

Peak finding in one dimension.

Finding peaks is a use case often encountered in analysing spectroscopic data, but it is far from trivial and usually requires careful choosing of parameters to yield sensible results.

The peak finding relies on the scipy.signal.find_peaks() function, hence you can set most of the parameters this function understands. For details of the parameters, see also the SciPy documentation.

Important

Peak finding can only be applied to 1D datasets, due to the underlying algorithm.

parameters

All parameters necessary for this step.

negative_peaks : bool

Whether to include negative peaks in peak finding as well.

Negative peaks are searched for by inverting the sign of the signal, and the list of peak positions is sorted.

Default: False

return_properties : bool

Whether to return properties together with the peak positions.

If properties shall be returned as well, the attribute result will be a tuple containing the list of peak positions as first element and a dictionary with peak properties as second element.

Note: If negative peaks shall be returned as well, this option will be silently ignored and only the peak positions returned.

Default: False

return_dataset : bool

Whether to return a calculated dataset as result.

In this case, the result will be an object of class aspecd.dataset.CalculatedDataset, with the data containing the peak intensities and the corresponding axis values the peak positions. Thus, this can be used to plot the peaks on top of the original data.

Default: False

return_intensities : bool

Whether to return both peak positions and intensities.

In this case, the result will be a 2D numpy array with the peak positions in the first and the peak intensities in the second column. This can be used, e.g., in annotations to mark the peak positions.

Default: False

New in version 0.11.

height : number or ndarray or sequence

Required height of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required height.

Default: None

threshold : number or ndarray or sequence

Required threshold of peaks, the vertical distance to its neighboring samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required threshold.

Default: None

distance : number

Required minimal horizontal distance (>= 1) in samples between neighbouring peaks. Smaller peaks are removed first until the condition is fulfilled for all remaining peaks.

Default: None

prominence : number or ndarray or sequence

Required prominence of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required prominence.

Default: None

width : number or ndarray or sequence

Required width of peaks in samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required width.

Default: None

Type:: dict
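
How negative peaks are included, as described for the negative_peaks parameter above, can be illustrated with a small sketch. To stay free of dependencies, a simple search for strict local maxima stands in for scipy.signal.find_peaks(), and peak positions are returned as indices into the data. The function names are hypothetical.

```python
def local_maxima(signal):
    """Indices of points strictly larger than both neighbours."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    ]


def find_peak_positions(signal, negative_peaks=False):
    """Positive peaks, optionally merged with negative peaks."""
    positions = local_maxima(signal)
    if negative_peaks:
        # Negative peaks are maxima of the sign-inverted signal
        inverted = [-value for value in signal]
        positions = sorted(positions + local_maxima(inverted))
    return positions


signal = [0, 2, 0, -3, 0, 1, 0]
positive_only = find_peak_positions(signal)
with_negative = find_peak_positions(signal, negative_peaks=True)
```

As positive and negative peaks come from two separate searches, sorting the merged list restores the order along the axis.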

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Finding the peak positions of a basically noise-free dataset is quite simple:

- kind: singleanalysis
  type: PeakFinding
  result: peaks

This would simply return the peak positions of the data of a given dataset in the result assigned to the recipe-internal variable peaks.

To have more control over the method used to find peaks, you can set a number of parameters. To get the negative peaks as well (normally, only positive peaks will be looked for):

- kind: singleanalysis
  type: PeakFinding
  properties:
    parameters:
      negative_peaks: True
  result: peaks

Sometimes it is convenient to have the peaks returned as a dataset, to plot the data and highlight the peaks found:

- kind: singleanalysis
  type: PeakFinding
  properties:
    parameters:
      return_dataset: True
  result: peaks

From the options that can be set for the function scipy.signal.find_peaks(), you can set “height”, “threshold”, “distance”, “prominence”, and “width”. For details, see the SciPy documentation.

For noisy data, “prominence” can be a good option to only find “true” peaks:

- kind: singleanalysis
  type: PeakFinding
  properties:
    parameters:
      prominence: 0.2
  result: peaks

If you supply one of these additional options, you might be interested not only in the peak positions, but in the properties of the peaks found as well.

- kind: singleanalysis
  type: PeakFinding
  properties:
    parameters:
      prominence: 0.2
      return_properties: True
  result: peaks

In this case, the result, here stored in the variable “peaks”, will be a tuple with the peak positions as first element and a dictionary with properties as the second element. Note that if you ask for negative peaks as well, this option will silently be ignored and only the peak positions returned.
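To illustrate what the “prominence” option does, here is a minimal sketch that calls scipy.signal.find_peaks() directly on synthetic data. It is not the implementation of this class; the test signal, the noise level, and all variable names are assumptions made for illustration only.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic 1D trace (assumption): two Gaussian peaks plus weak noise.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 1001)
y = (
    np.exp(-((x - 3.0) ** 2) / 0.02)
    + 0.5 * np.exp(-((x - 7.0) ** 2) / 0.02)
    + 0.01 * rng.standard_normal(x.size)
)

# Without further options, every local maximum of the noise is a "peak".
all_indices, _ = find_peaks(y)

# Requiring a minimal prominence keeps only the two "true" peaks.
indices, properties = find_peaks(y, prominence=0.2)

# find_peaks() returns sample indices; convert them to axis values.
peak_positions = x[indices]
peak_intensities = y[indices]

# Positions in the first column, intensities in the second column.
positions_and_intensities = np.column_stack((peak_positions, peak_intensities))
```

The second return value of scipy.signal.find_peaks() is a dictionary; with “prominence” set, it contains, among others, the key “prominences”.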

New in version 0.2.

static applicable(dataset)

Check whether analysis step is applicable to the given dataset.

Peak finding can only be applied to 1D datasets.

Parameters:: dataset (aspecd.dataset.Dataset) – Dataset to check
Returns:: applicable – Whether dataset is applicable
Return type:: bool

class aspecd.analysis.PowerDensitySpectrum

Bases: SingleAnalysisStep

Calculate power density spectrum of given 1D dataset.

The power density spectrum is the log10 of the power for each frequency component as a function of the log10 of the frequency. For mathematical reasons, the power of the DC component (f = 0) is omitted.

The power density spectrum (sometimes called power spectral density, PSD) can be used to analyse the nature of noise, e.g. whether it is Gaussian (white, normally distributed) noise or “coloured” noise with the frequencies of the noise components differently weighted.

In spectroscopy, coloured noise (most frequently pink or 1/f noise) is encountered more often than white noise. The characteristic of white noise is an equal distribution of all frequencies, corresponding to a constant (zero slope) in the power density spectrum. Pink or 1/f noise, in contrast, exhibits a linear damping of higher frequencies with a slope of -1 in the power density spectrum. For more details regarding noise in spectroscopy, the interested reader is referred to the documentation of the aspecd.processing.Noise class.
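The log-log representation described above can be sketched with plain NumPy and SciPy. This is a minimal illustration under stated assumptions, not the code of this class: the white-noise trace, its length, and the variable names are made up for the example.

```python
import numpy as np
from numpy.polynomial import Polynomial
from scipy.signal import periodogram

# White Gaussian noise trace (assumption, for illustration only).
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)

frequencies, power = periodogram(noise)

# Omit the DC component (f = 0), as log10(0) is undefined.
log_frequencies = np.log10(frequencies[1:])
log_power = np.log10(power[1:])

# First-order fit in the log-log representation.
# convert() returns the coefficients in the unscaled data domain,
# in increasing order: intercept first, slope second.
intercept, slope = Polynomial.fit(log_frequencies, log_power, deg=1).convert().coef
```

For white noise, the fitted slope should be close to zero; for pink noise, a value close to -1 would be expected.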

result

power density spectrum of the corresponding 1D dataset analysed

Type:: aspecd.dataset.CalculatedDataset

parameters

All parameters necessary for this step.

methodstr

Method to use to calculate the power density spectrum

Possible methods must exist in the scipy.signal module. Currently, you can choose between “periodogram” and “welch”. See their respective documentation, i.e. scipy.signal.periodogram() and scipy.signal.welch() for details.

Default: periodogram

Type:: dict

Raises:: aspecd.exceptions.NotApplicableToDatasetError – Raised if applied to an ND dataset (with N>1)

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Computing the power density spectrum of a 1D dataset (let’s assume you use a trace of pure noise for this) is quite simple:

- kind: singleanalysis
  type: PowerDensitySpectrum
  result: power_density_spectrum

This would simply return the power density spectrum of the data of a given dataset in the result assigned to the recipe-internal variable power_density_spectrum. Note that the result is itself a calculated dataset, hence you can easily plot it for graphical representation and manual inspection:

- kind: singleplot
  type: SinglePlotter1D
  properties:
    filename: power_density_spectrum.pdf
  apply_to: power_density_spectrum

You may even want to calculate the power density spectrum, perform a polynomial fit (of first order), evaluate the polynomial for the coefficients obtained by the fit, and plot both together in one figure:

- kind: singleanalysis
  type: PowerDensitySpectrum
  result: power_density_spectrum

- kind: singleanalysis
  type: PolynomialFit
  result: coefficients
  apply_to: power_density_spectrum

- kind: model
  type: Polynomial
  properties:
    parameters:
      coefficients: coefficients
  from_dataset: power_density_spectrum
  result: linear_fit

- kind: multiplot
  type: MultiPlotter1D
  properties:
    filename: power_density_spectrum.pdf
  apply_to:
    - power_density_spectrum
    - linear_fit

To have more control over the method used to calculate the power density spectrum, you can explicitly provide a method name here:

- kind: singleanalysis
  type: PowerDensitySpectrum
  properties:
    parameters:
      method: welch
  result: power_density_spectrum

Note that the methods need to reside in the scipy.signal module.
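How the method name is resolved internally is not shown here. The following sketch merely assumes a lookup by name in scipy.signal; the helper power_spectrum() is hypothetical and not part of the documented interface.

```python
import numpy as np
import scipy.signal


def power_spectrum(data, method="periodogram"):
    """Hypothetical helper: look up the estimator by name in scipy.signal."""
    if method not in ("periodogram", "welch"):
        raise ValueError(f"Unknown method: {method}")
    estimator = getattr(scipy.signal, method)
    frequencies, power = estimator(data)
    # Drop the DC component (f = 0) before taking logarithms.
    return np.log10(frequencies[1:]), np.log10(power[1:])


rng = np.random.default_rng(1)
noise = rng.standard_normal(4096)

log_f_periodogram, log_p_periodogram = power_spectrum(noise, "periodogram")
log_f_welch, log_p_welch = power_spectrum(noise, "welch")
```

With its default settings, scipy.signal.welch() averages over overlapping segments of 256 samples. Hence, it returns far fewer frequency points than scipy.signal.periodogram(), but with less scatter.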

New in version 0.3.

static applicable(dataset)

Check whether analysis step is applicable to the given dataset.

Power density spectrum calculation can only be applied to 1D datasets.

Parameters:: dataset (aspecd.dataset.Dataset) – Dataset to check
Returns:: applicable – Whether dataset is applicable
Return type:: bool

class aspecd.analysis.PolynomialFit

Bases: SingleAnalysisStep

Perform polynomial fit on 1D data.

The coefficients obtained can be used to evaluate a model that can be plotted together with the data the polynomial has been fitted to originally. At the same time, you can tabulate the coefficients.

result

coefficients of the fitted polynomial in increasing order

As the new numpy.polynomial package is used, particularly the numpy.polynomial.polynomial.Polynomial class, the coefficients are given in increasing order, with the first element corresponding to x**0. Furthermore, the coefficients are given in the unscaled data domain (using the numpy.polynomial.polynomial.Polynomial.convert() method).

Type:: list

parameters

All parameters necessary for this step.

orderint

Order (degree) of the polynomial to be fitted to the data

Default: 1

Type:: dict

Raises:: aspecd.exceptions.NotApplicableToDatasetError – Raised if applied to an ND dataset (with N>1)

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Fitting a polynomial of first order to your 1D dataset is quite simple:

- kind: singleanalysis
  type: PolynomialFit
  result: polynomial_coefficients_1st_order

This would simply return the polynomial coefficients (in increasing order) as a list in the result assigned to the recipe-internal variable polynomial_coefficients_1st_order.

If you would like to fit a polynomial of different order, simply provide the desired order as an additional parameter:

- kind: singleanalysis
  type: PolynomialFit
  properties:
    parameters:
      order: 3
  result: polynomial_coefficients_3rd_order

In this case, the result will contain a list of four coefficients of the fitted polynomial of third order, again in increasing order.
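The ordering of the coefficients and the role of the convert() method can be illustrated directly with NumPy. The following is a minimal sketch with made-up data, not the code of this class:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Noise-free test data (assumption): y = 1 + 2*x - 0.5*x**2
x = np.linspace(10, 20, 51)
y = 1.0 + 2.0 * x - 0.5 * x**2

fitted = Polynomial.fit(x, y, deg=2)

# Coefficients in the internally scaled domain: not directly interpretable.
scaled_coefficients = fitted.coef

# Coefficients in the unscaled data domain, in increasing order:
# the first element corresponds to x**0.
coefficients = fitted.convert().coef
```

Only the converted coefficients reproduce the values 1, 2, and -0.5 used to create the data.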

New in version 0.3.

static applicable(dataset)

Check whether analysis step is applicable to the given dataset.

Polynomial fits can (currently) only be applied to 1D datasets.

Parameters:: dataset (aspecd.dataset.Dataset) – Dataset to check
Returns:: applicable – Whether dataset is applicable
Return type:: bool

class aspecd.analysis.LinearRegressionWithFixedIntercept

Bases: SingleAnalysisStep

Perform linear regression with fixed intercept on 1D data

In contrast to a regular polynomial fit of first order, where two parameters (slope and intercept) are fitted, there are mathematical models where the intercept is fixed, i.e. not to be fitted as well. In these cases, using a polynomial fit of first order is simply wrong. Of course, which of these approaches is valid depends on the underlying model and the physical reality to be modelled.

Note

A prime example of a linear model with only a slope and no intercept is Hooke’s law stating that the force F needed to extend or compress a spring by some distance x scales linearly with respect to that distance, i.e., F = kx. Here, k is the characteristic of the spring, sometimes called “spring constant”.

The approach taken here is to use linear algebra and solve the system of equations by calling numpy.linalg.lstsq(). In case of a vertical offset (i.e., intercept not zero), the offset is first subtracted from the function values and afterwards the regression is performed.
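A minimal sketch of this approach, assuming noise-free test data, might look as follows. The function slope_with_fixed_intercept() is a hypothetical helper for illustration and not part of the documented interface.

```python
import numpy as np


def slope_with_fixed_intercept(x, y, offset=0.0):
    """Hypothetical helper: fit only the slope, keeping the intercept fixed."""
    # Design matrix with a single column: the slope is the only free parameter.
    design_matrix = np.asarray(x, dtype=float)[:, np.newaxis]
    # Subtract the fixed offset first, then solve in the least-squares sense.
    shifted = np.asarray(y, dtype=float) - offset
    solution, *_ = np.linalg.lstsq(design_matrix, shifted, rcond=None)
    return float(solution[0])


# Test data (assumption): y = 3.14 + 2.5 * x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.14 + 2.5 * x

slope = slope_with_fixed_intercept(x, y, offset=3.14)

# Coefficients in increasing order: (fixed) intercept, (fitted) slope.
coefficients = [3.14, slope]
```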

result

slope of the linear regression

If you set the parameter polynomial_coefficients to True, a list with (fixed) intercept and (fitted) slope will be returned (see below).

Type:: float

parameters

All parameters necessary for this step.

offsetfloat

Vertical offset of the data, i.e. f(0)

Useful in cases where the model defines an intercept f(0) != 0.

Default: 0

polynomial_coefficientsbool

Whether to return both, intercept and slope for compatibility with polynomial model, aspecd.model.Polynomial

Default: False

Type:: dict

Raises:: aspecd.exceptions.NotApplicableToDatasetError – Raised if applied to an ND dataset (with N>1)

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Performing a linear regression without intercept on your 1D dataset is quite simple:

- kind: singleanalysis
  type: LinearRegressionWithFixedIntercept
  result: slope

Sometimes, you may have the situation that the (fixed) intercept is not zero, hence your data are “offset” by a scalar value. To account for that, provide a value for this offset, here 3.14:

- kind: singleanalysis
  type: LinearRegressionWithFixedIntercept
  properties:
    parameters:
      offset: 3.14
  result: slope

As you sometimes want to graphically display both, data and the resulting linear regression, you may use the aspecd.model.Polynomial model to do just that. However, for this to work, you would need to get the (fixed) intercept returned as first coefficient as well. Here you go:

- kind: singleanalysis
  type: LinearRegressionWithFixedIntercept
  properties:
    parameters:
      polynomial_coefficients: True
  result: regression_coefficients

The full story may look something like this, with “experimental_data” referring to the actual dataset to be analysed:

- kind: singleanalysis
  type: LinearRegressionWithFixedIntercept
  properties:
    parameters:
      polynomial_coefficients: True
  result: coefficients
  apply_to:
    - experimental_data

- kind: model
  type: Polynomial
  properties:
    parameters:
      coefficients: coefficients
  from_dataset: experimental_data
  result: linear_regression_without_intercept

- kind: multiplot
  type: MultiPlotter1D
  properties:
    filename: linear_regression_without_intercept.pdf
  apply_to:
    - experimental_data
    - linear_regression_without_intercept

With this, you should have your plot with data and linear regression together saved in the file linear_regression_without_intercept.pdf.

New in version 0.3.

static applicable(dataset)

Check whether analysis step is applicable to the given dataset.

Linear regression with fixed intercept can (currently) only be applied to 1D datasets.

Parameters:: dataset (aspecd.dataset.Dataset) – Dataset to check
Returns:: applicable – Whether dataset is applicable
Return type:: bool

class aspecd.analysis.DeviceDataExtraction

Bases: SingleAnalysisStep

Extract device data as separate dataset.

Datasets may contain additional data as device data in aspecd.dataset.Dataset.device_data. For details, see aspecd.dataset.DeviceData. To further process and analyse these device data, the most general way is to extract them as an individual dataset and perform all further tasks on it.

A reference to the original dataset is stored in aspecd.dataset.Dataset.references.

result

Dataset containing the device data.

The device the data are extracted for is provided by the parameter device, see below.

Type:: aspecd.dataset.CalculatedDataset

parameters

All parameters necessary for this step.

devicestr

Name of the device the data should be extracted for.

Raises a KeyError if the device does not exist.

Default: ‘’

Type:: dict

Raises:: KeyError – Raised if device is not present in aspecd.dataset.Dataset.device_data

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Suppose you have a dataset that contains device data referenced with the key “timestamp”, and you want to extract those device data and make them accessible from within the recipe using the name “timestamp” as well:

- kind: singleanalysis
  type: DeviceDataExtraction
  properties:
    parameters:
      device: timestamp
  result: timestamp

New in version 0.9.

static applicable(dataset)

Check whether analysis step is applicable to the given dataset.

Device data extraction is only possible if device data are present.

Parameters:: dataset (aspecd.dataset.Dataset) – Dataset to check
Returns:: applicable – Whether dataset is applicable
Return type:: bool

class aspecd.analysis.CentreOfMass

Bases: SingleAnalysisStep

Calculate centre of mass for ND datasets.

In physics, the centre of mass of a body is the mass-weighted average of the positions of its mass points. It can be equally applied to an ND dataset, where the mass is related to the intensity value at a given point.

In one dimension, the centre of mass, \(x_s\), can be calculated by:

\[x_s = \frac{1}{M} \cdot \sum_{i=1}^{n} x_{i} \cdot m_{i}\]

with the total mass \(M\), i.e. the sum of all point masses:

\[M = \sum_{i=1}^{n} m_{i}\]

This can be generalised to arbitrary dimensions, defining the centre of mass as the mass-weighted average of the position vectors \(\vec{r}_i\):

\[\vec{r}_s = \frac{1}{M} \sum_{i}m_{i} \cdot \vec{r}_i\]

Note that in contrast to scipy.ndimage.center_of_mass(), the actual axis values are used to calculate the centre of mass. Hence, the calculation works for non-uniform spacing of individual axes as well.
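The formulas above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code of this class: the function centre_of_mass() and the test data are hypothetical.

```python
import numpy as np


def centre_of_mass(data, axes):
    """Hypothetical helper: centre of mass in axis coordinates.

    ``axes`` contains one 1D array of axis values per dimension of ``data``.
    """
    data = np.asarray(data, dtype=float)
    total_mass = data.sum()
    coordinates = []
    for dimension, axis_values in enumerate(axes):
        # Sum the "masses" over all other dimensions.
        marginal = np.moveaxis(data, dimension, 0)
        marginal = marginal.reshape(len(axis_values), -1).sum(axis=1)
        # Mass-weighted average of the actual axis values.
        coordinates.append(np.dot(axis_values, marginal) / total_mass)
    return np.array(coordinates)


# 1D example with non-uniform axis spacing and equal "masses":
# (0 + 1 + 3 + 7) / 4 = 2.75, whereas the mean of the indices is 1.5.
x = np.array([0.0, 1.0, 3.0, 7.0])
intensities = np.array([1.0, 1.0, 1.0, 1.0])
centre_1d = centre_of_mass(intensities, [x])

# 2D example: all "mass" resides in the second column.
data_2d = np.array([[0.0, 0.0], [0.0, 2.0], [0.0, 2.0]])
axes_2d = [np.array([0.0, 1.0, 4.0]), np.array([10.0, 20.0])]
centre_2d = centre_of_mass(data_2d, axes_2d)
```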

result

Coordinates of the centre of mass of the data in axis coordinates.

Type:: np.array

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Obtaining the centre of mass of a given dataset is fairly straightforward:

- kind: singleanalysis
  type: CentreOfMass
  result: centre_of_mass

However, usually you would like to graphically display the result in some way. Assuming a 1D dataset, you may plot a vertical line, using aspecd.annotation.VerticalLine, and using the result of the analysis as the x coordinate of the annotation:

- kind: singleanalysis
  type: CentreOfMass
  result: centre_of_mass

- kind: singleplot
  type: SinglePlotter1D
  properties:
    filename: plot.pdf
  result: plot

- kind: plotannotation
  type: VerticalLine
  properties:
    parameters:
      positions: centre_of_mass
  plotter: plot

New in version 0.11.