aspecd.analysis module

Data analysis functionality.

Key to reproducible science is automatic documentation of each analysis step applied to the data of a dataset. Each such analysis step is self-contained, meaning it contains all the information necessary to perform the analysis task on a given dataset.

Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained that is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset.

Each real analysis step should inherit from aspecd.analysis.SingleAnalysisStep as documented there. Furthermore, all analysis steps should be contained in one module named “analysis”. This allows for easy automation and replay of analysis steps, particularly in the context of recipe-driven data analysis (for details, see the aspecd.tasks module).

Todo

Add capabilities of handling analysis steps spanning multiple datasets, in a similar fashion to what has been done for plots (see the plotting module for details). In contrast to processing steps, analysis steps can span multiple datasets. Prominent examples would be comparing intensities of different datasets or global fits of multiple datasets.

exception aspecd.analysis.Error

Bases: Exception

Base class for exceptions in this module.

exception aspecd.analysis.MissingDatasetError(message='')

Bases: aspecd.analysis.Error

Exception raised when no dataset exists to act on

message

explanation of the error

Type

str

exception aspecd.analysis.MissingAnalysisStepError(message='')

Bases: aspecd.analysis.Error

Exception raised when no analysis step exists to act on

message

explanation of the error

Type

str
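
A minimal sketch of how an exception hierarchy with the signatures documented above can be laid out. The class bodies and the helper require_dataset() are assumptions made for illustration, not copied from the aspecd source.

```python
class Error(Exception):
    """Base class for exceptions in this module (stand-in)."""


class MissingDatasetError(Error):
    """Raised when no dataset exists to act on (stand-in)."""

    def __init__(self, message=''):
        super().__init__(message)
        self.message = message


def require_dataset(dataset):
    # Hypothetical helper, only here to show how the exception is raised
    if dataset is None:
        raise MissingDatasetError(message='No dataset to act on')
    return dataset
```

Because both exceptions derive from a common base class, calling code can catch Error to handle every exception of the module at once.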

class aspecd.analysis.AnalysisStep

Bases: object

Base class for analysis steps.

Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.SingleAnalysisStep.result attribute.

In case aspecd.analysis.SingleAnalysisStep.result is a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset), and the idea behind storing the result in form of a dataset is to be able to plot and further process these results in a fully generic manner.

name

Name of the analysis step.

Defaults to the lower-case class name. Do not change it!

Type

str

parameters

Parameters required for performing the analysis step

All parameters, implicit and explicit.

Type

dict

result

Results of the analysis step

Can be either an aspecd.dataset.Dataset or an object of some other class, e.g., aspecd.metadata.PhysicalQuantity.

In case of a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset).

description

Short description, to be set in class definition

Type

str

comment

User-supplied comment describing intent, purpose, reason, …

Type

str

Raises

aspecd.analysis.MissingDatasetError – Raised when no dataset exists to act on

analyse()

Perform the actual analysis step on the given dataset.

The actual analysis step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the analysis step to the given dataset will be checked automatically and the parameters will be sanitised by calling the non-public method _sanitise_parameters().

analyze()

Perform the actual analysis step on the given dataset.

Same method as self.analyse, but for those preferring American English (AE) over British English (BE) spelling.
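
The call sequence described for analyse() can be sketched as follows. This is a stand-in written for illustration, not the aspecd implementation: the data attribute, the CountAbove class and its threshold parameter are hypothetical, and the applicability check is omitted.

```python
class AnalysisStep:
    """Stand-in base class mimicking the documented call sequence."""

    def __init__(self):
        self.name = self.__class__.__name__.lower()
        self.parameters = {}
        self.result = None
        self.data = []  # hypothetical: stands in for the data of a dataset

    def analyse(self):
        # Sanitise the parameters first, then perform the actual task
        self._sanitise_parameters()
        self._perform_task()

    def analyze(self):
        # Alias for those preferring the American English spelling
        self.analyse()

    def _sanitise_parameters(self):
        pass

    def _perform_task(self):
        pass


class CountAbove(AnalysisStep):
    """Hypothetical step: count the values above a threshold."""

    def __init__(self):
        super().__init__()
        self.parameters['threshold'] = 0

    def _sanitise_parameters(self):
        # Coerce a threshold given as string to a float
        self.parameters['threshold'] = float(self.parameters['threshold'])

    def _perform_task(self):
        threshold = self.parameters['threshold']
        self.result = sum(1 for value in self.data if value > threshold)


step = CountAbove()
step.data = [0.5, 1.5, 2.5]
step.parameters['threshold'] = '1'
step.analyze()
```

Derived classes only override the two non-public methods, so the public analyse() method stays identical for every analysis step.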

class aspecd.analysis.SingleAnalysisStep

Bases: aspecd.analysis.AnalysisStep

Base class for analysis steps operating on single datasets.

Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.SingleAnalysisStep.result attribute.

In case aspecd.analysis.SingleAnalysisStep.result is a dataset, it is a calculated dataset (aspecd.dataset.CalculatedDataset), and the idea behind storing the result in form of a dataset is to be able to plot and further process these results in a fully generic manner.

preprocessing

List of necessary preprocessing steps to perform the analysis.

Type

list

description

Short description, to be set in class definition

Type

str

dataset

Dataset the analysis step should be performed on

Type

aspecd.dataset.Dataset

Raises

aspecd.analysis.MissingDatasetError – Raised when no dataset exists to act on

analyse(dataset=None, from_dataset=False)

Perform the actual analysis step on the given dataset.

If no dataset is provided at method call, but one is set as property in the SingleAnalysisStep object, the analyse method of the dataset will be called and thus the history written.

If a dataset is provided neither at method call nor as property in the object, the method will raise an aspecd.analysis.MissingDatasetError.

The aspecd.dataset.Dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the aspecd.analysis.SingleAnalysisStep object is not necessary.

The actual analysis step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the analysis step to the given dataset will be checked automatically and the parameters will be sanitised by calling the non-public method _sanitise_parameters().

Parameters

  • dataset (aspecd.dataset.Dataset) – dataset to perform analysis for

  • from_dataset (boolean) – whether we are called from within a dataset

    Defaults to “False” and shall never be set manually.

Returns

dataset – dataset analysis has been performed for

Return type

aspecd.dataset.Dataset
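
The interplay between the analysis step and the dataset described above can be sketched with stand-in classes. Everything below is an assumption made for illustration: the real aspecd classes do more (applicability check, parameter sanitising, history records), and the Maximum class as well as the tuple stored in analyses are hypothetical.

```python
class MissingDatasetError(Exception):
    """Stand-in for aspecd.analysis.MissingDatasetError."""


class Dataset:
    """Stand-in for aspecd.dataset.Dataset."""

    def __init__(self, data=None):
        self.data = list(data or [])
        self.analyses = []

    def analyse(self, analysis_step):
        # The dataset passes itself and flags the call as coming from it
        analysis_step.analyse(dataset=self, from_dataset=True)
        self.analyses.append((analysis_step.name,
                              dict(analysis_step.parameters),
                              analysis_step.result))
        return analysis_step


class SingleAnalysisStep:
    """Stand-in mimicking the documented dispatch logic."""

    def __init__(self):
        self.name = self.__class__.__name__.lower()
        self.parameters = {}
        self.result = None
        self.dataset = None

    def analyse(self, dataset=None, from_dataset=False):
        if dataset is not None:
            self.dataset = dataset
        if self.dataset is None:
            raise MissingDatasetError('No dataset to act on')
        if from_dataset:
            self._perform_task()
        else:
            # Route the call through the dataset so the step gets recorded
            self.dataset.analyse(self)
        return self.dataset

    def _perform_task(self):
        pass


class Maximum(SingleAnalysisStep):
    """Hypothetical step: maximum of the data."""

    def _perform_task(self):
        self.result = max(self.dataset.data)


dataset = Dataset([1, 5, 3])
step = Maximum()
step.analyse(dataset)
```

Routing every call through the dataset means the result ends up in the analyses list of the dataset regardless of whether the user starts from the dataset or from the analysis step.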

analyze(dataset=None)

Perform the actual analysis step on the given dataset.

Same method as self.analyse, but for those preferring American English (AE) over British English (BE) spelling.

add_preprocessing_step(processingstep=None)

Add a preprocessing step to the internal list.

Some analyses need some preprocessing of the data. These preprocessing steps are contained in the preprocessing attribute.

Parameters

processingstep (aspecd.processing.ProcessingStep) – processing step to be added to the list of preprocessing steps

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the analyse() method of the aspecd.dataset.Dataset class and ensures that the history of each analysis step gets written properly.

Returns

history_record – history record for analysis step

Return type

aspecd.analysis.AnalysisHistoryRecord

class aspecd.analysis.MultiAnalysisStep

Bases: aspecd.analysis.AnalysisStep

Base class for analysis steps operating on multiple datasets.

Analysis steps, in contrast to processing steps (see aspecd.processing for details), operate on data of an aspecd.dataset.Dataset, but don’t change its data. Rather, some result is obtained. This result is stored separately, together with the parameters of the analysis step, in the aspecd.dataset.Dataset.analyses attribute of the dataset and can be found in the aspecd.analysis.MultiAnalysisStep.result attribute.

datasets

List of datasets the analysis step should be performed for

Type

list

analyse()

Perform the actual analysis on the given list of datasets.

If no dataset is added to the list of datasets of the object, the method will raise a respective exception.

The actual analysis step should be implemented within the non-public method _perform_task(). Besides that, the parameters will be sanitised by calling the non-public method _sanitise_parameters().

Raises

aspecd.analysis.MissingDatasetError – Raised when no datasets exist to act on
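
A minimal sketch of an analysis step spanning multiple datasets, following the behaviour described above. The classes are stand-ins, the RelativeIntensities step is hypothetical, and for brevity each dataset is represented by a plain list of numbers rather than an aspecd.dataset.Dataset object.

```python
class MissingDatasetError(Exception):
    """Stand-in for aspecd.analysis.MissingDatasetError."""


class MultiAnalysisStep:
    """Stand-in mimicking the documented behaviour of analyse()."""

    def __init__(self):
        self.parameters = {}
        self.result = None
        self.datasets = []

    def analyse(self):
        if not self.datasets:
            raise MissingDatasetError('No datasets to act on')
        self._sanitise_parameters()
        self._perform_task()

    def _sanitise_parameters(self):
        pass

    def _perform_task(self):
        pass


class RelativeIntensities(MultiAnalysisStep):
    """Hypothetical step: maxima relative to the first dataset."""

    def _perform_task(self):
        maxima = [max(data) for data in self.datasets]
        reference = maxima[0]
        self.result = [maximum / reference for maximum in maxima]


step = RelativeIntensities()
step.datasets = [[1, 2, 4], [2, 8]]
step.analyse()
```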

class aspecd.analysis.AnalysisStepRecord(analysis_step=None)

Bases: object

Base class for analysis step records.

The analysis of an aspecd.dataset.Dataset should not contain references to aspecd.analysis.AnalysisStep objects, but rather records that contain all necessary information to create the respective objects inherited from aspecd.analysis.AnalysisStep. One reason for this is simply that we want to import datasets containing analysis steps in their analyses for which no corresponding analysis class exists in the current installation of the application. Another is to avoid an infinite recursion of datasets, as the dataset is stored in an aspecd.analysis.AnalysisStep object.

description

Short description, to be set in class definition

Type

str

parameters

Parameters required for performing the analysis step

All parameters, implicit and explicit.

Type

dict

comment

User-supplied comment describing intent, purpose, reason, …

Type

str

class_name

Fully qualified name of the class of the corresponding analysis step

Type

str

Parameters

analysis_step (aspecd.analysis.SingleAnalysisStep) – Analysis step the record should be created for.

Raises

aspecd.analysis.MissingAnalysisStepError – Raised when no analysis step exists to act on

create_analysis_step()

Create an analysis step object from the parameters stored.

Returns

analysis_step – actual analysis step object that can be used for analysis

Return type

aspecd.analysis.SingleAnalysisStep
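
The idea of a record that holds only plain information and re-creates the analysis step on demand can be sketched as follows. This is a simplified stand-in, not the aspecd implementation: the Maximum class is hypothetical, and the lookup table KNOWN_STEPS replaces the import of the class from its fully qualified name.

```python
import copy


class MissingAnalysisStepError(Exception):
    """Stand-in for aspecd.analysis.MissingAnalysisStepError."""


class Maximum:
    """Hypothetical analysis step used for illustration."""

    def __init__(self):
        self.parameters = {}
        self.description = 'Maximum of the data'
        self.comment = ''


# Hypothetical lookup table from class name to class. A real implementation
# would rather import the class from its fully qualified name.
KNOWN_STEPS = {'Maximum': Maximum}


class AnalysisStepRecord:
    """Stand-in record storing no reference to the step itself."""

    def __init__(self, analysis_step=None):
        if analysis_step is None:
            raise MissingAnalysisStepError('No analysis step to act on')
        self.class_name = type(analysis_step).__name__
        # Deep copy, so later changes to the step do not alter the record
        self.parameters = copy.deepcopy(analysis_step.parameters)
        self.description = analysis_step.description
        self.comment = analysis_step.comment

    def create_analysis_step(self):
        analysis_step = KNOWN_STEPS[self.class_name]()
        analysis_step.parameters = copy.deepcopy(self.parameters)
        analysis_step.description = self.description
        analysis_step.comment = self.comment
        return analysis_step


step = Maximum()
step.parameters['range'] = [0, 10]
step.comment = 'first try'
record = AnalysisStepRecord(step)
new_step = record.create_analysis_step()
```

Since the record keeps only the class name and copies of the parameters, it can be stored with the dataset without pulling in the dataset referenced by the original step.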

class aspecd.analysis.SingleAnalysisStepRecord(analysis_step=None)

Bases: aspecd.analysis.AnalysisStepRecord

Base class for analysis step records stored in the dataset analyses.

The analysis of an aspecd.dataset.Dataset should not contain references to aspecd.analysis.AnalysisStep objects, but rather records that contain all necessary information to create the respective objects inherited from aspecd.analysis.AnalysisStep. One reason for this is simply that we want to import datasets containing analysis steps in their analyses for which no corresponding analysis class exists in the current installation of the application. Another is to avoid an infinite recursion of datasets, as the dataset is stored in an aspecd.analysis.AnalysisStep object.

Note

Each analyses entry in a dataset stores the analysis step as an aspecd.analysis.SingleAnalysisStepRecord, even in applications inheriting from the ASpecD framework. Hence, subclassing of this class should normally not be necessary.

preprocessing

List of processing steps

The actual processing steps are objects of the class aspecd.processing.ProcessingStepRecord.

Type

list

Parameters

analysis_step (aspecd.analysis.SingleAnalysisStep) – Analysis step the record should be created for.

class aspecd.analysis.AnalysisHistoryRecord(analysis_step=None, package='')

Bases: aspecd.dataset.HistoryRecord

History record for analysis steps on datasets.

analysis

Analysis step the history is saved for

Type

aspecd.analysis.SingleAnalysisStep

package

Name of package the history record gets recorded for

Prerequisite for reproducibility. Gets stored in the aspecd.dataset.HistoryRecord.sysinfo attribute and will usually be provided automatically by the dataset.

Type

str

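Parameters

  • analysis_step (aspecd.analysis.SingleAnalysisStep) – Analysis step the history is saved for

  • package (str) – Name of package the history record gets recorded for

replay(dataset)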

Replay the analysis step saved in the history record.

Parameters

dataset (aspecd.dataset.Dataset) – dataset the analysis step should be replayed on