aspecd.processing module

Data processing functionality.

Key to reproducible science is automatic documentation of each processing step applied to the data of a dataset. Each such processing step is self-contained, meaning it contains all the information necessary to perform the processing task on a given dataset.

Processing steps, in contrast to analysis steps (see aspecd.analysis for details), not only operate on the data of an aspecd.dataset.Dataset, but change its data. The information necessary to reproduce each processing step is added to the aspecd.dataset.Dataset.history attribute of the dataset.

Each concrete processing step should inherit from aspecd.processing.ProcessingStep as documented there. Furthermore, each processing step should be contained in one module named “processing”. This allows for easy automation and replay of processing steps, particularly in the context of recipe-driven data analysis (for details, see the aspecd.tasks module).

exception aspecd.processing.Error

Bases: Exception

Base class for exceptions in this module.

exception aspecd.processing.ProcessingNotApplicableToDatasetError(message='')

Bases: aspecd.processing.Error

Exception raised when processing step is not applicable to dataset

message

explanation of the error

Type

str

exception aspecd.processing.MissingDatasetError(message='')

Bases: aspecd.processing.Error

Exception raised when no dataset exists to act on

message

explanation of the error

Type

str

exception aspecd.processing.MissingProcessingStepError(message='')

Bases: aspecd.processing.Error

Exception raised when no processing step exists to act on

message

explanation of the error

Type

str

class aspecd.processing.ProcessingStep

Bases: object

Base class for processing steps.

Each class actually performing a processing step should inherit from this class. Furthermore, all parameters, implicit and explicit, necessary to perform the processing step, should eventually be stored in the property “self.parameters” (currently a dictionary).

To perform the processing step, call the process() method of the dataset the processing should be applied to, and pass it a reference to the actual processing step object.

When inheriting from this class, you also need to set the string stored in description, which is basically a one-liner, and, if necessary, the flag undoable.

The actual implementation of the processing step is done in the private method _perform_task() that in turn gets called by process() which is called by the aspecd.dataset.Dataset.process() method of the dataset object.
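The subclassing pattern described above can be illustrated with a minimal sketch. It deliberately does not import ASpecD: Dataset and ProcessingStep below are hypothetical, heavily simplified stand-ins, and ScaleData with its factor parameter is an invented example step. The history entry is reduced to a plain dictionary for illustration only.

```python
import copy


class Dataset:
    """Hypothetical stand-in for aspecd.dataset.Dataset (illustration only)."""

    def __init__(self, data):
        self.data = list(data)
        self.history = []

    def process(self, processing_step):
        # Hand the dataset to the step, then record what was done.
        processing_step.process(self, from_dataset=True)
        self.history.append({
            "class_name": type(processing_step).__name__,
            "parameters": copy.deepcopy(processing_step.parameters),
        })
        return processing_step


class ProcessingStep:
    """Hypothetical stand-in for aspecd.processing.ProcessingStep."""

    def __init__(self):
        self.undoable = False
        self.description = "Abstract processing step"
        self.parameters = {}
        self.dataset = None

    def process(self, dataset=None, from_dataset=False):
        self.dataset = dataset
        self._perform_task()
        return self.dataset

    def _perform_task(self):
        pass


class ScaleData(ProcessingStep):
    """Invented concrete step: multiply all data points by a factor."""

    def __init__(self):
        super().__init__()
        self.description = "Scale data by a constant factor"
        self.undoable = True
        # All parameters, implicit and explicit, live in self.parameters.
        self.parameters["factor"] = 2.0

    def _perform_task(self):
        factor = self.parameters["factor"]
        self.dataset.data = [value * factor for value in self.dataset.data]


dataset = Dataset([1.0, 2.0, 3.0])
dataset.process(ScaleData())
```

In this sketch the only method a concrete step overrides is _perform_task(); everything needed to repeat the step ends up in parameters and is copied into the history.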

undoable

Can this processing step be reverted?

Type

bool

name

Name of the processing step.

Defaults to the lower-case class name; do not change it.

Type

str

parameters

Parameters required for performing the processing step

All parameters, implicit and explicit.

Type

dict

info

Additional information used, e.g., in a report (derived values, …)

Type

dict

description

Short description, to be set in class definition

Type

str

comment

User-supplied comment describing intent, purpose, reason, …

Type

str

dataset

Dataset the processing step should be performed on

Type

aspecd.dataset.Dataset

Raises

process(dataset=None, from_dataset=False)

Perform the actual processing step on the given dataset.

If no dataset is provided in the method call, but one is set as a property of the ProcessingStep object, the aspecd.dataset.Dataset.process() method of that dataset will be called and thus the history written.

If no dataset is provided, neither in the method call nor as a property of the object, the method will raise a corresponding exception.

The aspecd.dataset.Dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the aspecd.processing.ProcessingStep object is not necessary.

The actual processing step should be implemented within the non-public method _perform_task(). Besides that, the applicability of the processing step to the given dataset will be checked automatically and the parameters will be sanitised by calling the non-public method _sanitise_parameters().
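The call sequence described above might look roughly like the following sketch. All classes are hypothetical stand-ins written for this example and not the ASpecD implementation; in particular, routing a direct call with from_dataset=False back through the dataset is an assumption, as is the invented Negate step. The history is reduced to a list of class names.

```python
class MissingDatasetError(Exception):
    """Stand-in for aspecd.processing.MissingDatasetError."""


class ProcessingNotApplicableToDatasetError(Exception):
    """Stand-in for the exception of the same name in aspecd.processing."""


class Dataset:
    """Hypothetical, minimal stand-in for aspecd.dataset.Dataset."""

    def __init__(self, data):
        self.data = list(data)
        self.history = []

    def process(self, processing_step):
        # The dataset always passes itself and sets from_dataset=True.
        processing_step.process(self, from_dataset=True)
        self.history.append(type(processing_step).__name__)
        return processing_step


class ProcessingStep:
    """Hypothetical stand-in for aspecd.processing.ProcessingStep."""

    def __init__(self):
        self.parameters = {}
        self.dataset = None

    def process(self, dataset=None, from_dataset=False):
        if dataset is not None:
            self.dataset = dataset
        if self.dataset is None:
            raise MissingDatasetError("No dataset to act on")
        if not from_dataset:
            # Assumed behaviour: go through the dataset so the history
            # gets written.
            self.dataset.process(self)
        else:
            if not self.applicable(self.dataset):
                raise ProcessingNotApplicableToDatasetError("Not applicable")
            self._sanitise_parameters()
            self._perform_task()
        return self.dataset

    @staticmethod
    def applicable(dataset):
        return True

    def _sanitise_parameters(self):
        pass

    def _perform_task(self):
        pass


class Negate(ProcessingStep):
    """Invented example step: flip the sign of every data point."""

    def _perform_task(self):
        self.dataset.data = [-value for value in self.dataset.data]


# Dataset set as property, process() called without arguments.
step = Negate()
step.dataset = Dataset([1, -2])
result = step.process()
```

Calling Negate().process() without any dataset raises the stand-in MissingDatasetError in this sketch.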

Parameters
  • dataset (aspecd.dataset.Dataset) – dataset to apply processing step to

  • from_dataset (boolean) –

    whether we are called from within a dataset

    Defaults to “False” and shall never be set manually.

Returns

dataset – dataset the processing step has been applied to

Return type

aspecd.dataset.Dataset

Raises

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the aspecd.dataset.Dataset.process() method and ensures that the history of each processing step is written properly.

Returns

history_record – history record for processing step

Return type

aspecd.processing.ProcessingHistoryRecord

static applicable(dataset)

Check whether processing step is applicable to the given dataset.

Returns True by default and needs to be implemented in classes inheriting from ProcessingStep according to their needs.

Returns

applicable – True if successful, False otherwise.

Return type

bool
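As an illustration of overriding applicable(), the following sketch restricts a step to two-dimensional data. The classes are hypothetical stand-ins, RowAverage is an invented step, and representing 2D data as a list of lists is an assumption made purely for this example.

```python
class Dataset:
    """Hypothetical stand-in for aspecd.dataset.Dataset."""

    def __init__(self, data):
        self.data = data


class ProcessingStep:
    """Hypothetical stand-in for aspecd.processing.ProcessingStep."""

    @staticmethod
    def applicable(dataset):
        # Default: applicable to every dataset.
        return True


class RowAverage(ProcessingStep):
    """Invented step that only makes sense for 2D data."""

    @staticmethod
    def applicable(dataset):
        # "2D" here means a non-empty list whose elements are all lists.
        return bool(dataset.data) and all(
            isinstance(row, list) for row in dataset.data
        )


one_d = Dataset([1, 2, 3])
two_d = Dataset([[1, 2], [3, 4]])
```

Because applicable() is a static method, it can be queried on the class itself, for example RowAverage.applicable(two_d), without creating a step object first.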

class aspecd.processing.ProcessingStepRecord(processing_step=None)

Bases: object

Base class for processing step records stored in the dataset history.

The history of an aspecd.dataset.Dataset should not contain references to aspecd.processing.ProcessingStep objects, but rather records that contain all necessary information to create the respective objects inherited from aspecd.processing.ProcessingStep. One reason for this is simply that we want to import datasets containing processing steps in their history for which no corresponding processing class exists in the current installation of the application.

Note

Each history entry in a dataset stores the processing as an aspecd.processing.ProcessingStepRecord, even in applications inheriting from the ASpecD framework. Hence, subclassing of this class should normally not be necessary.

undoable

Can this processing step be reverted?

Type

bool

description

Short description, to be set in class definition

Type

str

parameters

Parameters required for performing the processing step

All parameters, implicit and explicit.

Type

dict

comment

User-supplied comment describing intent, purpose, reason, …

Type

str

class_name

Fully qualified name of the class of the corresponding processing step

Type

str

Parameters

processing_step (aspecd.processing.ProcessingStep) – Processing step the record should be created for.

Raises

aspecd.processing.MissingProcessingStepError – Raised when no processing step exists to act on

create_processing_step()

Create a processing step object from the parameters stored.

Returns

processing_step – actual processing step object that can be used for processing, e.g., in context of undo/redo

Return type

aspecd.processing.ProcessingStep
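One way such a record could recreate a processing step from its fully qualified class name is sketched below. This is an assumption about the mechanism, not the ASpecD code: the classes are hypothetical stand-ins, ScaleData is an invented step, and the module demo_processing is registered by hand only so that the lookup works in a single script.

```python
import copy
import importlib
import sys
import types


class MissingProcessingStepError(Exception):
    """Stand-in for aspecd.processing.MissingProcessingStepError."""


class ProcessingStep:
    """Hypothetical stand-in for aspecd.processing.ProcessingStep."""

    def __init__(self):
        self.undoable = False
        self.description = "Abstract processing step"
        self.parameters = {}
        self.comment = ""


class ScaleData(ProcessingStep):
    """Invented example step."""

    def __init__(self):
        super().__init__()
        self.description = "Scale data by a constant factor"
        self.parameters = {"factor": 1.0}


# Simulate an installed module so that the class can be found by its
# fully qualified name (only needed because this is a single script).
module = types.ModuleType("demo_processing")
module.ScaleData = ScaleData
ScaleData.__module__ = "demo_processing"
sys.modules["demo_processing"] = module


class ProcessingStepRecord:
    """Hypothetical stand-in for the record class documented here."""

    def __init__(self, processing_step=None):
        if processing_step is None:
            raise MissingProcessingStepError("No processing step to act on")
        cls = type(processing_step)
        self.class_name = f"{cls.__module__}.{cls.__qualname__}"
        self.undoable = processing_step.undoable
        self.description = processing_step.description
        self.parameters = copy.deepcopy(processing_step.parameters)
        self.comment = processing_step.comment

    def create_processing_step(self):
        # Split "package.module.Class" into module path and class name.
        module_name, _, class_name = self.class_name.rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        step = cls()
        step.parameters = copy.deepcopy(self.parameters)
        step.comment = self.comment
        return step


original = ScaleData()
original.parameters["factor"] = 3.0
record = ProcessingStepRecord(original)
restored = record.create_processing_step()
```

Storing only strings and plain values in the record is what allows a history to be loaded even if the class cannot be imported; in this sketch the import is attempted only when create_processing_step() is called.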

class aspecd.processing.ProcessingHistoryRecord(processing_step=None, package='')

Bases: aspecd.dataset.HistoryRecord

History record for processing steps on datasets.

processing

record of the processing step

Type

aspecd.processing.ProcessingStepRecord

Parameters

undoable

Can this processing step be reverted?

replay(dataset)

Replay the processing step saved in the history record.

Parameters

dataset (aspecd.dataset.Dataset) – dataset the processing step should be replayed on
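Replaying could, under simplifying assumptions, work as in the following sketch. All classes are hypothetical stand-ins, Offset is an invented step, and the record stores the class object and a parameter copy directly instead of an aspecd.processing.ProcessingStepRecord to keep the example short.

```python
class Offset:
    """Invented processing step: add a constant to every data point."""

    def __init__(self, value=0.0):
        self.parameters = {"value": value}

    def process(self, dataset, from_dataset=False):
        offset = self.parameters["value"]
        dataset.data = [point + offset for point in dataset.data]
        return dataset


class ProcessingHistoryRecord:
    """Hypothetical, simplified stand-in for the class documented here."""

    def __init__(self, processing_step=None):
        # Simplification: keep the class itself plus a parameter copy.
        self.processing = {
            "class": type(processing_step),
            "parameters": dict(processing_step.parameters),
        }

    def replay(self, dataset):
        # Recreate the step from the stored information and apply it.
        step = self.processing["class"]()
        step.parameters = dict(self.processing["parameters"])
        dataset.process(step)


class Dataset:
    """Hypothetical stand-in for aspecd.dataset.Dataset."""

    def __init__(self, data):
        self.data = list(data)
        self.history = []

    def process(self, processing_step):
        processing_step.process(self, from_dataset=True)
        self.history.append(ProcessingHistoryRecord(processing_step))
        return processing_step


first = Dataset([1.0, 2.0])
first.process(Offset(0.5))

# Apply the step recorded in the first dataset to a second dataset.
second = Dataset([10.0, 20.0])
first.history[0].replay(second)
```

Since replay() goes through the process() method of the dataset in this sketch, the replayed step is in turn written to the history of the second dataset.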