
aspecd.io module

Input and output (IO) of information from and to the persistence layer.

Currently, input and output of both datasets and recipes can be handled.

Datasets

Both data and metadata contained in datasets, as well as the information stored in recipes for recipe-driven data analysis, can be read and written.

For datasets, two generic classes are provided:

  • aspecd.io.DatasetImporter

  • aspecd.io.DatasetExporter

As the names say, these classes should be used to implement import and export functionality for your own purposes in applications derived from the ASpecD framework.

Generally, both import and export should be handled via the respective methods of the aspecd.dataset.Dataset class, thus first instantiating an object of that class and an appropriate importer or exporter, and afterwards only operating on the dataset using its methods.

In its most generic form, this may look something like:

import aspecd.dataset
import aspecd.io

dataset = aspecd.dataset.Dataset()
importer = aspecd.io.DatasetImporter(source="/path/to/your/data")
dataset.import_from(importer)

Similarly, you would handle the export of your data (and metadata) contained in a dataset object using an exporter object:

dataset = aspecd.dataset.Dataset()
exporter = aspecd.io.DatasetExporter(target="/path/to/destination")
dataset.export_to(exporter)

However, if you use recipe-driven data analysis, things become much simpler:

  • Imports will be automatically taken care of.

  • Exports can be specified as a simple task.

A simple example of a recipe only loading datasets and afterwards exporting them could look like this:

datasets:
  - /path/to/first/dataset
  - /path/to/second/dataset

tasks:
  - kind: export
    type: AdfExporter
    properties:
      target:
        - dataset1
        - dataset2

What is happening here? Two datasets are imported, and afterwards exported to the ASpecD Dataset Format (ADF) using the aspecd.io.AdfExporter.

Another frequent use case, although one that admittedly pretty much opposes the whole idea of the ASpecD framework in terms of reproducibility and traceability: Your collaboration partners require you to provide them with raw data they can import into their favourite program for creating plots. The only viable way: export to plain text (ouch!) - saying good-bye to all your metadata and history:

datasets:
  - /path/to/first/cool/dataset
  - /path/to/second/cool/dataset
  - /path/to/another/cool/dataset

tasks:
  - kind: export
    type: TxtExporter
    properties:
      target:
        - cool-dataset1
        - cool-dataset2
        - cool-dataset3

In this case, you can just as well add whatever processing is necessary to your datasets before exporting them, and you see that recipes come in quite handy here.

More control over imports

Sometimes there is the need to have more control over the import, be it that you would want to set labels for datasets explicitly upon load, determine which importer to use, or provide additional parameters for an importer (a frequent use case for the rather generic TxtImporter).

This is an excerpt of an example recipe importing ASCII exports of a common UV/Vis spectrometer and showing many of the options possible:

datasets:
  - source: cbztbt.txt
    label: D-A
    id: Cbz-TBT
    importer: TxtImporter
    importer_parameters:
      skiprows: 2
      separator: ','

So what’s happening here? Let’s go through it step by step:

  • Datasets are a list, as usual, but this time, it is not a list of filenames, but a list of (hierarchical) key–value pairs.

  • The source key sets the filename (and can include a path, as usual).

  • The label key sets the label used for the dataset, e.g., in a figure legend.

  • The id key sets the (unique) identifier (ID) the dataset can be referred to throughout the recipe. This is often useful if you want to restrict certain tasks to only a subset of the loaded datasets.

  • The importer key sets the importer class to use. This class needs to be available from within your current package. You can prefix the class name with a package if you like.

  • The importer_parameters key is a series of key–value pairs (i.e., a dict in Python language) setting additional parameters for the specific importer. See the documentation of the respective importer class for further details.

Of course, you need not use all of these parameters. Usually, if you want to specify importer parameters, it is a good idea to be explicit about the importer as well. However, even that is not strictly necessary. The only thing that is strictly necessary: As soon as you want to provide more than a filename/path per dataset, you need to switch from a list of strings (i.e., filenames/paths) to a key–value approach, with source being the key for the filename/path.
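The relation between the two ways of listing datasets can be sketched in a few lines of plain Python. The helper normalise_dataset_entries() is hypothetical and not part of the ASpecD framework; it only illustrates that a bare string is equivalent to an entry with nothing but a source key.

```python
def normalise_dataset_entries(entries):
    """Turn dataset entries into dicts that all carry a 'source' key.

    Hypothetical helper for illustration, not part of ASpecD.
    """
    normalised = []
    for entry in entries:
        if isinstance(entry, str):
            # A plain string is interpreted as filename/path
            entry = {"source": entry}
        elif "source" not in entry:
            raise ValueError("Each dataset entry needs a 'source' key")
        normalised.append(dict(entry))
    return normalised


# Simple form: a list of filenames/paths
simple = normalise_dataset_entries(["/path/to/first/dataset"])

# Key-value form, mirroring the recipe excerpt above
detailed = normalise_dataset_entries([
    {
        "source": "cbztbt.txt",
        "label": "D-A",
        "id": "Cbz-TBT",
        "importer": "TxtImporter",
        "importer_parameters": {"skiprows": 2, "separator": ","},
    },
])
```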

Importers for specific file formats

A series of importers for specific file formats exists:

  • aspecd.io.AdfImporter

  • aspecd.io.AsdfImporter

  • aspecd.io.TxtImporter

For details, see the respective class documentation.

Exporters for specific file formats

Datasets need to be persisted sometimes, and currently, there exist two exporters for specific file formats whose output can be imported again using the respective importers. Furthermore, the full information contained in a dataset will be retained.

  • aspecd.io.AdfExporter

  • aspecd.io.AsdfExporter

For details, see the respective class documentation.

A bit of a special case is the exporter to plain text files, as this file format does not preserve the metadata stored within the dataset and should only be used as a last resort:

  • aspecd.io.TxtExporter

Warning

All metadata contained within a dataset (including the full history) are lost when exporting to plain text. Therefore, using this exporter will usually result in you losing reproducibility. Hence, better think twice before using this exporter, and use it entirely at your own risk and only if you really know what you are doing (and why).

Writing importers for data

When writing importer classes for your own data, there are a number of pitfalls, some of which shall be described here together with solutions and “best practices”.

Dimensions of data

Usually, we assign axes in the order x, y, z, and assume the x axis to be the horizontal axis in a plot. However, NumPy (as well as other software) follows a different convention, with the first index referring to the row of your matrix and the second index to the column. That boils down to having the first index correspond to the y axis and the second index to the x axis.

As long as your data are one-dimensional, resulting in two axes objects in your dataset, everything is fine, and the second axis will have no values.

However, if your data to be imported are two-dimensional, your first dimension will be the index of rows (along a column), hence the y axis, and the second dimension the index of your columns (along a row), i.e. the x axis. This is perfectly fine, and it is equally fine to reverse this order, as long as you ensure your axis objects to be consistent with the dimensions of your data.

If you assign numeric data to the aspecd.dataset.Data.data property, the corresponding axes values will initially be set to the indices of the data points along the corresponding dimension, with the first axis (index 0) corresponding to the first dimension (row indices along a column) and similar for each of the following dimensions of your data. Note that there will always be one axis more than dimensions of your data. This last axis will not have values, and usually its quantity is something like “intensity”.
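The index convention can be made concrete with a small sketch. Plain nested lists stand in for a NumPy array here, and the list axes_values is a hypothetical stand-in for the values of the axis objects of a dataset; no ASpecD classes are involved.

```python
# Two rows, three columns, addressed as data[row][column]
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

n_rows = len(data)        # first dimension (index 0): the y axis
n_columns = len(data[0])  # second dimension (index 1): the x axis

# Index-based default axis values: one axis per dimension of the data,
# plus one additional axis (e.g., "intensity") that carries no values.
axes_values = [
    list(range(n_rows)),
    list(range(n_columns)),
    [],
]
```

If an importer swaps the two dimensions of the data, the first two entries of the axis values need to be swapped accordingly.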

Backup of the data

One essential concept of the ASpecD dataset is to store the original data together with their axes in a separate, non-public property. This is done automatically by the importer after calling out to its non-public method aspecd.io.DatasetImporter._import(). Hence, usually you need not take care of this at all.

Handling of metadata

Data without information about these data are usually pretty useless. Hence, an ASpecD dataset is always a unit of numerical data and corresponding metadata. While you will need to come up with your own structure for metadata of your datasets and create a hierarchy of classes derived from aspecd.metadata.DatasetMetadata, your importers need to ensure that these metadata are populated respectively. Of course, which metadata can be populated depends strongly on the file format you are about to import.

Handling different file formats for importing data

Often, data are available in different formats, and deciding which importer is appropriate for a given format can be quite involved. To free other classes from having to contain the relevant code, a factory can be used:

  • aspecd.io.DatasetImporterFactory

Currently, the sole information provided to decide about the appropriate importer is the source (a string). A concrete importer object is returned by the method get_importer(). Thus, using the factory in another class may look like the following:

import aspecd.dataset
import aspecd.io

importer_factory = aspecd.io.DatasetImporterFactory()
importer = importer_factory.get_importer(source="/path/to/your/data")
dataset = aspecd.dataset.Dataset()
dataset.import_from(importer)

Here, as in the example above, “source” refers to a (unique) identifier of your dataset, be it a filename, path, URL/URI, LOI, or the like.

Important

For recipe-driven data analysis to work with an ASpecD-derived package, you need to implement an aspecd.io.DatasetImporterFactory class there as well that can be obtained by instantiating <your_package>.io.DatasetImporterFactory().

Recipes

For recipes, a similar set of classes is provided:

  • aspecd.io.RecipeImporter

  • aspecd.io.RecipeExporter

For additional concrete classes handling import and export from and to YAML files see below.

The same general principles laid out above for the datasets apply to these classes as well. In particular, both import and export should be handled via the respective methods of the aspecd.tasks.Recipe class, thus first instantiating an object of that class and an appropriate importer or exporter, and afterwards only operating on the recipe using its methods.

In its most generic form, this may look something like:

import aspecd.io
import aspecd.tasks

recipe = aspecd.tasks.Recipe()
importer = aspecd.io.RecipeImporter(source="/path/to/your/recipe")
recipe.import_from(importer)

Similarly, you would handle the export of the information contained in a recipe object using an exporter object.

To simplify the input and output of recipes, and due to recipe-driven data analysis being an intrinsic property of the ASpecD framework, two classes handling the import and export from and to YAML files are provided as well:

  • aspecd.io.RecipeYamlImporter

  • aspecd.io.RecipeYamlExporter

These classes can directly be used to work with YAML files containing information for recipe-driven data analysis. For details of the YAML file structure, see the aspecd.tasks.Recipe class and its attributes.

Module documentation

class aspecd.io.DatasetImporter(source=None)

Bases: object

Base class for dataset importer.

Each class actually importing data and metadata into a dataset should inherit from this class.

To perform the import, call the import_from() method of the dataset the import should be performed for, and provide a reference to the actual importer object to it.

The actual implementation of the importing is done in the private method _import() that in turn gets called by import_into() which is called by the aspecd.dataset.Dataset.import_from() method of the dataset object.

One question arising when actually implementing an importer for a specific file format: How do the data get into the dataset? The simple answer: The _import() method of the importer knows about the dataset and its structure (see aspecd.dataset.Dataset for details) and assigns data (and metadata) read from an external source to the respective fields of the dataset. In terms of a broader software architecture point of view: The dataset knows nothing about the importer besides its bare existence and interface, whereas the importer knows about the dataset and how to map data and metadata.
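The division of labour between dataset and importer can be sketched with heavily simplified stand-in classes. None of the classes below is the actual ASpecD implementation; ListImporter and its hard-coded values are hypothetical. The sketch only mirrors the call chain import_from(), import_into(), _import() and the bookkeeping performed after the import (backup of the data, setting id and label from the source).

```python
import copy


class MissingDatasetError(Exception):
    """Stand-in for aspecd.io.MissingDatasetError."""


class Dataset:
    """Heavily simplified stand-in for aspecd.dataset.Dataset."""

    def __init__(self):
        self.data = []
        self._origdata = []
        self.id = ""
        self.label = ""

    def import_from(self, importer):
        # The dataset only knows the interface of the importer
        importer.import_into(self)


class DatasetImporter:
    """Simplified stand-in for aspecd.io.DatasetImporter."""

    def __init__(self, source=None):
        self.source = source
        self.dataset = None

    def import_into(self, dataset=None):
        if dataset is None:
            if self.dataset is None:
                raise MissingDatasetError("No dataset to import into")
            # Dataset set as property: let the dataset drive the import
            self.dataset.import_from(self)
            return
        self.dataset = dataset
        self._import()
        # Bookkeeping after the format-specific import
        self.dataset._origdata = copy.deepcopy(self.dataset.data)
        self.dataset.id = self.source
        self.dataset.label = self.source

    def _import(self):
        """Format-specific import, implemented in derived classes."""


class ListImporter(DatasetImporter):
    """Hypothetical importer assigning hard-coded values."""

    def _import(self):
        # The importer knows the structure of the dataset
        self.dataset.data = [1.0, 2.0, 3.0]


dataset = Dataset()
dataset.import_from(ListImporter(source="demo"))
```

A concrete importer thus only needs to override _import(); everything else is inherited from the base class.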

dataset

dataset to import data and metadata into

Type:

aspecd.dataset.Dataset

source

specifier of the source the data and metadata will be read from

Type:

str

parameters

Additional parameters to control import options.

Useful in case of, e.g., CSV importers where the user may want to set things such as the delimiter

New in version 0.2.

Type:

dict

Raises:

aspecd.io.MissingDatasetError – Raised when no dataset exists to act upon

import_into(dataset=None)

Perform the actual import into the given dataset.

If no dataset is provided at method call, but is set as property in the importer object, the aspecd.dataset.Dataset.import_from() method of the dataset will be called.

If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.

The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the importer object is not necessary.

The actual import should be implemented within the non-public method _import().

Note

A number of parameters of the dataset are automatically assigned after calling out to the non-public method aspecd.io.DatasetImporter._import(), namely the non-public property _origdata of the dataset is populated with a copy of aspecd.dataset.Dataset.data, and id and label are set to aspecd.io.DatasetImporter.source.

Parameters:

dataset (aspecd.dataset.Dataset) – Dataset to import data and metadata into

Raises:

aspecd.io.MissingDatasetError – Raised if no dataset is provided.

class aspecd.io.DatasetImporterFactory

Bases: object

Factory for creating importer objects based on the source provided.

Often, data are available in different formats, and deciding which importer is appropriate for a given format can be quite involved. To free other classes from having to contain the relevant code, a factory can be used.

Currently, the sole information provided to decide about the appropriate importer is the source (a string). A concrete importer object is returned by the method get_importer(). If no source is provided, an exception will be raised.

The actual code for deciding which type of importer to return in what case should be implemented in the non-public method _get_importer() in any package based on the ASpecD framework.

In its basic implementation, as done here, the non-public method _get_importer() returns the importers for ADF, ASDF, and TXT depending on the file extension, and in all other cases the standard importer.

This might be a viable way for your own DatasetImporterFactory implementation in the rare case of having only one single type of data, and in any case provides a sensible starting point for your own developments.
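How a derived package might plug its own format into the factory can be sketched as follows. All classes are simplified stand-ins, not the ASpecD implementation: the file extensions, the MyFormatImporter class, and the ".myfmt" suffix are assumptions made for illustration, the importer and parameters arguments of get_importer() are omitted, and _get_importer() is assumed to take no arguments and to read the source attribute.

```python
import os


class MissingSourceError(Exception):
    """Stand-in for aspecd.io.MissingSourceError."""


class DatasetImporter:
    """Stand-in for the standard importer."""

    def __init__(self, source=None):
        self.source = source


class AdfImporter(DatasetImporter):
    """Stand-in for aspecd.io.AdfImporter."""


class AsdfImporter(DatasetImporter):
    """Stand-in for aspecd.io.AsdfImporter."""


class TxtImporter(DatasetImporter):
    """Stand-in for aspecd.io.TxtImporter."""


class DatasetImporterFactory:
    """Simplified stand-in for aspecd.io.DatasetImporterFactory."""

    # Assumed file extensions; the actual mapping may differ
    _aspecd_importers = {
        ".adf": AdfImporter,
        ".asdf": AsdfImporter,
        ".txt": TxtImporter,
    }

    def __init__(self):
        self.source = ""

    def get_importer(self, source=""):
        if not source:
            raise MissingSourceError("A source is required")
        self.source = source
        importer = self._get_importer()
        if importer is None:
            # Fall back to the framework formats, then the standard importer
            extension = os.path.splitext(source)[1].lower()
            importer_class = self._aspecd_importers.get(
                extension, DatasetImporter
            )
            importer = importer_class(source=source)
        return importer

    def _get_importer(self):
        """Hook for derived packages; returns None if undecided."""
        return None


class MyFormatImporter(DatasetImporter):
    """Hypothetical importer of a derived package."""


class MyDatasetImporterFactory(DatasetImporterFactory):
    """Hypothetical factory of a derived package."""

    def _get_importer(self):
        if self.source.endswith(".myfmt"):
            return MyFormatImporter(source=self.source)
        return None


factory = MyDatasetImporterFactory()
own_importer = factory.get_importer(source="spectrum.myfmt")
adf_importer = factory.get_importer(source="spectrum.adf")
```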

source

Source of the dataset to be loaded.

Gets set by calling the method get_importer() with the source parameter.

Type:

str

Raises:

aspecd.io.MissingSourceError – Raised if no source is provided

get_importer(source='', importer='', parameters=None)

Return importer object for dataset specified by its source.

The actual code for deciding which type of importer to return in what case should be implemented in the non-public method _get_importer() in any package based on the ASpecD framework.

If no importer gets returned by the method _get_importer(), the ASpecD-internal importers will be checked for matching the file type. Thus, you can overwrite the behaviour of any filetype supported natively by the ASpecD framework, but retain compatibility to the ASpecD-specific file types.

Note

Currently, only filenames/paths are supported, and if source does not start with the file separator, the absolute path to the current directory is prepended.

Parameters:
  • source (str) –

    string describing the source of the dataset

    May be a filename or path, a URL/URI, a LOI, or similar

  • importer (str) –

    Name of the importer to use for importing the dataset

    Default: ‘’

    New in version 0.2.

  • parameters (dict) –

    Additional parameters for controlling the import

    Default: None

    New in version 0.2.

Returns:

importer – importer object of appropriate class

Return type:

aspecd.io.DatasetImporter

Raises:

aspecd.io.MissingSourceError – Raised if no source is provided

class aspecd.io.DatasetExporter(target=None)

Bases: object

Base class for dataset exporter.

Each class actually exporting data from a dataset to some other format should inherit from this class.

To perform the export, call the export_to() method of the dataset the export should be performed for, and provide a reference to the actual exporter object to it.

The actual implementation of the exporting is done in the non-public method _export() that in turn gets called by export_from() which is called by the aspecd.dataset.Dataset.export_to() method of the dataset object.
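The export side follows the same pattern, as a minimal sketch with stand-in classes shows. Again, none of this is the actual ASpecD code: OneValuePerLineExporter is a hypothetical exporter, the history record is left out, and the target is a temporary file created only for this illustration.

```python
import os
import tempfile


class MissingDatasetError(Exception):
    """Stand-in for aspecd.io.MissingDatasetError."""


class Dataset:
    """Heavily simplified stand-in for aspecd.dataset.Dataset."""

    def __init__(self):
        self.data = []

    def export_to(self, exporter):
        exporter.export_from(self)


class DatasetExporter:
    """Simplified stand-in for aspecd.io.DatasetExporter."""

    def __init__(self, target=None):
        self.target = target
        self.dataset = None

    def export_from(self, dataset=None):
        if dataset is None:
            if self.dataset is None:
                raise MissingDatasetError("No dataset to export from")
            self.dataset.export_to(self)
            return
        self.dataset = dataset
        self._export()

    def _export(self):
        """Format-specific export, implemented in derived classes."""


class OneValuePerLineExporter(DatasetExporter):
    """Hypothetical exporter writing one value per line."""

    def _export(self):
        with open(self.target, "w", encoding="utf8") as file:
            for value in self.dataset.data:
                file.write(f"{value}\n")


dataset = Dataset()
dataset.data = [1.5, 2.5]
with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "export.txt")
    dataset.export_to(OneValuePerLineExporter(target=target))
    with open(target, encoding="utf8") as file:
        written = file.read()
```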

dataset

dataset to export data and metadata from

Type:

aspecd.dataset.Dataset

target

specifier of the target the data and metadata will be written to

Type:

str

comment

User-supplied comment describing intent, purpose, reason, …

Type:

str

Raises:

aspecd.io.MissingDatasetError – Raised when no dataset exists to act upon

Changed in version 0.6.4: New attribute comment

export_from(dataset=None)

Perform the actual export from the given dataset.

If no dataset is provided at method call, but is set as property in the exporter object, the aspecd.dataset.Dataset.export_to() method of the dataset will be called.

If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.

The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the exporter object is not necessary.

The actual export is implemented within the non-public method _export() that gets automatically called.

Parameters:

dataset (aspecd.dataset.Dataset) – Dataset to export data and metadata from

Raises:

aspecd.io.MissingDatasetError – Raised if no dataset is provided.

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the export_to() method of the aspecd.dataset.Dataset class and ensures that the history of each processing step gets written properly.

Returns:

history_record – history record for export step

Return type:

aspecd.history.DatasetExporterHistoryRecord

New in version 0.9.

class aspecd.io.RecipeImporter(source='')

Bases: object

Base class for recipe importer.

Each class actually importing recipes into an aspecd.tasks.Recipe object should inherit from this class.

To perform the import, call the import_from() method of the recipe the import should be performed for, and provide a reference to the actual importer object to it.

The actual implementation of the importing is done in the non-public method _import() that in turn gets called by import_into() which is called by the aspecd.tasks.Recipe.import_from() method of the recipe object.

One question arising when actually implementing an importer for a specific file format: How does the information get into the recipe? The simple answer: The _import() method of the importer knows about the recipe and its structure (see aspecd.tasks.Recipe for details) and creates a dictionary with keys corresponding to the respective attributes of the recipe. In turn, it can then call the aspecd.tasks.Recipe.from_dict() method. In terms of a broader software architecture point of view: The recipe knows nothing about the importer besides its bare existence and interface, whereas the importer knows about the recipe and how to map the information obtained to it.
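The route via a dictionary can be sketched with stand-in classes as well. The classes below are not the ASpecD implementation: the stand-in Recipe knows only two attributes, and InMemoryRecipeImporter is a hypothetical importer that builds its dictionary in code rather than reading a file.

```python
class MissingRecipeError(Exception):
    """Stand-in for aspecd.io.MissingRecipeError."""


class Recipe:
    """Heavily simplified stand-in for aspecd.tasks.Recipe."""

    def __init__(self):
        self.datasets = []
        self.tasks = []

    def from_dict(self, dict_=None):
        # Only set attributes the recipe actually has
        for key, value in (dict_ or {}).items():
            if hasattr(self, key):
                setattr(self, key, value)

    def import_from(self, importer):
        importer.import_into(self)


class RecipeImporter:
    """Simplified stand-in for aspecd.io.RecipeImporter."""

    def __init__(self, source=""):
        self.source = source
        self.recipe = None

    def import_into(self, recipe=None):
        if recipe is None:
            raise MissingRecipeError("No recipe to import into")
        self.recipe = recipe
        self._import()

    def _import(self):
        """Format-specific import, implemented in derived classes."""


class InMemoryRecipeImporter(RecipeImporter):
    """Hypothetical importer creating the dictionary in code."""

    def _import(self):
        recipe_dict = {
            "datasets": ["/path/to/first/dataset"],
            "tasks": [{"kind": "export", "type": "AdfExporter"}],
        }
        # Keys correspond to the attributes of the recipe
        self.recipe.from_dict(recipe_dict)


recipe = Recipe()
recipe.import_from(InMemoryRecipeImporter())
```

An importer for a file format would replace the hard-coded dictionary with one read from the source.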

recipe

recipe to import into

Type:

aspecd.tasks.Recipe

source

specifier of the source the information will be read from

Type:

str

Raises:

aspecd.io.MissingRecipeError – Raised when no recipe exists to act upon

import_into(recipe=None)

Perform the actual import into the given recipe.

If no recipe is provided at method call, but is set as property in the importer object, the aspecd.tasks.Recipe.import_from() method of the recipe will be called.

If no recipe is provided at method call nor as property in the object, the method will raise a respective exception.

The recipe object always calls this method with the respective recipe as argument. Therefore, in this case setting the recipe property within the importer object is not necessary.

The actual import should be implemented within the non-public method _import().

Parameters:

recipe (aspecd.tasks.Recipe) – recipe to import into

Raises:

aspecd.io.MissingRecipeError – Raised if no recipe is provided.

class aspecd.io.RecipeExporter(target='')

Bases: object

Base class for recipe exporter.

Each class actually exporting recipes from aspecd.tasks.Recipe objects should inherit from this class.

To perform the export, call the aspecd.tasks.Recipe.export_to() method of the recipe the export should be performed for, and provide a reference to the actual exporter object to it.

The actual implementation of the exporting is done in the non-public method _export() that in turn gets called by export_from() which is called by the aspecd.tasks.Recipe.export_to() method of the recipe object.

recipe

recipe to export information from

Type:

aspecd.tasks.Recipe

target

specifier of the target the information will be written to

Type:

str

Raises:

aspecd.io.MissingRecipeError – Raised when no recipe exists to act upon

export_from(recipe=None)

Perform the actual export from the given recipe.

If no recipe is provided at method call, but is set as property in the exporter object, the aspecd.tasks.Recipe.export_to() method of the recipe will be called.

If no recipe is provided at method call nor as property in the object, the method will raise a respective exception.

The recipe object always calls this method with the respective recipe as argument. Therefore, in this case setting the recipe property within the exporter object is not necessary.

The actual export should be implemented within the non-public method _export().

Parameters:

recipe (aspecd.tasks.Recipe) – Recipe to export from

Raises:

aspecd.io.MissingRecipeError – Raised if no recipe is provided.

class aspecd.io.RecipeYamlImporter(source='')

Bases: RecipeImporter

Recipe importer for importing from YAML files.

The YAML file needs to have a structure compatible with the actual recipe, such that the dict created from reading the YAML file can be directly fed into the aspecd.tasks.Recipe.from_dict() method.

The order of entries of the YAML file is preserved due to using ordered dictionaries (collections.OrderedDict) internally.

Parameters:

source (str) – filename of a YAML file to read from

import_into(recipe=None)

Perform the actual import into the given recipe.

If no recipe is provided at method call, but is set as property in the importer object, the aspecd.tasks.Recipe.import_from() method of the recipe will be called.

If no recipe is provided at method call nor as property in the object, the method will raise a respective exception.

The recipe object always calls this method with the respective recipe as argument. Therefore, in this case setting the recipe property within the importer object is not necessary.

The actual import should be implemented within the non-public method _import().

Parameters:

recipe (aspecd.tasks.Recipe) – recipe to import into

Raises:

aspecd.io.MissingRecipeError – Raised if no recipe is provided.

class aspecd.io.RecipeYamlExporter(target='')

Bases: RecipeExporter

Recipe exporter for exporting to YAML files.

The YAML file will have a structure corresponding to the output of the aspecd.tasks.Recipe.to_dict() method of the recipe object.

Parameters:

target (str) – filename of a YAML file to write to

export_from(recipe=None)

Perform the actual export from the given recipe.

If no recipe is provided at method call, but is set as property in the exporter object, the aspecd.tasks.Recipe.export_to() method of the recipe will be called.

If no recipe is provided at method call nor as property in the object, the method will raise a respective exception.

The recipe object always calls this method with the respective recipe as argument. Therefore, in this case setting the recipe property within the exporter object is not necessary.

The actual export should be implemented within the non-public method _export().

Parameters:

recipe (aspecd.tasks.Recipe) – Recipe to export from

Raises:

aspecd.io.MissingRecipeError – Raised if no recipe is provided.

class aspecd.io.AdfExporter(target=None)

Bases: DatasetExporter

Dataset exporter for exporting to ASpecD dataset format.

The ASpecD dataset format is vaguely reminiscent of the Open Document Format, i.e. a zipped directory containing structured data (in this case in the form of a YAML file) and binary data in a corresponding subdirectory.

As PyYAML is not capable of dealing with NumPy arrays out of the box, those are dealt with separately. Small arrays are stored inline as lists, larger arrays in separate files. For details, see the aspecd.utils.Yaml class.

The data format tries to be as self-contained as possible, using standard file formats and a brief description of its layout contained within the archive. Collecting the contents in a single ZIP archive allows the user to deal with a single file for a dataset, while more advanced users can easily dig into the details and write importers for other platforms and programming languages, making the format rather platform-independent and future-safe. Due to using binary representation for larger numerical arrays, the format should be more memory-efficient than other formats.
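The general idea of such a container can be illustrated with the Python standard library. This is explicitly not the actual ADF layout: the file names, the size threshold, the dtype label, and the use of JSON instead of YAML are all assumptions chosen to keep the sketch free of external dependencies.

```python
import array
import io
import json
import zipfile

INLINE_THRESHOLD = 4  # hypothetical limit for storing values inline


def write_archive(values, metadata):
    """Write values and metadata into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        description = {"metadata": metadata}
        if len(values) <= INLINE_THRESHOLD:
            # Small arrays are stored inline as a list
            description["data"] = list(values)
        else:
            # Larger arrays go into a separate binary file
            filename = "binaryData/data.bin"
            description["data"] = {"file": filename, "dtype": "float64"}
            archive.writestr(filename, array.array("d", values).tobytes())
        archive.writestr("dataset.json", json.dumps(description))
    return buffer.getvalue()


def read_archive(raw):
    """Read values and metadata back from the archive."""
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        description = json.loads(archive.read("dataset.json"))
        data = description["data"]
        if isinstance(data, dict):
            values = array.array("d")
            values.frombytes(archive.read(data["file"]))
            data = values.tolist()
    return data, description["metadata"]


small = read_archive(write_archive([1.0, 2.0], {"label": "small"}))
large_values = [0.5 * index for index in range(10)]
large = read_archive(write_archive(large_values, {"label": "large"}))
```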

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the export_to() method of the aspecd.dataset.Dataset class and ensures that the history of each processing step gets written properly.

Returns:

history_record – history record for export step

Return type:

aspecd.history.DatasetExporterHistoryRecord

New in version 0.9.

export_from(dataset=None)

Perform the actual export from the given dataset.

If no dataset is provided at method call, but is set as property in the exporter object, the aspecd.dataset.Dataset.export_to() method of the dataset will be called.

If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.

The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the exporter object is not necessary.

The actual export is implemented within the non-public method _export() that gets automatically called.

Parameters:

dataset (aspecd.dataset.Dataset) – Dataset to export data and metadata from

Raises:

aspecd.io.MissingDatasetError – Raised if no dataset is provided.

class aspecd.io.AdfImporter(source=None)

Bases: DatasetImporter

Dataset importer for importing from ASpecD dataset format.

For more details of the ASpecD dataset format, see the aspecd.io.AdfExporter class.

import_into(dataset=None)

Perform the actual import into the given dataset.

If no dataset is provided at method call, but is set as property in the importer object, the aspecd.dataset.Dataset.import_from() method of the dataset will be called.

If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.

The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the importer object is not necessary.

The actual import should be implemented within the non-public method _import().

Note

A number of parameters of the dataset are automatically assigned after calling out to the non-public method aspecd.io.DatasetImporter._import(), namely the non-public property _origdata of the dataset is populated with a copy of aspecd.dataset.Dataset.data, and id and label are set to aspecd.io.DatasetImporter.source.

Parameters:

dataset (aspecd.dataset.Dataset) – Dataset to import data and metadata into

Raises:

aspecd.io.MissingDatasetError – Raised if no dataset is provided.

class aspecd.io.AsdfExporter(target=None)

Bases: DatasetExporter

Dataset exporter for exporting to Advanced Scientific Data Format (ASDF).

For more information on ASDF, see the homepage of the asdf package, and its format specification.

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the export_to() method of the aspecd.dataset.Dataset class and ensures that the history of each processing step gets written properly.

Returns:

history_record – history record for export step

Return type:

aspecd.history.DatasetExporterHistoryRecord

New in version 0.9.

export_from(dataset=None)

Perform the actual export from the given dataset.

If no dataset is provided at method call, but is set as property in the exporter object, the aspecd.dataset.Dataset.export_to() method of the dataset will be called.

If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.

The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the exporter object is not necessary.

The actual export is implemented within the non-public method _export() that gets automatically called.

Parameters:

dataset (aspecd.dataset.Dataset) – Dataset to export data and metadata from

Raises:

aspecd.io.MissingDatasetError – Raised if no dataset is provided.

class aspecd.io.AsdfImporter(source=None)

Bases: DatasetImporter

Dataset importer for importing from Advanced Scientific Data Format (ASDF).

For more information on ASDF, see the homepage of the asdf package, and its format specification.

import_into(dataset=None)

Perform the actual import into the given dataset.

If no dataset is provided at method call, but is set as property in the importer object, the aspecd.dataset.Dataset.import_from() method of the dataset will be called.

If no dataset is provided at method call nor as property in the object, the method will raise a respective exception.

The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the importer object is not necessary.

The actual import should be implemented within the non-public method _import().

Note

A number of parameters of the dataset are automatically assigned after calling out to the non-public method aspecd.io.DatasetImporter._import(), namely the non-public property _origdata of the dataset is populated with a copy of aspecd.dataset.Dataset.data, and id and label are set to aspecd.io.DatasetImporter.source.

Parameters:

dataset (aspecd.dataset.Dataset) – Dataset to import data and metadata into

Raises:

aspecd.io.MissingDatasetError – Raised if no dataset is provided.

class aspecd.io.TxtImporter(source=None)

Bases: DatasetImporter

Dataset importer for importing from plain text files (TXT).

Plain text files often have the disadvantage of no accompanying metadata, therefore the use of plain text files for data storage is highly discouraged, besides other problems like inherent low resolution/accuracy or otherwise large file sizes.

The main reason for this class to exist is that it provides a simple way to showcase ASpecD functionality reading from primitive data sources. Besides that, sometimes you will encounter plain text files.

Note

The importer relies on numpy.loadtxt() for reading text files. Hence, you can use any parameters understood by this function as keys in the parameters attribute. For handling decimal separators (non-standard behaviour), see below.

If your data consist of two columns, the first will automatically be interpreted as the x axis. In all other cases, data will be read as is and no axes values explicitly written.

parameters

Parameters controlling the import

skiprows : int

Number of rows to skip in text file (e.g., header lines)

delimiter : str

The string used to separate values.

Default: None (meaning: whitespace)

comments : str | list

Characters or list of characters indicating the start of a comment.

Default: #

separator : str

Character used as decimal separator.

Default: None (meaning: dot)

axis : int | None

Column index to use for axis in case of 2D data.

Often, when reading 2D text data, the first column contains the axis values. In case you don’t have axis values in the data, set this parameter to None (in case of a YAML recipe, use Null instead).

You can provide any (valid) column index here, starting with 0 for the first column. The column marked as axis is removed from the data array, but set as axis values.

Default: 0

New in version 0.11.

Type:

dict
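As a rough illustration of how the numpy-related keys might be used, the following sketch passes a parameters dictionary directly to numpy.loadtxt(). It assumes that the keys are forwarded as keyword arguments and leaves out the non-numpy keys separator and axis. The file contents are made up and read from an in-memory buffer instead of a real file.

```python
import io

import numpy as np

# Made-up file contents: one header line, one comment line,
# and semicolon-separated values.
text = "x;y\n% measured yesterday\n340;0.63\n341;0.34\n"

# Keys understood by numpy.loadtxt()
parameters = {"skiprows": 1, "delimiter": ";", "comments": "%"}

data = np.loadtxt(io.StringIO(text), **parameters)
print(data)
```

Note that skiprows counts every line from the top of the file, including comment lines.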

Decimal separators

Handling decimal separators other than the dot is notoriously difficult, though often necessary. For this, the non-standard key separator has been introduced that is not supported by numpy itself. If you specify a character using this parameter, the file will be read as text and the character specified replaced by the dot. Only afterwards will numpy.loadtxt() be used with all the other parameters as usual.
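The replace-then-parse strategy described here can be sketched as follows. This is a minimal illustration, not the actual ASpecD code: the file contents are made up, and an in-memory buffer stands in for the file.

```python
import io

import numpy as np

# Made-up file contents with the comma as decimal separator
# and tabs separating the columns.
text = "340,0\t0,634\n341,0\t0,342\n"
separator = ","

# Read as text, replace the decimal separator by the dot ...
converted = text.replace(separator, ".")
# ... and only afterwards hand over to numpy.loadtxt().
data = np.loadtxt(io.StringIO(converted))
print(data)
```

A plain text replacement like this only works if the decimal separator is not used as column delimiter at the same time.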

Changed in version 0.6.3: Document more parameter keys; add handling of decimal separator.

Changed in version 0.11: For 2D data, the first column (index 0) is used as axis values by default.
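What the axis parameter does to the array read from a text file might look like the following sketch. The helper split_axis() is hypothetical and not part of ASpecD, and the values are made up. It only illustrates removing one column from the data and using it as axis values.

```python
import numpy as np


def split_axis(raw, axis=0):
    """Separate the axis column from the data (hypothetical helper)."""
    if axis is None:
        # No axis values contained in the data: leave the array untouched.
        return None, raw
    axis_values = raw[:, axis]
    data = np.delete(raw, axis, axis=1)
    return axis_values, data


raw = np.array([
    [340.0, 0.63, 0.60],
    [341.0, 0.34, 0.11],
    [342.0, 0.02, 0.91],
])
axis_values, data = split_axis(raw, axis=0)
```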

import_into(dataset=None)

Perform the actual import into the given dataset.

If no dataset is provided at method call, but is set as property in the importer object, the aspecd.dataset.Dataset.import_from() method of the dataset will be called.

If a dataset is provided neither at method call nor as property of the importer object, the method raises an aspecd.io.MissingDatasetError.

The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the importer object is not necessary.

The actual import should be implemented within the non-public method _import().

Note

A number of parameters of the dataset are automatically assigned after calling out to the non-public method aspecd.io.DatasetImporter._import(), namely the non-public property _origdata of the dataset is populated with a copy of aspecd.dataset.Dataset.data, and id and label are set to aspecd.io.DatasetImporter.source.

Parameters:

dataset (aspecd.dataset.Dataset) – Dataset to import data and metadata into

Raises:

aspecd.io.MissingDatasetError – Raised if no dataset is provided.

class aspecd.io.TxtExporter(target=None)

Bases: DatasetExporter

Dataset exporter for exporting to plain text files (TXT).

Plain text files often have the disadvantage of lacking accompanying metadata. Together with other problems, such as inherently low resolution/accuracy or else large file sizes, this is why using plain text files for data storage is highly discouraged.

Warning

All metadata contained within a dataset (including the full history) are lost when exporting to plain text. Therefore, using this exporter will usually result in you losing reproducibility. Hence, think twice before using this exporter, and use it entirely at your own risk and only if you really know what you are doing (and why).

The main reason for this class to exist is that sometimes there is a need to export data to a simple exchange format that can be shared with collaboration partners.

Note

The exporter relies on numpy.savetxt() for writing text files. Hence, the same limitations apply, e.g. it only works for 1D and 2D data, but not for data with more than two dimensions.

In case of 1D data, the resulting file will consist of two columns, with the first column consisting of the axis values and the second column containing the actual data. An example of the contents of such a file is given below:

3.400000000000000000e+02 6.340967862812832978e-01
3.410000000000000000e+02 3.424209074593306257e-01
3.420000000000000000e+02 1.675116805484100357e-02

In case of 2D data, the resulting file will contain the axis values in the first row and first column, respectively. Hence, the size of the matrix will be +1 in both directions compared to the size of the actual data, and the first element (top left) will always be zero (and shall be ignored). An example of the contents of such a file is given below:

0.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00
3.400000000000000000e+02 6.340967862812832978e-01 5.979077980106655144e-01
3.410000000000000000e+02 3.424209074593306257e-01 1.052868239245914328e-01
3.420000000000000000e+02 1.675116805484100357e-02 9.050894282755458375e-01
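The 2D layout shown above can be reproduced with a few lines of numpy. This is only a sketch: the helper add_axes() is hypothetical and not part of ASpecD, the data values are made up, and the assignment of the first axis to the first column follows the example above.

```python
import io

import numpy as np


def add_axes(data, first_axis, second_axis):
    """Embed 2D data in a matrix carrying the axis values (hypothetical)."""
    rows, columns = data.shape
    matrix = np.zeros((rows + 1, columns + 1))  # top-left element stays zero
    matrix[1:, 0] = first_axis    # first column: values of the first axis
    matrix[0, 1:] = second_axis   # first row: values of the second axis
    matrix[1:, 1:] = data
    return matrix


data = np.array([[0.63, 0.60], [0.34, 0.11], [0.02, 0.91]])
matrix = add_axes(data, first_axis=[340.0, 341.0, 342.0],
                  second_axis=[4.0, 5.0])

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
np.savetxt(buffer, matrix)
print(buffer.getvalue())
```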

These two examples immediately show two of the problems of this file format: You are left to guess the quantity and unit of each of the axes, and these files get quite big, as many decimal places are stored to not lose numerical resolution. With 50 characters per line for a 1D dataset (translating to at least one byte each), you end up with 50 kB for 1000 values.
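The file size estimate can be checked with numpy.savetxt() and its default format, which matches the examples above. The following sketch uses made-up positive values and writes to an in-memory buffer. Negative values or three-digit exponents would add characters to a line.

```python
import io

import numpy as np

# Made-up 1D dataset with 1000 points, all values positive
x = np.linspace(340.0, 350.0, 1000)
y = 0.5 + 0.4 * np.sin(x)

buffer = io.StringIO()
np.savetxt(buffer, np.column_stack((x, y)))
content = buffer.getvalue()

# Each line: two numbers of 24 characters, one space, one newline
line_lengths = {len(line) for line in content.splitlines(keepends=True)}
total_size = len(content)
print(line_lengths, total_size)
```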

create_history_record()

Create history record to be added to the dataset.

Usually, this method gets called from within the aspecd.dataset.Dataset.export_to() method and ensures that the history of each processing step gets written properly.

Returns:

history_record – history record for export step

Return type:

aspecd.history.DatasetExporterHistoryRecord

New in version 0.9.

export_from(dataset=None)

Perform the actual export from the given dataset.

If no dataset is provided at method call, but is set as property in the exporter object, the aspecd.dataset.Dataset.export_to() method of the dataset will be called.

If a dataset is provided neither at method call nor as property of the exporter object, the method raises an aspecd.io.MissingDatasetError.

The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the exporter object is not necessary.

The actual export is implemented within the non-public method _export() that gets automatically called.

Parameters:

dataset (aspecd.dataset.Dataset) – Dataset to export data and metadata from

Raises:

aspecd.io.MissingDatasetError – Raised if no dataset is provided.