You're reading the documentation for a development version. For the latest released version, please have a look at v0.10.
aspecd.io module
Input and output (IO) of information from and to the persistence layer.
Currently, input and output of both datasets and recipes can be handled.
Datasets
Both data and metadata contained in datasets as well as the information stored in recipes for recipe-driven data analysis can be read and written.
For datasets, two generic classes are provided:
- aspecd.io.DatasetImporter
- aspecd.io.DatasetExporter
As the names suggest, these classes should be used to implement import and export functionality for your own purposes in applications derived from the ASpecD framework.
Generally, both import and export should be handled via the respective
methods of the aspecd.dataset.Dataset
class, thus first
instantiating an object of that class and an appropriate importer or
exporter, and afterwards only operating on the dataset using its methods.
In its most generic form, this may look something like:
import aspecd.dataset
import aspecd.io

dataset = aspecd.dataset.Dataset()
importer = aspecd.io.DatasetImporter(source="/path/to/your/data")
dataset.import_from(importer)
Similarly, you would handle the export of your data (and metadata) contained in a dataset object using an exporter object:
import aspecd.dataset
import aspecd.io

dataset = aspecd.dataset.Dataset()
exporter = aspecd.io.DatasetExporter(target="/path/to/destination")
dataset.export_to(exporter)
However, if you use recipe-driven data analysis, things become much simpler:
- Imports will be taken care of automatically.
- Exports can be specified as a simple task.
A simple example of a recipe only loading datasets and afterwards exporting them could look like this:
datasets:
  - /path/to/first/dataset
  - /path/to/second/dataset

tasks:
  - kind: export
    type: AdfExporter
    properties:
      target:
        - dataset1
        - dataset2
What is happening here? Two datasets are imported and afterwards exported to the ASpecD Dataset Format (ADF) using the aspecd.io.AdfExporter.
Another frequent use case, although one that admittedly pretty much opposes the whole idea of the ASpecD framework in terms of reproducibility and traceability: Your collaboration partners require you to provide them with raw data they can import into their favourite program for creating plots. The only viable way is to export to plain text (ouch!), saying goodbye to all your metadata and history:
datasets:
  - /path/to/first/cool/dataset
  - /path/to/second/cool/dataset
  - /path/to/another/cool/dataset

tasks:
  - kind: export
    type: TxtExporter
    properties:
      target:
        - cool-dataset1
        - cool-dataset2
        - cool-dataset3
In this case, you can also add whatever processing is necessary to your datasets before exporting them, and you see that recipes come in quite handy here.
More control over imports
Sometimes there is the need to have more control over the import, be it that you want to set labels for datasets explicitly upon load, determine which importer to use, or provide additional parameters for an importer (a frequent use case for the rather generic TxtImporter).
This is an excerpt of an example recipe importing ASCII exports of a common UV/Vis spectrometer and showing many of the possible options:
datasets:
  - source: cbztbt.txt
    label: D-A
    id: Cbz-TBT
    importer: TxtImporter
    importer_parameters:
      skiprows: 2
      separator: ','
So what’s happening here? Let’s go through it step by step:
- Datasets are a list, as usual, but this time, it is not a list of filenames, but a list of (hierarchical) key–value pairs.
- The source key sets the filename (and can include a path, as usual).
- The label key sets the label used for the dataset, e.g., in a figure legend.
- The id key sets the (unique) identifier (ID) by which the dataset can be referred to throughout the recipe. This is often useful if you want to restrict certain tasks to only a subset of the loaded datasets.
- The importer key sets the importer class to use. This class needs to be available from within your current package. You can prefix the class name with a package if you like.
- The importer_parameters key is a series of key–value pairs (i.e., a dict in Python) setting additional parameters for the specific importer. See the documentation of the respective importer class for further details.
Of course, you need not use all of these keys. Usually, if you want to specify importer parameters, it is a good idea to be explicit about the importer as well. However, even that is not strictly necessary. The only thing that is strictly necessary: As soon as you want to provide more than a filename/path per dataset, you need to switch from a list of strings (i.e., filenames/paths) to a key–value approach, with source being the key for the filename/path.
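To illustrate the two accepted forms of dataset entries, here is a minimal sketch of how entries of the datasets list could be normalised into a common key–value form. The function normalise_dataset_entry and its default for importer_parameters are hypothetical and not part of ASpecD; the sketch only relies on the keys described above.

```python
def normalise_dataset_entry(entry):
    """Return a dict with at least a 'source' key for a recipe dataset entry.

    Hypothetical helper, not part of ASpecD: plain strings are treated as
    filenames/paths, dicts must provide the filename/path via 'source'.
    """
    if isinstance(entry, str):
        # Short form: the entry is just a filename/path
        normalised = {"source": entry}
    elif isinstance(entry, dict):
        if "source" not in entry:
            raise KeyError("Key-value dataset entries need a 'source' key")
        # Copy to avoid modifying the original recipe data
        normalised = dict(entry)
    else:
        raise TypeError(f"Unsupported dataset entry: {entry!r}")
    # Assumed default: no additional importer parameters
    normalised.setdefault("importer_parameters", {})
    return normalised


datasets = [
    "/path/to/first/dataset",
    {
        "source": "cbztbt.txt",
        "label": "D-A",
        "id": "Cbz-TBT",
        "importer": "TxtImporter",
        "importer_parameters": {"skiprows": 2, "separator": ","},
    },
]
for entry in datasets:
    print(normalise_dataset_entry(entry))
```

Both forms end up with the same structure, so code further down the line only needs to handle dicts.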
Importers for specific file formats
There is a series of importers for specific file formats:
- aspecd.io.AdfImporter: Importer for data in ASpecD Dataset Format (ADF)
- aspecd.io.AsdfImporter: Importer for data in asdf format
- aspecd.io.TxtImporter: Importer for data in plain text format
For details, see the respective class documentation.
Exporters for specific file formats
Datasets sometimes need to be persisted. Currently, there are two exporters for specific file formats that can be imported again using the respective importers and that retain the full information contained in a dataset:
- aspecd.io.AdfExporter: Exporter for datasets to ASpecD Dataset Format (ADF)
- aspecd.io.AsdfExporter: Exporter for datasets to asdf format
For details, see the respective class documentation.
A somewhat special case is the exporter to plain text files, as this file format does not preserve the metadata stored within the dataset and should only be used as a last resort:
- aspecd.io.TxtExporter: Exporter for data to plain text format
Warning
All metadata contained within a dataset (including the full history) are lost when exporting to plain text. Therefore, using this exporter will usually result in you losing reproducibility. Hence, better think twice before using this exporter, and use it entirely at your own risk and only if you really know what you are doing (and why).
Writing importers for data
When writing importer classes for your own data, there are a number of pitfalls, some of which shall be described here together with solutions and “best practices”.
Dimensions of data
Usually, we assign axes in the order x, y, z, and assume the x axis to be the horizontal axis in a plot. However, numpy (as well as other software) follows a different convention, with the first index referring to the row of your matrix and the second index to the column. That boils down to the first index corresponding to the y axis and the second index corresponding to the x axis.
As long as your data are one-dimensional, resulting in two axis objects in your dataset, everything is fine, and the second axis will have no values.
However, if your data to be imported are two-dimensional, your first dimension will be the index of rows (along a column), hence the y axis, and the second dimension the index of your columns (along a row), i.e., the x axis. This is perfectly fine, and it is equally fine to reverse this order, as long as you ensure that your axis objects are consistent with the dimensions of your data.
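A quick way to convince yourself of the row/column convention is to look at the shape of a small two-dimensional NumPy array. This sketch uses plain NumPy only; the array contents are arbitrary.

```python
import numpy as np

# 3 rows (first index, y direction) and 5 columns (second index, x direction)
data = np.arange(15).reshape(3, 5)

n_rows, n_columns = data.shape
print(n_rows, n_columns)  # 3 5

# data[i, j]: i selects the row (y), j selects the column (x)
first_row = data[0, :]     # values along x for the first y value
first_column = data[:, 0]  # values along y for the first x value
print(first_row.tolist())     # [0, 1, 2, 3, 4]
print(first_column.tolist())  # [0, 5, 10]
```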
If you assign numeric data to the aspecd.dataset.Data.data property, the corresponding axes values will initially be set to the indices of the data points along the corresponding dimension, with the first axis (index 0) corresponding to the first dimension (row indices along a column), and similarly for each of the following dimensions of your data. Note that there will always be one more axis than your data have dimensions. This last axis will not have values, and usually its quantity is something like “intensity”.
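The default axes described above can be mimicked with a few lines of NumPy. The function default_axes_values is a hypothetical stand-in, not the ASpecD implementation; it merely returns index arrays for each data dimension plus an empty entry for the additional (intensity) axis.

```python
import numpy as np


def default_axes_values(data):
    """Hypothetical sketch: index-based axis values for each data dimension.

    Returns one array per dimension plus an empty array for the last axis,
    which carries no values (its quantity is typically "intensity").
    """
    data = np.asarray(data)
    axes = [np.arange(size) for size in data.shape]
    axes.append(np.array([]))  # one more axis than data dimensions
    return axes


two_dimensional = np.zeros((3, 5))
axes = default_axes_values(two_dimensional)
print(len(axes))         # 3 (two dimensions plus one intensity axis)
print(axes[0].tolist())  # [0, 1, 2] -> first dimension (rows)
print(axes[1].tolist())  # [0, 1, 2, 3, 4] -> second dimension (columns)
```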
Backup of the data
One essential concept of the ASpecD dataset is to store the original data together with their axes in a separate, non-public property. This is done automatically by the importer after calling its non-public method aspecd.io.DatasetImporter._import(). Hence, usually you need not take care of this at all.
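The division of labour between dataset and importer, including the automatic backup, can be sketched in plain Python. The classes SketchDataset, SketchImporter and ListImporter below are hypothetical, simplified stand-ins for aspecd.dataset.Dataset and aspecd.io.DatasetImporter. They only mirror the call chain described in this document (import_from() calls import_into(), which calls _import()), followed by copying the data and setting id and label to the source.

```python
import copy


class MissingDatasetError(Exception):
    """Hypothetical stand-in for aspecd.io.MissingDatasetError."""


class SketchDataset:
    """Knows nothing about importers besides their interface."""

    def __init__(self):
        self.data = None
        self.id = ""
        self.label = ""
        self._origdata = None

    def import_from(self, importer):
        importer.import_into(self)


class SketchImporter:
    """Base class: subclasses only implement _import()."""

    def __init__(self, source=None):
        self.source = source
        self.dataset = None

    def import_into(self, dataset=None):
        if dataset is None:
            if self.dataset is None:
                raise MissingDatasetError("No dataset to import into")
            # Delegate to the dataset, which calls back into this method
            self.dataset.import_from(self)
            return
        self.dataset = dataset
        self._import()
        # Done automatically after _import(): back up data, set id and label
        dataset._origdata = copy.deepcopy(dataset.data)
        dataset.id = self.source
        dataset.label = self.source

    def _import(self):
        """To be implemented in subclasses."""


class ListImporter(SketchImporter):
    """Hypothetical concrete importer 'reading' data from a fixed list."""

    def _import(self):
        self.dataset.data = [1.0, 2.0, 3.0]


dataset = SketchDataset()
dataset.import_from(ListImporter(source="/path/to/your/data"))
dataset.data[0] = 42.0    # later processing modifies the data ...
print(dataset._origdata)  # ... but the backup is untouched: [1.0, 2.0, 3.0]
```

Because the backup happens in the base class, a concrete importer only needs to fill in _import().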
Handling of metadata
Data without information about these data are usually pretty useless. Hence, an ASpecD dataset is always a unit of numerical data and corresponding metadata. While you will need to come up with your own structure for the metadata of your datasets and create a hierarchy of classes derived from aspecd.metadata.DatasetMetadata, your importers need to ensure that these metadata are populated accordingly. Of course, which metadata can be populated depends strongly on the file format you are about to import.
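As an illustration only, assume a hypothetical text format whose header lines look like "# sample.name: test sample". A small helper could map such lines onto a nested dictionary that an importer then uses to populate its metadata classes. Neither this header format nor the function parse_header is part of ASpecD.

```python
def parse_header(lines, prefix="# "):
    """Map 'key.subkey: value' header lines onto a nested dict (hypothetical)."""
    metadata = {}
    for line in lines:
        if not line.startswith(prefix):
            break  # header ends at the first line without the prefix
        key, _, value = line[len(prefix):].partition(":")
        *parents, leaf = key.strip().split(".")
        node = metadata
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value.strip()
    return metadata


lines = [
    "# sample.name: test sample",
    "# measurement.temperature: 295 K",
    "1.0 2.0",
]
print(parse_header(lines))
# {'sample': {'name': 'test sample'}, 'measurement': {'temperature': '295 K'}}
```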
Handling different file formats for importing data
Often, data are available in different formats, and deciding which importer is appropriate for a given format can be quite involved. To free other classes from having to contain the relevant code, a factory can be used:
- aspecd.io.DatasetImporterFactory
Currently, the sole information provided to decide about the appropriate importer is the source (a string). A concrete importer object is returned by the method get_importer(). Thus, using the factory in another class may look like the following:
import aspecd.dataset
import aspecd.io

importer_factory = aspecd.io.DatasetImporterFactory()
importer = importer_factory.get_importer(source="/path/to/your/data")
dataset = aspecd.dataset.Dataset()
dataset.import_from(importer)
Here, as in the example above, “source” refers to a (unique) identifier of your dataset, be it a filename, path, URL/URI, LOI, or similar.
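The decision logic of such a factory can be as simple as a lookup on the file extension. The following sketch mirrors the basic behaviour described for aspecd.io.DatasetImporterFactory (ADF, ASDF and TXT by file extension, the generic importer otherwise), but it returns class names as strings rather than importer objects, and the mapping of the extensions .adf, .asdf and .txt is an assumption.

```python
import os

# Assumed mapping of file extensions to importer class names
IMPORTERS_BY_EXTENSION = {
    ".adf": "AdfImporter",
    ".asdf": "AsdfImporter",
    ".txt": "TxtImporter",
}


def choose_importer(source):
    """Return the name of an importer class for the given source (sketch)."""
    if not source:
        # The real factory raises aspecd.io.MissingSourceError here
        raise ValueError("No source provided")
    _, extension = os.path.splitext(source)
    return IMPORTERS_BY_EXTENSION.get(extension.lower(), "DatasetImporter")


print(choose_importer("/path/to/data.txt"))  # TxtImporter
print(choose_importer("/path/to/data.adf"))  # AdfImporter
print(choose_importer("/path/to/data.xyz"))  # DatasetImporter
```

Keeping this logic in one place is exactly what frees other classes from knowing about file formats.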
Important
For recipe-driven data analysis to work with an ASpecD-derived package, you need to implement an aspecd.io.DatasetImporterFactory class there as well that can be obtained by instantiating <your_package>.io.DatasetImporterFactory().
Recipes
For recipes, a similar set of classes is provided:
- aspecd.io.RecipeImporter
- aspecd.io.RecipeExporter
For additional concrete classes handling import and export from and to YAML files, see below.
The same general principles laid out above for the datasets apply to these classes as well. In particular, both import and export should be handled via the respective methods of the aspecd.tasks.Recipe class, thus first instantiating an object of that class and an appropriate importer or exporter, and afterwards only operating on the recipe using its methods.
In its most generic form, this may look something like:
import aspecd.io
import aspecd.tasks

recipe = aspecd.tasks.Recipe()
importer = aspecd.io.RecipeImporter(source="/path/to/your/recipe")
recipe.import_from(importer)
Similarly, you would handle the export of the information contained in a recipe object using an exporter object.
To simplify the input and output of recipes, and due to recipe-driven data analysis being an intrinsic property of the ASpecD framework, two classes handling the import and export from and to YAML files are provided as well:
- aspecd.io.RecipeYamlImporter
- aspecd.io.RecipeYamlExporter
These classes can directly be used to work with YAML files containing
information for recipe-driven data analysis. For details of the YAML file
structure, see the aspecd.tasks.Recipe
class and its attributes.
Module documentation
- class aspecd.io.DatasetImporter(source=None)
Bases: object
Base class for dataset importer.
Each class actually importing data and metadata into a dataset should inherit from this class.
To perform the import, call the import_from() method of the dataset the import should be performed for, and provide a reference to the actual importer object to it.
The actual implementation of the importing is done in the non-public method _import() that in turn gets called by import_into(), which is called by the aspecd.dataset.Dataset.import_from() method of the dataset object.
One question arising when actually implementing an importer for a specific file format: How do the data get into the dataset? The simple answer: The _import() method of the importer knows about the dataset and its structure (see aspecd.dataset.Dataset for details) and assigns data (and metadata) read from an external source to the respective fields of the dataset. From a broader software-architecture point of view: The dataset knows nothing about the importer besides its bare existence and interface, whereas the importer knows about the dataset and how to map data and metadata.
- dataset
dataset to import data and metadata into
- parameters
Additional parameters to control import options.
Useful in case of, e.g., CSV importers where the user may want to set things such as the delimiter.
New in version 0.2.
- Type
dict
- Raises
aspecd.io.MissingDatasetError – Raised when no dataset exists to act upon
- import_into(dataset=None)
Perform the actual import into the given dataset.
If no dataset is provided at method call, but is set as a property in the importer object, the aspecd.dataset.Dataset.import_from() method of the dataset will be called.
If no dataset is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the importer object is not necessary.
The actual import should be implemented within the non-public method _import().
Note
A number of properties of the dataset are automatically assigned after calling the non-public method aspecd.io.DatasetImporter._import(): the non-public property _origdata of the dataset is populated with a copy of aspecd.dataset.Dataset.data, and id and label are set to aspecd.io.DatasetImporter.source.
- Parameters
dataset (aspecd.dataset.Dataset) – Dataset to import data and metadata into
- Raises
aspecd.io.MissingDatasetError – Raised if no dataset is provided.
- class aspecd.io.DatasetImporterFactory
Bases: object
Factory for creating importer objects based on the source provided.
Often, data are available in different formats, and deciding which importer is appropriate for a given format can be quite involved. To free other classes from having to contain the relevant code, a factory can be used.
Currently, the sole information provided to decide about the appropriate importer is the source (a string). A concrete importer object is returned by the method get_importer(). If no source is provided, an exception will be raised.
The actual code for deciding which type of importer to return in what case should be implemented in the non-public method _get_importer() in any package based on the ASpecD framework.
In its basic implementation, as done here, the non-public method _get_importer() returns the importers for ADF, ASDF, and TXT depending on the file extension, and the standard importer in all other cases. This might be a viable approach for your own DatasetImporterFactory implementation only in the rare case of having a single type of data, but it provides a sensible starting point for your own developments.
- source
Source of the dataset to be loaded.
Gets set by calling the method get_importer() with the source parameter.
- Type
str
- Raises
aspecd.io.MissingSourceError – Raised if no source is provided
- get_importer(source='', importer='', parameters=None)
Return importer object for dataset specified by its source.
The actual code for deciding which type of importer to return in what case should be implemented in the non-public method _get_importer() in any package based on the ASpecD framework.
If no importer gets returned by the method _get_importer(), the ASpecD-internal importers will be checked for matching the file type. Thus, you can override the behaviour for any file type supported natively by the ASpecD framework, but retain compatibility with the ASpecD-specific file types.
Note
Currently, only filenames/paths are supported, and if source does not start with the file separator, the absolute path to the current directory is prepended.
- Parameters
source (str) – string describing the source of the dataset
May be a filename or path, a URL/URI, a LOI, or similar
importer (str) – Name of the importer to use for importing the dataset
Default: ''
New in version 0.2.
parameters (dict) – Additional parameters for controlling the import
Default: None
New in version 0.2.
- Returns
importer – importer object of appropriate class
- Return type
aspecd.io.DatasetImporter
- Raises
aspecd.io.MissingSourceError – Raised if no source is provided
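The note on relative paths can be illustrated with the standard library. The helper absolute_source is hypothetical and only reproduces the documented rule: if the source does not start with the file separator, the absolute path of the current directory is prepended.

```python
import os


def absolute_source(source):
    """Prepend the current directory unless source starts with os.sep."""
    if source.startswith(os.sep):
        return source
    return os.path.join(os.getcwd(), source)


print(absolute_source("data/measurement.txt"))
print(absolute_source(os.sep + "tmp" + os.sep + "measurement.txt"))
```

Taken literally, this rule does not treat a Windows path such as C:\data as absolute; os.path.isabs() would be the more portable check if you implement something similar yourself.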
- class aspecd.io.DatasetExporter(target=None)
Bases: object
Base class for dataset exporter.
Each class actually exporting data from a dataset to some other format should inherit from this class.
To perform the export, call the export_to() method of the dataset the export should be performed for, and provide a reference to the actual exporter object to it.
The actual implementation of the exporting is done in the non-public method _export() that in turn gets called by export_from(), which is called by the aspecd.dataset.Dataset.export_to() method of the dataset object.
- dataset
dataset to export data and metadata from
- target
specifier of the target the data and metadata will be written to
- Type
string
- Raises
aspecd.io.MissingDatasetError – Raised when no dataset exists to act upon
Changed in version 0.6.4: New attribute comment
- export_from(dataset=None)
Perform the actual export from the given dataset.
If no dataset is provided at method call, but is set as a property in the exporter object, the aspecd.dataset.Dataset.export_to() method of the dataset will be called.
If no dataset is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the exporter object is not necessary.
The actual export is implemented within the non-public method _export() that gets called automatically.
- Parameters
dataset (aspecd.dataset.Dataset) – Dataset to export data and metadata from
- Raises
aspecd.io.MissingDatasetError – Raised if no dataset is provided.
- create_history_record()
Create history record to be added to the dataset.
Usually, this method gets called from within the export_to() method of the aspecd.dataset.Dataset class and ensures that the history of each processing step is written properly.
- Returns
history_record – history record for export step
New in version 0.9.
- class aspecd.io.RecipeImporter(source='')
Bases: object
Base class for recipe importer.
Each class actually importing recipes into an aspecd.tasks.Recipe object should inherit from this class.
To perform the import, call the import_from() method of the recipe the import should be performed for, and provide a reference to the actual importer object to it.
The actual implementation of the importing is done in the non-public method _import() that in turn gets called by import_into(), which is called by the aspecd.tasks.Recipe.import_from() method of the recipe object.
One question arising when actually implementing an importer for a specific file format: How does the information get into the recipe? The simple answer: The _import() method of the importer knows about the recipe and its structure (see aspecd.tasks.Recipe for details) and creates a dictionary with keys corresponding to the respective attributes of the recipe. In turn, it can then call the aspecd.tasks.Recipe.from_dict() method. From a broader software-architecture point of view: The recipe knows nothing about the importer besides its bare existence and interface, whereas the importer knows about the recipe and how to map the information obtained to it.
- recipe
recipe to import into
- Type
aspecd.tasks.Recipe
- Raises
aspecd.io.MissingRecipeError – Raised when no recipe exists to act upon
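The mapping described above, in which the importer builds a dictionary and hands it to from_dict(), can be sketched with hypothetical stand-in classes. SketchRecipe and JsonRecipeImporter are not part of ASpecD, and JSON from the standard library is used here instead of YAML only to keep the example free of third-party dependencies.

```python
import json


class SketchRecipe:
    """Hypothetical stand-in for aspecd.tasks.Recipe."""

    def __init__(self):
        self.datasets = []
        self.tasks = []

    def from_dict(self, dict_=None):
        # Set only those attributes the recipe actually has
        for key, value in (dict_ or {}).items():
            if hasattr(self, key):
                setattr(self, key, value)

    def import_from(self, importer):
        importer.import_into(self)


class JsonRecipeImporter:
    """Hypothetical importer: knows the recipe structure, not vice versa."""

    def __init__(self, source=""):
        self.source = source
        self.recipe = None

    def import_into(self, recipe=None):
        self.recipe = recipe
        self._import()

    def _import(self):
        # In this sketch, 'source' holds the JSON text itself, not a filename
        dict_ = json.loads(self.source)
        self.recipe.from_dict(dict_)


text = '{"datasets": ["/path/to/first/dataset"], "tasks": [{"kind": "export"}]}'
recipe = SketchRecipe()
recipe.import_from(JsonRecipeImporter(source=text))
print(recipe.datasets)  # ['/path/to/first/dataset']
```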
- import_into(recipe=None)
Perform the actual import into the given recipe.
If no recipe is provided at method call, but is set as a property in the importer object, the aspecd.tasks.Recipe.import_from() method of the recipe will be called.
If no recipe is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The recipe object always calls this method with the respective recipe as argument. Therefore, in this case setting the recipe property within the importer object is not necessary.
The actual import should be implemented within the non-public method _import().
- Parameters
recipe (aspecd.tasks.Recipe) – recipe to import into
- Raises
aspecd.io.MissingRecipeError – Raised if no recipe is provided.
- class aspecd.io.RecipeExporter(target='')
Bases: object
Base class for recipe exporter.
Each class actually exporting recipes from aspecd.tasks.Recipe objects should inherit from this class.
To perform the export, call the aspecd.tasks.Recipe.export_to() method of the recipe the export should be performed for, and provide a reference to the actual exporter object to it.
The actual implementation of the exporting is done in the non-public method _export() that in turn gets called by export_from(), which is called by the aspecd.tasks.Recipe.export_to() method of the recipe object.
- recipe
recipe to export information from
- Type
aspecd.tasks.Recipe
- target
specifier of the target the information will be written to
- Type
string
- Raises
aspecd.io.MissingRecipeError – Raised when no recipe exists to act upon
- export_from(recipe=None)
Perform the actual export from the given recipe.
If no recipe is provided at method call, but is set as a property in the exporter object, the aspecd.tasks.Recipe.export_to() method of the recipe will be called.
If no recipe is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The recipe object always calls this method with the respective recipe as argument. Therefore, in this case setting the recipe property within the exporter object is not necessary.
The actual export should be implemented within the non-public method _export().
- Parameters
recipe (aspecd.tasks.Recipe) – Recipe to export from
- Raises
aspecd.io.MissingRecipeError – Raised if no recipe is provided.
- class aspecd.io.RecipeYamlImporter(source='')
Bases: aspecd.io.RecipeImporter
Recipe importer for importing from YAML files.
The YAML file needs to have a structure compatible with the actual recipe, such that the dict created from reading the YAML file can be directly fed into the aspecd.tasks.Recipe.from_dict() method.
The order of entries of the YAML file is preserved due to using ordered dictionaries (collections.OrderedDict) internally.
- Parameters
source (str) – filename of a YAML file to read from
- import_into(recipe=None)
Perform the actual import into the given recipe.
If no recipe is provided at method call, but is set as a property in the importer object, the aspecd.tasks.Recipe.import_from() method of the recipe will be called.
If no recipe is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The recipe object always calls this method with the respective recipe as argument. Therefore, in this case setting the recipe property within the importer object is not necessary.
The actual import should be implemented within the non-public method _import().
- Parameters
recipe (aspecd.tasks.Recipe) – recipe to import into
- Raises
aspecd.io.MissingRecipeError – Raised if no recipe is provided.
- class aspecd.io.RecipeYamlExporter(target='')
Bases: aspecd.io.RecipeExporter
Recipe exporter for exporting to YAML files.
The YAML file will have a structure corresponding to the output of the aspecd.tasks.Recipe.to_dict() method of the recipe object.
- Parameters
target (str) – filename of a YAML file to write to
- export_from(recipe=None)
Perform the actual export from the given recipe.
If no recipe is provided at method call, but is set as a property in the exporter object, the aspecd.tasks.Recipe.export_to() method of the recipe will be called.
If no recipe is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The recipe object always calls this method with the respective recipe as argument. Therefore, in this case setting the recipe property within the exporter object is not necessary.
The actual export should be implemented within the non-public method _export().
- Parameters
recipe (aspecd.tasks.Recipe) – Recipe to export from
- Raises
aspecd.io.MissingRecipeError – Raised if no recipe is provided.
- class aspecd.io.AdfExporter(target=None)
Bases: aspecd.io.DatasetExporter
Dataset exporter for exporting to ASpecD dataset format.
The ASpecD dataset format is vaguely reminiscent of the Open Document Format, i.e., a zipped directory containing structured data (in this case in the form of a YAML file) and binary data in a corresponding subdirectory.
As PyYAML is not capable of dealing with NumPy arrays out of the box, those are dealt with separately. Small arrays are stored inline as lists, larger arrays in separate files. For details, see the aspecd.utils.Yaml class.
The data format tries to be as self-contained as possible, using standard file formats and a brief description of its layout contained within the archive. Collecting the contents in a single ZIP archive allows the user to deal with a single file for a dataset, while more advanced users can easily dig into the details and write importers for other platforms and programming languages, making the format rather platform-independent and future-safe. Due to using binary representation for larger numerical arrays, the format should be more memory-efficient than other formats.
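To make the idea of such a container more tangible, here is a minimal sketch of a ZIP archive holding structured metadata plus binary arrays in a subdirectory. It is not the actual ADF layout: the file names (dataset.json, binary/data.npy), the use of JSON instead of YAML (to avoid a PyYAML dependency), and the size threshold for inlining arrays are all assumptions made for illustration.

```python
import io
import json
import zipfile

import numpy as np

INLINE_LIMIT = 10  # assumed: arrays with at most this many elements go inline


def write_container(buffer, metadata, data):
    """Write metadata and a NumPy array to a ZIP archive (sketch)."""
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        content = {"metadata": metadata}
        if data.size <= INLINE_LIMIT:
            content["data"] = data.tolist()  # small arrays: inline as lists
        else:
            binary = io.BytesIO()
            np.save(binary, data)  # larger arrays: binary .npy file
            archive.writestr("binary/data.npy", binary.getvalue())
            content["data"] = "binary/data.npy"  # reference to the file
        archive.writestr("dataset.json", json.dumps(content, indent=2))


def read_container(buffer):
    """Read metadata and data back from an archive written as above."""
    with zipfile.ZipFile(buffer) as archive:
        content = json.loads(archive.read("dataset.json"))
        if isinstance(content["data"], str):
            data = np.load(io.BytesIO(archive.read(content["data"])))
        else:
            data = np.asarray(content["data"])
    return content["metadata"], data


buffer = io.BytesIO()
write_container(buffer, {"sample": {"name": "test"}}, np.linspace(0, 1, 100))
metadata, data = read_container(buffer)
print(metadata, data.shape)  # {'sample': {'name': 'test'}} (100,)
```

Unzipping such an archive yields a human-readable description plus standard binary files, which is the property that makes the real format accessible from other platforms.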
- create_history_record()
Create history record to be added to the dataset.
Usually, this method gets called from within the export_to() method of the aspecd.dataset.Dataset class and ensures that the history of each processing step is written properly.
- Returns
history_record – history record for export step
New in version 0.9.
- export_from(dataset=None)
Perform the actual export from the given dataset.
If no dataset is provided at method call, but is set as a property in the exporter object, the aspecd.dataset.Dataset.export_to() method of the dataset will be called.
If no dataset is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the exporter object is not necessary.
The actual export is implemented within the non-public method _export() that gets called automatically.
- Parameters
dataset (aspecd.dataset.Dataset) – Dataset to export data and metadata from
- Raises
aspecd.io.MissingDatasetError – Raised if no dataset is provided.
- class aspecd.io.AdfImporter(source=None)
Bases: aspecd.io.DatasetImporter
Dataset importer for importing from ASpecD dataset format.
For more details of the ASpecD dataset format, see the aspecd.io.AdfExporter class.
- import_into(dataset=None)
Perform the actual import into the given dataset.
If no dataset is provided at method call, but is set as a property in the importer object, the aspecd.dataset.Dataset.import_from() method of the dataset will be called.
If no dataset is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the importer object is not necessary.
The actual import should be implemented within the non-public method _import().
Note
A number of properties of the dataset are automatically assigned after calling the non-public method aspecd.io.DatasetImporter._import(): the non-public property _origdata of the dataset is populated with a copy of aspecd.dataset.Dataset.data, and id and label are set to aspecd.io.DatasetImporter.source.
- Parameters
dataset (aspecd.dataset.Dataset) – Dataset to import data and metadata into
- Raises
aspecd.io.MissingDatasetError – Raised if no dataset is provided.
- class aspecd.io.AsdfExporter(target=None)
Bases: aspecd.io.DatasetExporter
Dataset exporter for exporting to Advanced Scientific Data Format (ASDF).
For more information on ASDF, see the homepage of the asdf package and its format specification.
- create_history_record()
Create history record to be added to the dataset.
Usually, this method gets called from within the export_to() method of the aspecd.dataset.Dataset class and ensures that the history of each processing step is written properly.
- Returns
history_record – history record for export step
New in version 0.9.
- export_from(dataset=None)
Perform the actual export from the given dataset.
If no dataset is provided at method call, but is set as a property in the exporter object, the aspecd.dataset.Dataset.export_to() method of the dataset will be called.
If no dataset is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the exporter object is not necessary.
The actual export is implemented within the non-public method _export() that gets called automatically.
- Parameters
dataset (aspecd.dataset.Dataset) – Dataset to export data and metadata from
- Raises
aspecd.io.MissingDatasetError – Raised if no dataset is provided.
- class aspecd.io.AsdfImporter(source=None)
Bases: aspecd.io.DatasetImporter
Dataset importer for importing from Advanced Scientific Data Format (ASDF).
For more information on ASDF, see the homepage of the asdf package and its format specification.
- import_into(dataset=None)
Perform the actual import into the given dataset.
If no dataset is provided at method call, but is set as a property in the importer object, the aspecd.dataset.Dataset.import_from() method of the dataset will be called.
If no dataset is provided, neither at method call nor as a property in the object, the method will raise an appropriate exception.
The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the importer object is not necessary.
The actual import should be implemented within the non-public method _import().
Note
A number of properties of the dataset are automatically assigned after calling the non-public method aspecd.io.DatasetImporter._import(): the non-public property _origdata of the dataset is populated with a copy of aspecd.dataset.Dataset.data, and id and label are set to aspecd.io.DatasetImporter.source.
- Parameters
dataset (aspecd.dataset.Dataset) – Dataset to import data and metadata into
- Raises
aspecd.io.MissingDatasetError – Raised if no dataset is provided.
- class aspecd.io.TxtImporter(source=None)¶
Bases:
aspecd.io.DatasetImporter
Dataset importer for importing from plain text files (TXT).
Plain text files have often the disadvantage of no accompanying metadata, therefore the use of plain text files for data storage is highly discouraged, besides other problems like inherent low resolution/accuracy or otherwise large file sizes.
The main reason for this class to exist is that it provides a simple way to showcase ASpecD functionality reading from primitive data sources. Besides that, sometimes you will encounter plain text files.
Note
The importer relies on
numpy.loadtxt()
for reading text files. Hence, you can use any parameters understood by this function as keys in theparameters
attribute. For handling decimal separators (non-standard behaviour), see below.If your data consist of two columns, the first will automatically be interpreted as the x axis. In all other cases, data will be read as is and no axes values explicitly written.
- parameters¶
Parameters controlling the import
- skiprows
int
Number of rows to skip in text file (e.g., header lines)
- delimiter
str
The string used to separate values.
Default: None (meaning: whitespace)
- comments
str
|list
Characters or list of characters indicating the start of a comment.
Default: #
- separator
str
Character used as decimal separator.
Default: None (meaning: dot)
- Type
- skiprows
Decimal separators
Handling decimal separators other than the dot is notoriously difficult, though often necessary. For this purpose, the non-standard key separator has been introduced, which is not supported by numpy itself. If you specify a character using this parameter, the file will be read as text and every occurrence of the specified character will be replaced by a dot. Only afterwards will numpy.loadtxt() be used with all the other parameters as usual.
Changed in version 0.6.3: Document more parameter keys; add handling of decimal separator.
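The decimal-separator handling described above can be sketched as a read, replace, parse sequence. The helper `load_with_decimal_separator()` is hypothetical and only illustrates the idea; it assumes the whole file fits into memory as a string.

```python
import io

import numpy as np


def load_with_decimal_separator(text, separator=None, **kwargs):
    """Hypothetical helper: replace decimal separator, then call loadtxt."""
    if separator:
        # Read as plain text and replace every separator by a dot first
        text = text.replace(separator, ".")
    # Remaining keyword arguments are passed on to numpy.loadtxt() unchanged
    return np.loadtxt(io.StringIO(text), **kwargs)


data = load_with_decimal_separator("340,0 0,634\n341,0 0,342\n", separator=",")
```

A naive replacement like this cannot work when the same character also serves as column delimiter (e.g., a comma used both as decimal separator and as delimiter), as the two meanings can no longer be told apart afterwards.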
- import_into(dataset=None)¶
Perform the actual import into the given dataset.
If no dataset is provided at method call, but is set as property in the importer object, the aspecd.dataset.Dataset.import_from() method of the dataset will be called.
If no dataset is provided either at method call or as property in the object, the method will raise a respective exception.
The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the importer object is not necessary.
The actual import should be implemented within the non-public method _import().
Note
A number of parameters of the dataset are automatically assigned after calling the non-public method aspecd.io.DatasetImporter._import(): the non-public property _origdata of the dataset is populated with a copy of aspecd.dataset.Dataset.data, and id and label are set to aspecd.io.DatasetImporter.source.
- Parameters
dataset (aspecd.dataset.Dataset) – Dataset to import data and metadata into
- Raises
aspecd.io.MissingDatasetError – Raised if no dataset is provided.
- class aspecd.io.TxtExporter(target=None)¶
Bases: aspecd.io.DatasetExporter
Dataset exporter for exporting to plain text files (TXT).
Plain text files often have the disadvantage of lacking accompanying metadata. Therefore, using plain text files for data storage is highly discouraged, besides other problems such as inherently low resolution/accuracy or otherwise large file sizes.
Warning
All metadata contained within a dataset (including the full history) are lost when exporting to plain text. Therefore, using this exporter will usually result in losing reproducibility. Hence, better think twice before using this exporter, and use it entirely at your own risk and only if you really know what you are doing (and why).
The main reason for this class to exist is that sometimes there is a need to export data to a simple exchange format that can be shared with collaboration partners.
Note
The exporter relies on numpy.savetxt() for writing text files. Hence, the same limitations apply, e.g., it only works for 1D and 2D data, but not for data with more than two dimensions.
In case of 1D data, the resulting file will consist of two columns, with the first column containing the axis values and the second column containing the actual data. An example of the contents of such a file is given below:
3.400000000000000000e+02 6.340967862812832978e-01
3.410000000000000000e+02 3.424209074593306257e-01
3.420000000000000000e+02 1.675116805484100357e-02
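A sketch of how a file with this layout can be produced directly with `numpy.savetxt()`, assuming axis and data values are available as 1D arrays (the values below are made up for illustration). With the numpy defaults (`fmt='%.18e'`, a single space as delimiter), each line holds two 24-character values separated by a space.

```python
import io

import numpy as np

# Hypothetical 1D dataset: axis values and corresponding data values
axis = np.array([340.0, 341.0, 342.0])
data = np.array([0.6340967862812832978, 0.3424209074593306257,
                 0.01675116805484100357])

buffer = io.StringIO()
# Stack axis and data as two columns; default format is '%.18e'
np.savetxt(buffer, np.column_stack((axis, data)))
lines = buffer.getvalue().splitlines()
```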
In case of 2D data, the resulting file will contain the axis values in the first row and first column, respectively. Hence, the size of the matrix will be +1 in both directions compared to the size of the actual data, and the first element (top left) will always be zero (and shall be ignored). An example of the contents of such a file is given below:
0.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00
3.400000000000000000e+02 6.340967862812832978e-01 5.979077980106655144e-01
3.410000000000000000e+02 3.424209074593306257e-01 1.052868239245914328e-01
3.420000000000000000e+02 1.675116805484100357e-02 9.050894282755458375e-01
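The 2D layout can be sketched by embedding the data in a matrix that is one row and one column larger. The array names and values are hypothetical, and this is an illustration of the layout rather than ASpecD's own code.

```python
import numpy as np

# Hypothetical 2D dataset: data[i, j] belongs to x_axis[i] and y_axis[j]
x_axis = np.array([340.0, 341.0, 342.0])
y_axis = np.array([4.0, 5.0])
data = np.arange(6, dtype=float).reshape(3, 2)

# One extra row and column for the axis values; top-left element stays zero
matrix = np.zeros((data.shape[0] + 1, data.shape[1] + 1))
matrix[1:, 0] = x_axis  # first column: values of the first axis
matrix[0, 1:] = y_axis  # first row: values of the second axis
matrix[1:, 1:] = data   # the actual data
```

Passing `matrix` to `numpy.savetxt()` yields a file laid out like the example above; when reading it back, `matrix[1:, 0]`, `matrix[0, 1:]` and `matrix[1:, 1:]` recover the first axis, the second axis and the data, respectively.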
These two examples immediately show two of the problems of this file format: You are left to guess the quantity and unit of each of the axes, and these files get quite big, as many decimal places are stored to avoid losing numerical resolution. With 50 characters per line for a 1D dataset (translating to at least one byte each), you end up with 50 kB for 1000 values.
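The file-size estimate can be checked with a few lines of arithmetic, assuming numpy's default `'%.18e'` format, non-negative values with two-digit exponents, and one byte per (ASCII) character:

```python
# Each value formatted with '%.18e' takes 24 characters
value_width = len("%.18e" % 340.0)  # '3.400000000000000000e+02'
line_width = 2 * value_width + 2    # two columns, one space, one newline
total_bytes = line_width * 1000     # 1000 values, one byte per character
print(value_width, line_width, total_bytes)  # 24 50 50000
```

Negative values add a minus sign per value, so real files can be slightly larger still.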
- create_history_record()¶
Create history record to be added to the dataset.
Usually, this method gets called from within the aspecd.dataset.Dataset.export_to() method and ensures that the history of each processing step gets written properly.
- Returns
history_record – history record for export step
- Return type
New in version 0.9.
- export_from(dataset=None)¶
Perform the actual export from the given dataset.
If no dataset is provided at method call, but is set as property in the exporter object, the aspecd.dataset.Dataset.export_to() method of the dataset will be called.
If no dataset is provided either at method call or as property in the object, the method will raise a respective exception.
The dataset object always calls this method with the respective dataset as argument. Therefore, in this case setting the dataset property within the exporter object is not necessary.
The actual export is implemented within the non-public method _export() that gets called automatically.
- Parameters
dataset (aspecd.dataset.Dataset) – Dataset to export data and metadata from
- Raises
aspecd.io.MissingDatasetError – Raised if no dataset is provided.
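The three cases described for `export_from()` (dataset passed, dataset only set as property, no dataset at all) can be sketched with hypothetical stand-in classes. This is a simplified illustration, not the ASpecD code: the real `Dataset.export_to()` does more (e.g., it adds a history record), and `SketchExporter.exported` exists only to make the calls visible.

```python
class MissingDatasetError(Exception):
    """Hypothetical stand-in for aspecd.io.MissingDatasetError."""


class SketchDataset:
    """Hypothetical stand-in for aspecd.dataset.Dataset."""

    def export_to(self, exporter):
        # The dataset always passes itself to the exporter
        exporter.export_from(self)


class SketchExporter:
    """Hypothetical exporter mirroring the documented dispatch logic."""

    def __init__(self, target=None):
        self.target = target
        self.dataset = None
        self.exported = []  # records calls of _export(), for illustration

    def export_from(self, dataset=None):
        if dataset is None:
            if self.dataset is None:
                raise MissingDatasetError("No dataset provided")
            # Dataset only set as property: hand over to the dataset
            self.dataset.export_to(self)
            return
        self.dataset = dataset
        self._export()

    def _export(self):
        # Placeholder: a real exporter would write to self.target here
        self.exported.append(self.target)


exporter = SketchExporter(target="/path/to/destination")
exporter.dataset = SketchDataset()
exporter.export_from()  # delegates to dataset.export_to(), then _export()
```

Delegating to the dataset in the second case means that any extra steps performed by the dataset also take place when the export is triggered from the exporter side.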