save format
format of files produced by "save"
Abstract
The goal of this document is to specify the HDF5 format used by Scilab to store its data.
The format is called SOD for Scilab Open Data.
SOD was first publicly released with Scilab 5.4.0.
Rationale
Interoperability is one of the key aspects of modern software. To improve it further, this SEP proposes a standard definition of the HDF5 format.
Since Scilab 5.2.0, an export / import capability has been developed and maintained to exchange data. It is already one of the base components Xcos uses to store and exchange data.
Supported data types
All Scilab data types are supported. For example:
Name | Example in Scilab |
double | A=32; b=[32,2]; c=[2,2;3,4]; d=rand(10,10); |
string | a="my string"; b=["string 1";"my string 2"]; |
boolean | a=%t; b=[%t, %f]; |
integer | |
polynomial | |
sparse | sp=sparse([1,2;4,5;3,10],[1,2,3]) |
boolean sparse | dense=[%F, %F, %T, %F, %F; %T, %F, %F, %F, %F; %F, %F, %F, %F, %F; %F, %F, %F, %F, %T]; sp=sparse(dense) |
list | l = list(1,["a" "b"]) |
tlist | t = tlist(["listtype","field1","field2"], [], []); |
mlist | M=mlist(['V','name','value'],['a','b';'c' 'd'],[1 2; 3 4]); |
Several "types" are based on tlist or mlist. This is the case for rational, state-space, cell and struct, which are therefore saved transparently.
void and undefined are two specific elements created to handle special cases in list management. They are described later in this document.
HDF5 File Structure
The Scilab HDF5 architecture is straightforward.
General
For each Scilab variable, a dataset is declared at the root position. The name of the dataset is the name of the Scilab variable.
For example, the following code:
emptyuint32matrix = uint32([]);
uint32scalar = uint32(1);
uint32rowvector = uint32([1 4 7]);
uint32colvector = uint32([1;4;7]);
uint32matrix = uint32([1 4 7;9 6 3]);
save("uint32.sod","emptyuint32matrix","uint32scalar","uint32rowvector","uint32colvector","uint32matrix");
produces:
Each root dataset has an attribute called SCILAB_Class. This attribute defines the type of the variable stored in the HDF5 file.
If the variable is of a primitive type and has no complex values, its data are stored directly in the dataset. Otherwise, the dataset contains references to the actual data.
Every SOD file contains two specific variables:
SCILAB_scilab_version – Describes which version of Scilab was used to save the SOD file.
For example, with Scilab 5.4.0, the data will be:
SCILAB_scilab_version = scilab-5.4.0
SCILAB_sod_version – Describes which version of the SOD specification was used to save the file.
For example, with Scilab 5.4.0, the data will be:
SCILAB_sod_version = 2
Types where data are stored straight into the dataset.
Scilab Type | HDF5 Scilab type attribute | HDF5 attributes | HDF data types mapping |
string | SCILAB_Class = string | | String |
boolean | SCILAB_Class = boolean | | 32-bit integer |
integer | SCILAB_Class = integer | SCILAB_precision = {8, 16, 32, u8, u16, u32} | 8 = 8-bit character; 16 = 16-bit integer; 32 = 32-bit integer; u8 = 8-bit unsigned character; u16 = 16-bit unsigned integer; u32 = 32-bit unsigned integer |
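The SCILAB_precision values listed above can be decoded mechanically. The following is a minimal Python sketch, assuming the attribute has already been read from the file by some HDF5 reader and may arrive either as a string or as an integer. The helper name parse_precision is hypothetical and is not part of Scilab or HDF5.

```python
# Hypothetical helper (not a Scilab or HDF5 API): decode a SCILAB_precision
# attribute value into (bit width, is_signed).
VALID_PRECISIONS = {"8", "16", "32", "u8", "u16", "u32"}


def parse_precision(precision):
    # Assumption: the attribute may be stored as text ("u16") or as a number (32).
    text = str(precision)
    if text not in VALID_PRECISIONS:
        raise ValueError("unsupported SCILAB_precision: %r" % (precision,))
    signed = not text.startswith("u")   # a leading "u" marks an unsigned type
    bits = int(text.lstrip("u"))        # the remaining digits give the width
    return bits, signed


print(parse_precision("u16"))
print(parse_precision(32))
```

A reader can use the resulting pair to pick the matching integer type in its own language.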
As in Scilab, the data for these types are stored in a one-dimensional array, in column-major order.
To reconstruct the matrix, vector or scalar, two attributes provide the number of columns and rows.
Since the 5.4.0 release of Scilab and SOD v2, SCILAB_cols and SCILAB_rows are no longer used for matrices of double, integer, polynomial and string. SOD uses the native multidimensional HDF5 feature instead.
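To illustrate the column-wise storage, here is a minimal Python sketch that rebuilds a matrix from a flat array and its row and column counts. It assumes the flat data and the two sizes have already been read from the file. The function name from_column_major is hypothetical.

```python
def from_column_major(flat, rows, cols):
    """Rebuild a rows x cols matrix (list of rows) from column-major data."""
    if len(flat) != rows * cols:
        raise ValueError("data length does not match rows * cols")
    # Element (r, c) sits at offset c * rows + r in column-major order.
    return [[flat[c * rows + r] for c in range(cols)] for r in range(rows)]


# int32([1 -4 7;-9 6 -3]) flattened column by column:
flat = [1, -9, -4, 6, 7, -3]
matrix = from_column_major(flat, rows=2, cols=3)
print(matrix)
```

The same index arithmetic covers vectors (one row or one column) and scalars (1 by 1).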
Example
Saving the declaration int32([1 -4 7;-9 6 -3]) and opening the file in hdfview shows the following metadata:
int32matrix (800, 2)
32-bit integer, 3 x 2 => the size of the variable
Number of attributes = 2
SCILAB_Class = integer
SCILAB_precision = 32
Scalar values are stored as a matrix of size 1 by 1.
An empty variable ([]) will have the attribute SCILAB_empty set to true.
Types where data are stored in a dedicated group
Many Scilab data types are stored using groups. This keeps the values clearly separated and makes them easy to access.
Groups are named after the variable, enclosed in "#". For example, for a matrix of double called matrixofdouble, the name of the root dataset will be matrixofdouble and the name of the associated group will be #matrixofdouble#.
For recursive data types (list, mlist, tlist, etc.), the names of subgroups are constructed as follows.
The # allows the creation of a unique identifier. The number of leading # characters shows the depth level. Therefore, the sublist ###listnested#_#2##_#1## is located at the second level.
The underscore "_" is a way to represent the depth. The "/" character is usually used in such cases, but it is reserved in the HDF5 specification.
The integers used in the name show the position in the data structure, both within the current structure and relative to the parent element. In the example ###listnested#_#2##_#1##, the 1 shows that the group holds the second element of the third structure of the main element (elements are indexed from 0).
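The naming scheme can be sketched in Python as follows. The construction rule used here is inferred only from the two names quoted above (#matrixofdouble# and ###listnested#_#2##_#1##), so treat it as an assumption rather than the reference algorithm. The function names group_name and depth_of are hypothetical.

```python
def group_name(variable, path=()):
    """Build a group name from a variable name and a path of 0-based indices.

    Assumed rule (inferred from the example names in this document): each
    level wraps the parent name as "#" + parent + "_#" + index + "##".
    """
    name = "#" + variable + "#"
    for index in path:
        name = "#" + name + "_#" + str(index) + "##"
    return name


def depth_of(name):
    """Depth level, read from the number of leading '#' characters."""
    return len(name) - len(name.lstrip("#")) - 1


print(group_name("matrixofdouble"))
print(group_name("listnested", (2, 1)))
print(depth_of(group_name("listnested", (2, 1))))
```

With this rule, the path (2, 1) means "third element of the main list, then second element of that sublist".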
For example, the group named ###listnested#_#2##_#1## will point to the value [32, 42] from the example:
Sparse
Scilab type: sparse
HDF5 Scilab type attribute: SCILAB_Class = sparse
HDF5 attributes:
SCILAB_rows = <int>
Number of rows
SCILAB_cols = <int>
Number of columns
SCILAB_items = <int>
Define the number of elements in the sparse matrix
Root dataset values:
First value (#0#): Each element of this data structure gives the number of non-null elements per row. Therefore, the first element gives the number of elements in the first row of the sparse matrix.
Second value (#1#): Provides the column position of each element of the sparse matrix.
Third value (#2#): Stores the reference to the actual values of the elements in the sparse matrix (which are stored in a specific group).
For example, this matrix:
0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 3.
0. 0. 0. 0. 2. 0. 0. 0. 0. 0.
which is generated by the function:
sparse([1,2;4,5;3,10],[1,2,3])
will have:
#0# contains 1;0;1;1
#1# contains 2;10;5
#2# references a matrix of double (not complex in this example) which contains 1.0; 3.0; 2.0
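The three sparse datasets are enough to rebuild the dense matrix. The Python sketch below assumes that the contents of #0#, #1# and #2# have already been read into plain lists and that column positions are 1-based, as the example values suggest. The function name sparse_to_dense is hypothetical.

```python
def sparse_to_dense(rows, cols, row_counts, col_positions, values):
    """Expand the (#0#, #1#, #2#) triple into a dense list of rows."""
    if len(row_counts) != rows:
        raise ValueError("expected one count per row")
    if sum(row_counts) != len(col_positions) or len(col_positions) != len(values):
        raise ValueError("inconsistent number of items")
    dense = [[0.0] * cols for _ in range(rows)]
    k = 0  # running index into col_positions and values
    for r, count in enumerate(row_counts):
        for _ in range(count):
            # Assumption: column positions are 1-based, as in Scilab.
            dense[r][col_positions[k] - 1] = values[k]
            k += 1
    return dense


# Values from the sparse([1,2;4,5;3,10],[1,2,3]) example:
dense = sparse_to_dense(4, 10, [1, 0, 1, 1], [2, 10, 5], [1.0, 3.0, 2.0])
for row in dense:
    print(row)
```

The row counts tell the reader how many consecutive entries of #1# and #2# belong to each row, which is why no explicit row index is stored.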
Boolean sparse
Scilab type: boolean sparse
HDF5 Scilab type attribute: SCILAB_Class = boolean sparse
HDF5 attributes:
SCILAB_rows = <int>
Number of rows
SCILAB_cols = <int>
Number of columns
SCILAB_items = <int>
Define the number of elements in the sparse matrix
Root dataset values: While a sparse has 3 datasets, the boolean sparse has only 2 because defined values are automatically considered as true.
First value (#0#): Each element of this data structure gives the number of non-null elements per row. Therefore, the first element gives the number of elements in the first row of the sparse matrix.
Second value (#1#): Provides the column position of each element of the sparse matrix.
With the boolean sparse matrix:
dense=[%F, %F, %T, %F, %F; %T, %F, %F, %F, %F; %F, %F, %F, %F, %F; %F, %F, %F, %F, %T];
#0# contains 1;1;0;1.
#1# contains 3;1;5.
Only these two pieces of information are needed to recreate the boolean sparse matrix.
HDF data types mapping:
32-bit integer
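Going the other way, the two boolean sparse datasets can be derived from a dense boolean matrix. This Python sketch only illustrates the layout described above and is not how Scilab itself is implemented. The function name encode_boolean_sparse is hypothetical, and column positions are assumed to be 1-based.

```python
def encode_boolean_sparse(dense):
    """Return (row_counts, col_positions), i.e. the #0# and #1# contents."""
    row_counts = []
    col_positions = []
    for row in dense:
        # 1-based positions of the true entries in this row.
        true_cols = [c + 1 for c, value in enumerate(row) if value]
        row_counts.append(len(true_cols))
        col_positions.extend(true_cols)
    return row_counts, col_positions


T, F = True, False
dense_bool = [
    [F, F, T, F, F],
    [T, F, F, F, F],
    [F, F, F, F, F],
    [F, F, F, F, T],
]
row_counts, col_positions = encode_boolean_sparse(dense_bool)
print(row_counts, col_positions)
```

No value dataset is produced because every stored position is implicitly true.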
Double
Scilab type: double
HDF5 Scilab type attribute: SCILAB_Class = double
Root dataset values: Both real and complex values are stored in a group called #<variable name>#.
First value: Reference to the real values. Named #0#.
If the matrix is complex, the second value will reference the complex values. Named #1#.
HDF data types mapping: 64-bit floating-point
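A reader has to merge the two referenced datasets back into one matrix of values. The Python sketch below assumes that #0# holds the real parts and that #1#, when present, holds the imaginary parts. This interpretation of "complex values" is an assumption, not something the text above states explicitly. The function name combine_parts is hypothetical.

```python
def combine_parts(real_part, imag_part=None):
    """Merge the #0# data and the optional #1# data into one flat list."""
    if imag_part is None:
        # Real matrix: only #0# exists.
        return [float(x) for x in real_part]
    if len(real_part) != len(imag_part):
        raise ValueError("real and imaginary parts differ in length")
    # Assumption: #1# stores imaginary parts, element by element.
    return [complex(re, im) for re, im in zip(real_part, imag_part)]


print(combine_parts([1.0, 2.0]))
print(combine_parts([1.0, 2.0], [0.5, -1.0]))
```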
Polynomial
Scilab type: polynomial
HDF5 Scilab type attribute: SCILAB_Class = polynomial
HDF5 attributes:
SCILAB_Class = polynomial
SCILAB_varname = <string>
The symbolic variable name
SCILAB_Complex = <boolean>
If the polynomial is complex (not set if false)
Root dataset values:
Coefficients are stored as a matrix of double (see the section on double storage). Note that coefficients can be complex, in which case they are stored as a complex matrix. Rules for naming the (sub-)groups and datasets are described at the beginning of the chapter.
HDF data types mapping: Object reference
list
Scilab type: list
HDF5 Scilab type attribute:
SCILAB_Class = list
HDF5 attributes: SCILAB_items = <number of items in the list>
Root dataset values:
The values stored in the root dataset are references to the values stored in the list. The values themselves are stored in the group called #<variable name>#. In the #<variable name># group, data can be of any type. They are included directly in the group. Their representations are the same as in other cases, based on a recursive structure (meaning that a list of lists of lists of various types can be stored and loaded).
Rules for naming the (sub-)groups and datasets are described at the beginning of the chapter.
HDF data types mapping: Object reference
tlist
Scilab type: tlist
HDF5 Scilab type attribute:
SCILAB_Class = tlist
HDF5 attributes: cf list
mlist
Scilab type: mlist
HDF5 Scilab type attribute:
SCILAB_Class = mlist
HDF5 attributes: cf list
void
Scilab type: void
HDF5 Scilab type attribute:
SCILAB_Class = void
A void value can only be found in very special usages of list, tlist and mlist. It can be created with the following syntax:
voidelement_ref=list(1,,3);
undefined
Scilab type: undefined
HDF5 Scilab type attribute:
SCILAB_Class = undefined
An undefined value is generated when the size of a list is increased and some elements are not defined. It is generated with the syntax:
Real life examples
Sample files of all these variables are provided in the Scilab distribution. They are available in the directory: SCI/modules/hdf5/tests/sample_scilab_data/
At the time of writing, the following files are provided with the Scilab distribution:
booleanmatrix.sod
booleanscalar.sod
booleansparse.sod
emptymatrix.sod
emptysparse.sod
hypermatrixcomplex.sod
hypermatrix.sod
int16.sod
int32.sod
int8.sod
listnested.sod
list.sod
matricedoublecomplexscalar.sod
matricedoublecomplex.sod
matricedoublescalar.sod
matricedouble.sod
matricestringscalar.sod
matricestring.sod
mlist.sod
polynomialscoef.sod
polynomials.sod
sparsematrix.sod
tlist.sod
uint16.sod
uint32.sod
uint8.sod
undefinedelement.sod
voidelement.sod
Format evolutions
SOD version | Scilab version | Description |
0 | 5.2.0 | Initial version of the Scilab/HDF5 format |
1 | 5.4.0 alpha / beta | Default format for load and save. Previous format (.bin) still supported |
2 | 5.4.0 | For matrices of double, integer, polynomial and string, SCILAB_cols / SCILAB_rows have been removed to use multidimensional HDF5 |
3 | 6.0.0 | .bin support dropped. |
See also
- save — Saves some chosen variables in a binary data file
- load — Loads some archived variables, a saved graphic figure, a library of functions
- listvarinfile — lists variables stored in a binary archive (names, types, sizes..)
- type — returns the type of a Scilab object
- typeof — explicit type or overloading code of an object