save format

Abstract

The goal of this document is to specify the HDF5 format used by Scilab to store its data.

The format is called SOD for Scilab Open Data.

The first public release of SOD was made with Scilab 5.4.0.

Rationale

Interoperability is one of the key aspects of modern software. To improve it further, this SEP proposes a standard definition of the HDF5 format used by Scilab.

Since Scilab 5.2.0, an export/import capability has been developed and maintained to exchange data. It is already one of the base components used by Xcos to store and exchange data.

Supported data types

All Scilab data types are supported. For example:

Name	Example in Scilab
double	A=32; b=[32,2]; c=[2,2;3,4]; d=rand(10,10);
string	a="my string"; b=["string 1";"my string 2"];
boolean	a=%t; b=[%t, %f];
integer	int8([1 -120 127 312]); x=int32(-200:100:400)
polynomial	s=poly([2 3],"s"); poly(1:4,'s','c')
sparse	sp=sparse([1,2;4,5;3,10],[1,2,3])
boolean sparse	dense=[%F, %F, %T, %F, %F; %T, %F, %F, %F, %F; %F, %F, %F, %F, %F; %F, %F, %F, %F, %T]; sp=sparse(dense)
list	l = list(1,["a" "b"])
tlist	t = tlist(["listtype","field1","field2"], [], []);
mlist	M=mlist(['V','name','value'],['a','b';'c' 'd'],[1 2; 3 4]);

Several "types" are based on tlist or mlist. This is the case for rational, state-space, cell and struct. They are therefore saved transparently.

void and undefined are two specific elements created to handle special cases in list management. They are described later in this document.

HDF5 File Structure

The HDF5 layout used by Scilab is straightforward.

General

For each Scilab variable, a dataset at the root position is declared. The name of the dataset is the name of Scilab variable.

For example, the following code:

emptyuint32matrix = uint32([]);
uint32scalar = uint32(1);
uint32rowvector = uint32([1 4 7]);
uint32colvector = uint32([1;4;7]);
uint32matrix = uint32([1 4 7;9 6 3]);
save("uint32.sod","emptyuint32matrix","uint32scalar","uint32rowvector","uint32colvector","uint32matrix");

produces a file containing one root dataset for each of the five saved variables.

Each root dataset has an attribute called SCILAB_Class. This attribute gives the type of the variable stored in the HDF5 file.

If the variable has a primitive type and no associated complex values, the data are stored directly in the dataset. Otherwise, the dataset contains references to the actual data.

Every SOD file contains two specific variables:

SCILAB_scilab_version – Describes which version of Scilab was used to save the SOD file.

For example, with Scilab 5.4.0, the data will be:

SCILAB_scilab_version = scilab-5.4.0

SCILAB_sod_version – Describes which version of the SOD specification was used to save the file.

For example, with Scilab 5.4.0, the data will be:

SCILAB_sod_version = 2

Types where data are stored directly in the dataset

Scilab type	HDF5 Scilab type attribute	HDF5 attributes	HDF data types mapping
string	SCILAB_Class = string		String
boolean	SCILAB_Class = boolean		32-bit integer
integer	SCILAB_Class = integer	SCILAB_precision = {8, 16, 32, u8, u16, u32}	8 = 8-bit character; 16 = 16-bit integer; 32 = 32-bit integer; u8 = 8-bit unsigned character; u16 = 16-bit unsigned integer; u32 = 32-bit unsigned integer
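
The SCILAB_precision values listed in the table can be decoded mechanically. The following Python sketch is only an illustration: the names parse_precision and value_range are hypothetical and are not part of Scilab or of the HDF5 library. It splits a precision value into its signedness and bit width.

```python
# Hypothetical helpers: decode a SCILAB_precision attribute value.
VALID_PRECISIONS = {"8", "16", "32", "u8", "u16", "u32"}

def parse_precision(value):
    """Return (signed, bits) for a SCILAB_precision value such as 32 or 'u8'."""
    text = str(value)
    if text not in VALID_PRECISIONS:
        raise ValueError("unknown SCILAB_precision: " + text)
    signed = not text.startswith("u")
    bits = int(text.lstrip("u"))
    return signed, bits

def value_range(value):
    """Smallest and largest integer representable with this precision."""
    signed, bits = parse_precision(value)
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1
```

For instance, parse_precision("u16") describes an unsigned 16-bit integer, and value_range("8") gives the bounds of a signed 8-bit value.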

For these types, as in Scilab, the data are stored in a one-dimensional array, column by column (column-major order).

To reconstruct the matrix, vector or scalar, two attributes, SCILAB_rows and SCILAB_cols, provide the number of rows and columns.

Since the 5.4.0 release of Scilab and SOD v2, SCILAB_cols and SCILAB_rows are no longer used for matrices of double, integer, polynomial and string. For these, SOD uses native HDF5 multidimensional datasets instead.
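
Column-wise storage can be illustrated with a short Python sketch. The helper names to_column_major and from_column_major are hypothetical. The code only models the flattening order and the role of the row and column counts described above; it does not perform any HDF5 input or output.

```python
def to_column_major(matrix):
    """Flatten a list of rows into a one-dimensional, column-wise list."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    return [matrix[r][c] for c in range(cols) for r in range(rows)]

def from_column_major(flat, rows, cols):
    """Rebuild the list of rows from column-wise data and its dimensions."""
    if len(flat) != rows * cols:
        raise ValueError("data length does not match rows * cols")
    return [[flat[c * rows + r] for c in range(cols)] for r in range(rows)]

# Python model of the Scilab matrix [1 -4 7; -9 6 -3] (2 rows, 3 columns).
matrix = [[1, -4, 7], [-9, 6, -3]]
flat = to_column_major(matrix)            # [1, -9, -4, 6, 7, -3]
restored = from_column_major(flat, 2, 3)  # identical to matrix
```

The row and column counts are what make the flat array unambiguous: the same six values could equally describe a 3 x 2 or a 1 x 6 matrix.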

Example

The saved declaration int32([1 -4 7;-9 6 -3]) can be displayed in HDFView.

Its metadata will be:

int32matrix (800, 2)

32-bit integer, 3 x 2 => the size of the variable

Number of attributes = 2

SCILAB_Class = integer

SCILAB_precision = 32

Scalar values are stored as a matrix of size 1 by 1.

An empty variable ([]) will have the attribute SCILAB_empty set to true.

Types where data are stored in a dedicated group

Many Scilab data types are stored using groups. This keeps the values clearly separated and makes them easy to access.

A group is named after its variable, enclosed in "#" characters. For example, for a matrix of double called matrixofdouble, the root dataset is named matrixofdouble and the associated group is named #matrixofdouble#.

For recursive data types (list, mlist, tlist, etc.), the names of subgroups are constructed as follows:

The # character allows the creation of a unique identifier. The number of leading # characters gives the depth. For instance, the sublist ###listnested#_#2##_#1## is located at the second level.

The underscore "_" separates the levels. The "/" character is usually used for this purpose, but it is reserved as the path separator in HDF5.

The integers in the name give the position of the element in the data structure, both within the current structure and relative to the parent element. In the example ###listnested#_#2##_#1##, the 1 denotes the second element of the third structure of the main element (elements are indexed from 0).

For example, the group named ###listnested#_#2##_#1##, will point to the value [32, 42] from the example:

listnested=list(2,%i,'f',ones(3,3))
listnested(3) = list( %t, [32,42]);
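
The naming scheme can be sketched in Python. The rule used here is an inference from the single example above, not a rule stated explicitly by this document: each level is assumed to add one leading "#" to the parent name and to append "_#<index>##". The function name group_name is hypothetical.

```python
def group_name(variable, path=()):
    """Build a group name from a variable name and a path of 0-based indices.

    Assumption inferred from the example ###listnested#_#2##_#1##:
    child name = "#" + parent name + "_#" + index + "##".
    """
    name = "#" + variable + "#"
    for index in path:
        name = "#" + name + "_#" + str(index) + "##"
    return name

# listnested(3)(2) in Scilab (1-based) is the path (2, 1) with 0-based indices.
nested = group_name("listnested", (2, 1))
```

With an empty path, the function returns the top-level group name, such as #matrixofdouble# for the variable matrixofdouble.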

Sparse

Scilab type: sparse

HDF5 Scilab type attribute: SCILAB_Class = sparse

HDF5 attributes:

SCILAB_rows = <int>

Number of rows

SCILAB_cols = <int>

Number of columns

SCILAB_items = <int>

Number of non-zero elements stored in the sparse matrix

Root dataset values:

First value (#0#): Each element of this data structure gives the number of non-zero elements in one row. The first element therefore gives the number of non-zero elements in the first row of the sparse matrix.

Second value (#1#): Provides the column position of each element of the sparse matrix.

Third value (#2#): Stores the reference to the actual values of the elements of the sparse matrix (which are stored in a specific group).

For example, the matrix:

0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 3.
0. 0. 0. 0. 2. 0. 0. 0. 0. 0.

which is generated by:

sparse([1,2;4,5;3,10],[1,2,3])

is stored as follows:

#0# contains 1;0;1;1

#1# contains 2;10;5

#2# references a matrix of double (not complex in this example) which contains 1.0; 3.0; 2.0
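
The three data structures of the example can be derived from the dense matrix with a few lines of Python. This is a minimal sketch of the encoding described in this section, not the code used by Scilab; the function name encode_sparse is hypothetical.

```python
def encode_sparse(dense):
    """Return (row_counts, column_positions, values) for a dense matrix.

    row_counts       models #0#: number of non-zero elements in each row
    column_positions models #1#: 1-based column of each non-zero element
    values           models #2#: the non-zero values, in the same order
    """
    row_counts, column_positions, values = [], [], []
    for row in dense:
        count = 0
        for col, value in enumerate(row, start=1):
            if value != 0:
                count += 1
                column_positions.append(col)
                values.append(value)
        row_counts.append(count)
    return row_counts, column_positions, values

# Dense form of sparse([1,2;4,5;3,10],[1,2,3]): a 4 x 10 matrix.
dense = [[0.0] * 10 for _ in range(4)]
dense[0][1] = 1.0   # row 1, column 2
dense[3][4] = 2.0   # row 4, column 5
dense[2][9] = 3.0   # row 3, column 10

row_counts, column_positions, values = encode_sparse(dense)
```

The elements are visited row by row, which is why the value 3.0 (row 3) comes before 2.0 (row 4) even though it was given last in the call to sparse.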

Boolean sparse

Scilab type: boolean sparse

HDF5 Scilab type attribute: SCILAB_Class = boolean sparse

HDF5 attributes:

SCILAB_rows = <int>

Number of rows

SCILAB_cols = <int>

Number of columns

SCILAB_items = <int>

Number of true elements stored in the sparse matrix

Root dataset values: While a sparse has 3 datasets, the boolean sparse has only 2, because every stored element is implicitly true.

First value (#0#): Each element of this data structure gives the number of true elements in one row. The first element therefore gives the number of true elements in the first row of the sparse matrix.

Second value (#1#): Provides the column position of each element of the sparse matrix.

With the boolean sparse matrix:

dense=[%F, %F, %T, %F, %F
%T, %F, %F, %F, %F
%F, %F, %F, %F, %F
%F, %F, %F, %F, %T];

#0# contains 1;1;0;1.

#1# contains 3;1;5.

These two pieces of information are sufficient to recreate the boolean sparse matrix.

HDF data types mapping: 32-bit integer
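
The reconstruction of a boolean sparse matrix from its two data structures can be sketched in Python as follows. The function name decode_boolean_sparse is hypothetical, and the row and column counts are passed explicitly, as they would be read from SCILAB_rows and SCILAB_cols.

```python
def decode_boolean_sparse(rows, cols, row_counts, column_positions):
    """Rebuild a dense boolean matrix from the #0# and #1# data.

    row_counts       models #0#: number of true elements in each row
    column_positions models #1#: 1-based column of each true element
    """
    if len(row_counts) != rows or sum(row_counts) != len(column_positions):
        raise ValueError("inconsistent boolean sparse description")
    dense = [[False] * cols for _ in range(rows)]
    cursor = 0
    for r, count in enumerate(row_counts):
        # The next `count` column positions belong to row r.
        for col in column_positions[cursor:cursor + count]:
            dense[r][col - 1] = True
        cursor += count
    return dense

# Values of the example above: #0# = 1;1;0;1 and #1# = 3;1;5.
decoded = decode_boolean_sparse(4, 5, [1, 1, 0, 1], [3, 1, 5])
```

No third data structure is needed because every listed position is set to true.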

Double

Scilab type: double

HDF5 Scilab type attribute: SCILAB_Class = double

Root dataset values: Both real and complex values are stored in a group called #<variable name>#.

First value: Reference to the real parts, named #0#.

Second value (only if the matrix is complex): Reference to the imaginary parts, named #1#.

HDF data types mapping: 64-bit floating-point
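
A reader of the file has to merge the two referenced arrays to obtain complex values. The Python sketch below assumes that #1# holds the imaginary parts, in the same element order as #0#; this is an interpretation of the description above. The function name combine_double is hypothetical.

```python
def combine_double(real_values, imaginary_values=None):
    """Merge the data referenced by #0# and, if present, #1#.

    Assumption: #1# contains the imaginary parts, ordered like #0#.
    """
    if imaginary_values is None:
        return list(real_values)  # real matrix: only #0# exists
    if len(real_values) != len(imaginary_values):
        raise ValueError("#0# and #1# must have the same number of elements")
    return [complex(re, im) for re, im in zip(real_values, imaginary_values)]

real_only = combine_double([4.0, 5.0])
mixed = combine_double([1.0, 2.0], [0.5, -3.0])
```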

Polynomial

Scilab type: polynomial

HDF5 Scilab type attribute: SCILAB_Class = polynomial

HDF5 attributes:

SCILAB_Class = polynomial

SCILAB_varname = <string>

The symbolic variable name

SCILAB_Complex = <boolean>

Set when the polynomial is complex (absent otherwise)

Root dataset values:

Coefficients are stored as a matrix of double (see the section on double storage). Note that coefficients can be complex and are then stored as a complex matrix. The naming rules for (sub-)groups and datasets are described at the beginning of the chapter.

HDF data types mapping: Object reference

list

Scilab type: list

HDF5 Scilab type attribute:

SCILAB_Class = list

HDF5 attributes: SCILAB_items = <number of items in the list>

Root dataset values:

The values stored in the root dataset are references to the items of the list. The items themselves are stored in the group called #<variable name>#. In this group, data can be of any type and are included directly in the group. Their representation is the same as in the other cases and is recursive (meaning that a list of lists of lists of various types can be stored and loaded).

The naming rules for (sub-)groups and datasets are described at the beginning of the chapter.

HDF data types mapping: Object reference

tlist

Scilab type: tlist

HDF5 Scilab type attribute:

SCILAB_Class = tlist

HDF5 attributes: same as list

mlist

Scilab type: mlist

HDF5 Scilab type attribute:

SCILAB_Class = mlist

HDF5 attributes: same as list

void

Scilab type: void

HDF5 Scilab type attribute:

SCILAB_Class = void

A void value can only be found in very special usages of list, tlist and mlist. It can be created with the following syntax:

voidelement_ref=list(1,,3);

undefined

Scilab type: undefined

HDF5 Scilab type attribute:

SCILAB_Class = undefined

An undefined value is generated when the size of a list is increased and some elements are left undefined. It can be created with the following syntax:

undefinedelement_ref=list(2,%i,'f',ones(3,3));
undefinedelement_ref(6)="toto"
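
The effect of this assignment can be modelled in Python. This is only an illustration of which positions end up undefined; the UNDEFINED marker and the assign_item function are hypothetical names that do not exist in Scilab.

```python
UNDEFINED = object()  # hypothetical marker standing for SCILAB_Class = undefined

def assign_item(items, position, value):
    """Assign value at a 1-based position, padding any gap with UNDEFINED."""
    result = list(items)
    while len(result) < position:
        result.append(UNDEFINED)
    result[position - 1] = value
    return result

# Models list(2,%i,'f',ones(3,3)) followed by an assignment to element 6.
original = [2, 1j, "f", [[1.0] * 3 for _ in range(3)]]
extended = assign_item(original, 6, "toto")
undefined_positions = [i + 1 for i, item in enumerate(extended) if item is UNDEFINED]
```

The original list has four items, so assigning element 6 leaves element 5 without a value. That element is the one saved with SCILAB_Class = undefined.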

Real life examples

Sample files for all these variables are provided in the Scilab distribution. They are available in the directory: SCI/modules/hdf5/tests/sample_scilab_data/

At the time of writing, the following files are provided with the Scilab distribution:

booleanmatrix.sod

booleanscalar.sod

booleansparse.sod

emptymatrix.sod

emptysparse.sod

hypermatrixcomplex.sod

hypermatrix.sod

int16.sod

int32.sod

int8.sod

listnested.sod

list.sod

matricedoublecomplexscalar.sod

matricedoublecomplex.sod

matricedoublescalar.sod

matricedouble.sod

matricestringscalar.sod

matricestring.sod

mlist.sod

polynomialscoef.sod

polynomials.sod

sparsematrix.sod

tlist.sod

uint16.sod

uint32.sod

uint8.sod

undefinedelement.sod

voidelement.sod

Format evolutions

SOD version	Scilab version	Description
0	5.2.0	Initial version of the Scilab/HDF5 format
1	5.4.0 alpha / beta	Default format for load and save. Previous format (.bin) still supported.
2	5.4.0	For matrices of double, integer, polynomial and string, SCILAB_cols / SCILAB_rows have been removed in favor of multidimensional HDF5 datasets.
3	6.0.0	.bin support dropped.
