Scilab Home page | Wiki | Bug tracker | Forge | Mailing list archives | ATOMS | File exchange
Please login or create an account
Change language to: English - Português - 日本語 - Русский
Aide de Scilab >> Tableur > csvRead

csvRead

Read comma-separated value file

Syntax

M = csvRead(filename)
M = csvRead(filename, separator)
M = csvRead(filename, separator, decimal)
M = csvRead(filename, separator, decimal, conversion)
M = csvRead(filename, separator, decimal, conversion, substitute)
M = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range)
M = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range, header)
[M, comments] = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range, header)

Arguments

filename

a single string: the file path.

separator

a single string: the field separator used.

decimal

a character: the decimal symbol used in numbers (dot or comma).

If decimal is different of [] and conversion is set to string, the decimal conversion will be done.

conversion

"string" or "double" (by default): Specifies the type of the output M.

Note that read_csv has "string" as default.

substitute

a m-by-2 matrix of strings, a replacing map (default = [], meaning no replacements). The first column substitute(:,1) contains the searched strings and the second column substitute(:,2) contains the replace strings. Every occurrence of a searched string in the file is replaced.

regexpcomments

a string specifying a regular expression (default: []). Lines of the file matching the expression are considered as comments.

The same character is expected as delimiter at the beginning and at the end of the string. If it is needed in the expression's body, it must be escaped with "\" (Here is a good online page explaining the syntax of regular expressions).

range

[firstRow firstColumn lastRow lastColumn] row vector of floating point integers, with lastRow ≤ 2e9 and lastColumn ≤ 2e9: the range of rows and columns selecting the block of data to be returned. Default [], meaning all data. The block is selected on actual data, after removing the header and/or commented rows (when requested).

To select all rows or/and columns starting from [firstRow, firstCol], while the total numbers of rows or columns are unknown, lastRow=2e9 or/and lastColumn=2e9 can be specified (the actual limit is 231).
header

a floating point integer: the number of lines to be skipped at the beginning of the file, before reading actual data.

M

a m-by-n matrix of strings or numbers. Complex numbers are supported.

comments

a column vector of strings: Lines of text matching regexpcomments.

Description

Given an ascii file with comma separated values delimited fields, this function returns the corresponding Scilab matrix of strings or doubles.

For example, the .csv data file may have been created by a spreadsheet software using "Text and comma" format.

It might happen that the columns are separated by a non-comma separator. In this case, use csvRead(filename, separator) for another choice of separator.

The default value of the optional input arguments are defined by the csvDefault function.

Any optional input argument equal to the empty matrix [] is set to its default value.

When the input argument "conversion" is equal to "double", the non-numeric fields within the .csv (e.g. strings) are converted into NaN.

csvRead is able to handle both UTF-8 and ASCII text files.

Examples

The following script presents some basic uses of the csvRead function.

// Create a file with some data separated with tabs.
M = 1:50;
filename = fullfile(TMPDIR, "data.csv");
csvWrite(M, filename, ascii(9), '.');

// read csv file
M1 = csvRead(filename,ascii(9), [], 'string')

// Returns a double
M2 = csvRead(filename,ascii(9), '.', 'double')

// Compares original data and result.
and(M == M2)

// Use the substitude argument to manage
// special data files.
content = [
"1"
"Not-A-Number"
"2"
"Not-A-Number"
];

substitute = [
"Not-A-Number" "Nan"
];

mputl(content,filename);
M = csvRead(filename,",",".","double",substitute)
isnan(M(2,1)) // Expected=%t
isnan(M(4,1)) // Expected=%t

The following script presents more practical uses of the csvRead function.

// Define a matrix of strings
Astr = [
"1" "8" "15" "22" "29" "36" "43" "50"
"2" "9" "16" "23" "30" "37" "44" "51"
"3" "10" "17" "6+3*I" "31" "38" "45" "52"
"4" "11" "18" "25" "32" "39" "46" "53"
"5" "12" "19" "26" "33" "40" "47" "54"
"6" "13" "20" "27" "34" "41" "48" "55"
"+0" "-0" "Inf" "-Inf" "Nan" "1.D+308" "1.e-308" "1.e-323"
];

// Create a file with some data separated with commas
filename = fullfile(TMPDIR , 'foo.csv');
sep = ",";
fd = mopen(filename,'wt');
for i = 1 : size(Astr,"r")
        mfprintf(fd,"%s\n",strcat(Astr(i,:),sep));
end
mclose(fd);
// To see the file: edit(filename)

// Read this file
Bstr = csvRead ( filename )

// Create a file with a particular separator: here ";"
filename = fullfile(TMPDIR , 'foo.csv');
sep = ";";
fd = mopen(filename,'wt');
for i = 1 : size(Astr,"r")
        mfprintf(fd,"%s\n",strcat(Astr(i,:),sep));
end
mclose(fd);

//
// Read the file and customize the separator
csvRead ( filename , sep )

The following script shows how to remove lines with regexp argument of the csvRead function.

CSV = ["// tata"; ..
"1,0,0,0"; ..
"// titi"; ..
"0,1,0,0"; ..
" //  toto"; ..
"0,0,1,0"; ..
" tutu // tata"];
filename = fullfile(TMPDIR , 'foo.csv');
mputl(CSV, filename);

// Ignore all lines including "//" and return them as comments:
[M, comments] = csvRead(filename, [], [], [], [], '!//!')
--> [M, comments] = csvRead(filename, [], [], [], [], '!//!')
 M  =
   1.   0.   0.   0.
   0.   1.   0.   0.
   0.   0.   1.   0.

 comments  =
  "// tata"
  "// titi"
  " //  toto"
  " tutu // tata"

Empty field are managed by csvRead

csvWrite(['1','','3';'','','6'], TMPDIR + "/example.csv")
csvRead(TMPDIR + "/example.csv", [], [], "string")
csvRead(TMPDIR + "/example.csv", [], [], "double")

// Define a matrix of strings
Astr = [
"1" "8" "15" "22" "29" "36" "43" "50"
"2" "9" "16" "23" "30" "37" "44" "51"
"3" "10" "17" "6+3*I" "31" "38" "45" "52"
"4" "11" "18" "25" "32" "39" "46" "53"
"5" "12" "19" "26" "33" "40" "47" "54"
"6" "13" "20" "27" "34" "41" "48" "55"
"+0" "-0" "Inf" "-Inf" "Nan" "1.D+308" "1.e-308" "1.e-323"
];

// Create a file with some data separated with commas
filename = fullfile(TMPDIR , 'foo.csv');
sep = ",";
fd = mopen(filename,'wt');
for i = 1 : size(Astr,"r")
    mfprintf(fd,"%s\n",strcat(Astr(i,:),sep));
end
mclose(fd);
// To see the file: edit(filename)

// Read this file
Bstr = csvRead ( filename )

// Create a file with a particular separator: here ";"
filename = fullfile(TMPDIR , 'foo.csv');
sep = ";";
fd = mopen(filename,'wt');
for i = 1 : size(Astr,"r")
    mfprintf(fd,"%s\n",strcat(Astr(i,:),sep));
end
mclose(fd);
//
// Read the file and customize the separator
csvRead ( filename , sep )

In the following script, the file "filename" is read by blocks of 5000 rows. The algorithm stops when the number of rows actually read from the file differ from 5000, i.e. when the end of the file has been reached.

blocksize = 5000;
C1 = 1;
C2 = 3;
iblock = 1
while %t
    R1 = (iblock-1) * blocksize + 1;
    R2 = blocksize + R1-1;
    irange = [R1 C1 R2 C2];
    mprintf("\nBlock #%d, rows #%d to #%d\n",iblock,R1,R2);
    tic();
    M=csvRead(filename , [] , [] , [] , [] , [] , irange );
    t = toc();
    nrows = size(M,"r");
    ncols = size(M,"c");
    if ( nrows > 0 ) then
        p = t/(nrows*ncols)*1.e6;
        mprintf("  Actual #rows=%d\n",nrows);
        mprintf("  T=%.3f (s)\n",t);
        mprintf("  T=%.1f (ms/cell)\n",p);
    end
    if ( nrows < blocksize ) then
        mprintf("... End of the file.\n");
        break
    end
    iblock = iblock + 1;
end

This produces:

Block #1, rows #1 to #5000
Actual #rows=5000
T = 3.135 (s)
T = 209.0 (ms/cell)

Block #2, rows #5001 to #10000
Actual #rows=5000
T = 3.139 (s)
T = 209.3 (ms/cell)

Block #3, rows #10001 to #15000
Actual #rows=5000
T = 3.151 (s)
T = 210.1 (ms/cell)

etc....

Example with range

CSV = ["1,0,0,0,0"; ..
"0,1,0,0,0"; ..
"0,0,1,0,0"; ..
"4,4,1,2,0"; ..
"4,63,1,2,0"; ..
"4,63,1,4,233"; ..
"42,3,23,2,23"; ..
];
filename = fullfile(TMPDIR , 'foo.csv');
mputl(CSV, filename);
// Extract a subset of the csv file
csvRead(filename, [], [], "double", [], [], [5 3 7 6])

Example with header

comments = [
"// Copyright (C) INRIA"];
filename = fullfile(TMPDIR , 'foo.csv');
csvWrite(rand(2,3), filename, ascii(9), ",", [], comments);

header = 2;
[M, c] = csvRead(filename, ascii(9), ",", "double", [], [], [], header) // Ignore the two first lines (the header)

See also

History

VersionDescription
5.4.0 Function introduced. Based on the 'csv_readwrite' module. The only difference in the behavior compared to read_csv is that csvRead will try to convert value to double by default when read_csv will return value as string.
5.4.1 If decimal is different of [] and conversion is set to string, the decimal conversion will be done.
5.5 "header" input argument added.
Scilab Enterprises
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors
Last updated:
Mon Jan 03 14:33:08 CET 2022