Scilab Home page | Wiki | Bug tracker | Forge | Mailing list archives | ATOMS | File exchange
Please login or create an account
Change language to: English - Français - Português - 日本語
Справка Scilab >> Электронная таблица > csvRead

csvRead

Read comma-separated value file

Syntax

M = csvRead(filename)
M = csvRead(filename, separator)
M = csvRead(filename, separator, decimal)
M = csvRead(filename, separator, decimal, conversion)
M = csvRead(filename, separator, decimal, conversion, substitute)
M = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range)
M = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range, header)
[M, comments] = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range, header)

Arguments

filename

a 1-by-1 matrix of strings, the file path.

separator

a 1-by-1 matrix of strings, the field separator used.

decimal

a 1-by-1 matrix of strings, the decimal used.

If decimal is different of [] and conversion is set to string, the decimal conversion will be done.

conversion

a 1-by-1 matrix of strings, the type of the output M. Available values are "string" or "double" (by default).

Note that read_csv has "string" as default.

substitute

a m-by-2 matrix of strings, a replacing map (default = [], meaning no replacements). The first column substitute(:,1) contains the searched strings and the second column substitute(:,2) contains the replace strings. Every occurence of a searched string in the file is replaced.

regexpcomments

a string: a regexp to remove lines which match. (default: [])

range

a 1-by-4 matrix of floating point integers, the range of rows and columns which must be read (default range=[], meaning that all the rows and columns). Specify range using the format [Row1 Column1 Row2 Column2] where (R1,C1) is the upper left corner of the data to be read and (R2,C2) is the lower right corner.

Note that the file has to be correctly formated. The range will be done in the memory on the parsed elements.

header

a 1-by-1 matrix of floating point integers, the number of lines to be ignored at the beginning of the file.

M

a m-by-n matrix of strings or double.

comments

a m-by-n matrix of strings matched by regexp.

Description

Given an ascii file with comma separated values delimited fields, this function returns the corresponding Scilab matrix of strings or doubles.

For example, the .csv data file may have been created by a spreadsheet software using "Text and comma" format.

It might happen that the columns are separated by a non-comma separator. In this case, use csvRead(filename, separator) for another choice of separator.

The default value of the optional input arguments are defined by the csvDefault function.

Any optional input argument equal to the empty matrix [] is set to its default value.

When the input argument "conversion" is equal to "double", the non-numeric fields within the .csv (e.g. strings) are converted into NaN.

csvRead is able to handle both UTF-8 and ASCII text files.

Examples

The following script presents some basic uses of the csvRead function.

// Create a file with some data separated with tabs.
M = 1:50;
filename = fullfile(TMPDIR, "data.csv");
csvWrite(M, filename, ascii(9), '.');

// read csv file
M1 = csvRead(filename,ascii(9), [], 'string')

// Returns a double
M2 = csvRead(filename,ascii(9), '.', 'double')

// Compares original data and result.
and(M == M2)

// Use the substitude argument to manage
// special data files.
content = [
"1"
"Not-A-Number"
"2"
"Not-A-Number"
];

substitute = [
"Not-A-Number" "Nan"
];

mputl(content,filename);
M = csvRead(filename,",",".","double",substitute)
isnan(M(2,1)) // Expected=%t
isnan(M(4,1)) // Expected=%t

The following script presents more practical uses of the csvRead function.

// Define a matrix of strings
Astr = [
"1" "8" "15" "22" "29" "36" "43" "50"
"2" "9" "16" "23" "30" "37" "44" "51"
"3" "10" "17" "6+3*I" "31" "38" "45" "52"
"4" "11" "18" "25" "32" "39" "46" "53"
"5" "12" "19" "26" "33" "40" "47" "54"
"6" "13" "20" "27" "34" "41" "48" "55"
"+0" "-0" "Inf" "-Inf" "Nan" "1.D+308" "1.e-308" "1.e-323"
];

// Create a file with some data separated with commas
filename = fullfile(TMPDIR , 'foo.csv');
sep = ",";
fd = mopen(filename,'wt');
for i = 1 : size(Astr,"r")
        mfprintf(fd,"%s\n",strcat(Astr(i,:),sep));
end
mclose(fd);
// To see the file: edit(filename)

// Read this file
Bstr = csvRead ( filename )

// Create a file with a particular separator: here ";"
filename = fullfile(TMPDIR , 'foo.csv');
sep = ";";
fd = mopen(filename,'wt');
for i = 1 : size(Astr,"r")
        mfprintf(fd,"%s\n",strcat(Astr(i,:),sep));
end
mclose(fd);

//
// Read the file and customize the separator
csvRead ( filename , sep )

The following script shows how to remove lines with regexp argument of the csvRead function.

CSV = ["// tata"; ..
"1,0,0,0,0"; ..
"// titi"; ..
"0,1,0,0,0"; ..
"// toto"; ..
"0,0,1,0,0"; ..
"// tutu"];
filename = fullfile(TMPDIR , 'foo.csv');
mputl(CSV, filename);

// remove lines with // @ beginning
[M, comments] = csvRead(filename, [], [], [], [], '/\/\//')

Empty field are managed by csvRead

csvWrite(['1','','3';'','','6'], TMPDIR + "/example.csv")
csvRead(TMPDIR + "/example.csv", [], [], "string")
csvRead(TMPDIR + "/example.csv", [], [], "double")
// Define a matrix of strings
Astr = [
"1" "8" "15" "22" "29" "36" "43" "50"
"2" "9" "16" "23" "30" "37" "44" "51"
"3" "10" "17" "6+3*I" "31" "38" "45" "52"
"4" "11" "18" "25" "32" "39" "46" "53"
"5" "12" "19" "26" "33" "40" "47" "54"
"6" "13" "20" "27" "34" "41" "48" "55"
"+0" "-0" "Inf" "-Inf" "Nan" "1.D+308" "1.e-308" "1.e-323"
];

// Create a file with some data separated with commas
filename = fullfile(TMPDIR , 'foo.csv');
sep = ",";
fd = mopen(filename,'wt');
for i = 1 : size(Astr,"r")
        mfprintf(fd,"%s\n",strcat(Astr(i,:),sep));
end
mclose(fd);
// To see the file: edit(filename)

// Read this file
Bstr = csvRead ( filename )

// Create a file with a particular separator: here ";"
filename = fullfile(TMPDIR , 'foo.csv');
sep = ";";
fd = mopen(filename,'wt');
for i = 1 : size(Astr,"r")
        mfprintf(fd,"%s\n",strcat(Astr(i,:),sep));
end
mclose(fd);
//
// Read the file and customize the separator
csvRead ( filename , sep )

In the following script, the file "filename" is read by blocks of 5000 rows. The algorithm stops when the number of rows actually read from the file differ from 5000, i.e. when the end of the file has been reached.

blocksize = 5000;
C1 = 1;
C2 = 3;
iblock = 1
while (%t)
R1 = (iblock-1) * blocksize + 1;
R2 = blocksize + R1-1;
irange = [R1 C1 R2 C2];
mprintf("Block #%d, rows #%d to #%d\n",iblock,R1,R2);
tic();
M=csvRead(filename , [] , [] , [] , [] , [] , irange );
t = toc();
nrows = size(M,"r");
ncols = size(M,"c");
if ( nrows > 0 ) then
p = t/(nrows*ncols)*1.e6;
mprintf("  Actual #rows=%d\n",nrows);
mprintf("  T=%.3f (s)\n",t);
mprintf("  T=%.1f (ms/cell)\n",p);
end
if ( nrows < blocksize ) then
mprintf("... End of the file.\n");
break
end
iblock = iblock + 1;
end

This produces:

Block #1, rows #1 to #5000
Actual #rows=5000
T=3.135 (s)
T=209.0 (ms/cell)
Block #2, rows #5001 to #10000
Actual #rows=5000
T=3.139 (s)
T=209.3 (ms/cell)
Block #3, rows #10001 to #15000
Actual #rows=5000
T=3.151 (s)
T=210.1 (ms/cell)
etc....
Example with range
CSV = ["1,0,0,0,0"; ..
"0,1,0,0,0"; ..
"0,0,1,0,0"; ..
"4,4,1,2,0"; ..
"4,63,1,2,0"; ..
"4,63,1,4,233"; ..
"42,3,23,2,23"; ..
];
filename = fullfile(TMPDIR , 'foo.csv');
mputl(CSV, filename);
// Extract a subset of the csv file
csvRead(filename, [], [], "double", [], [], [5 3 7 6])
Example with header
comments = [
"// Copyright (C) INRIA"];
filename = fullfile(TMPDIR , 'foo.csv');
csvWrite(rand(2,3), filename, ascii(9), ",", [], comments);

header = 2;
[M, c] = csvRead(filename, ascii(9), ",", "double", [], [], [], header) // Ignore the two first lines (the header)

See also

History

ВерсияОписание
5.4.0 Function introduced. Based on the 'csv_readwrite' module. The only difference in the behavior compared to read_csv is that csvRead will try to convert value to double by default when read_csv will return value as string.
5.4.1 If decimal is different of [] and conversion is set to string, the decimal conversion will be done.
5.5 Addition of the "header" input argument, to ignore headers.
Scilab Enterprises
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors
Last updated:
Tue Feb 14 15:13:37 CET 2017