csvRead
Read comma-separated value file
Syntax
M = csvRead(filename) M = csvRead(filename, separator) M = csvRead(filename, separator, decimal) M = csvRead(filename, separator, decimal, conversion) M = csvRead(filename, separator, decimal, conversion, substitute) M = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range) M = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range, header) [M, comments] = csvRead(filename, separator, decimal, conversion, substitute, regexpcomments, range, header)
Arguments
- filename
a single string: the file path.
- separator
a single string: the field separator used.
- decimal
a character: the decimal symbol used in numbers (dot or comma).
If
decimal
is different of[]
andconversion
is set tostring
, the decimal conversion will be done.- conversion
"string" or "double" (by default): Specifies the type of the output
M
.Note that read_csv has "string" as default.
- substitute
a m-by-2 matrix of strings, a replacing map (default = [], meaning no replacements). The first column
substitute(:,1)
contains the searched strings and the second columnsubstitute(:,2)
contains the replace strings. Every occurrence of a searched string in the file is replaced.- regexpcomments
a string specifying a regular expression (default: []). Lines of the file matching the expression are considered as comments.
The same character is expected as delimiter at the beginning and at the end of the string. If it is needed in the expression's body, it must be escaped with "\" (Here is a good online page explaining the syntax of regular expressions).
- range
[firstRow firstColumn lastRow lastColumn]
row vector of floating point integers, with lastRow ≤ 2e9 and lastColumn ≤ 2e9: the range of rows and columns selecting the block of data to be returned. Default [], meaning all data. The block is selected on actual data, after removing the header and/or commented rows (when requested).To select all rows or/and columns starting from [firstRow, firstCol], while the total numbers of rows or columns are unknown, lastRow=2e9 or/and lastColumn=2e9 can be specified (the actual limit is 231).- header
a floating point integer: the number of lines to be skipped at the beginning of the file, before reading actual data.
- M
a m-by-n matrix of strings or numbers. Complex numbers are supported.
- comments
a column vector of strings: Lines of text matching
regexpcomments
.
Description
Given an ascii file with comma separated values delimited fields, this function returns the corresponding Scilab matrix of strings or doubles.
For example, the .csv data file may have been created by a spreadsheet software using "Text and comma" format.
It might happen that the columns are separated by a non-comma separator. In this case, use csvRead(filename, separator) for another choice of separator.
The default value of the optional input arguments are defined by the
csvDefault
function.
Any optional input argument equal to the empty matrix
[]
is set to its default value.
When the input argument "conversion" is equal to "double", the non-numeric fields within the .csv (e.g. strings) are converted into NaN.
csvRead is able to handle both UTF-8 and ASCII text files.
Examples
The following script presents some basic uses of the
csvRead
function.
// Create a file with some data separated with tabs. M = 1:50; filename = fullfile(TMPDIR, "data.csv"); csvWrite(M, filename, ascii(9), '.'); // read csv file M1 = csvRead(filename,ascii(9), [], 'string') // Returns a double M2 = csvRead(filename,ascii(9), '.', 'double') // Compares original data and result. and(M == M2) // Use the substitude argument to manage // special data files. content = [ "1" "Not-A-Number" "2" "Not-A-Number" ]; substitute = [ "Not-A-Number" "Nan" ]; mputl(content,filename); M = csvRead(filename,",",".","double",substitute) isnan(M(2,1)) // Expected=%t isnan(M(4,1)) // Expected=%t
The following script presents more practical uses of the
csvRead
function.
// Define a matrix of strings Astr = [ "1" "8" "15" "22" "29" "36" "43" "50" "2" "9" "16" "23" "30" "37" "44" "51" "3" "10" "17" "6+3*I" "31" "38" "45" "52" "4" "11" "18" "25" "32" "39" "46" "53" "5" "12" "19" "26" "33" "40" "47" "54" "6" "13" "20" "27" "34" "41" "48" "55" "+0" "-0" "Inf" "-Inf" "Nan" "1.D+308" "1.e-308" "1.e-323" ]; // Create a file with some data separated with commas filename = fullfile(TMPDIR , 'foo.csv'); sep = ","; fd = mopen(filename,'wt'); for i = 1 : size(Astr,"r") mfprintf(fd,"%s\n",strcat(Astr(i,:),sep)); end mclose(fd); // To see the file: edit(filename) // Read this file Bstr = csvRead ( filename ) // Create a file with a particular separator: here ";" filename = fullfile(TMPDIR , 'foo.csv'); sep = ";"; fd = mopen(filename,'wt'); for i = 1 : size(Astr,"r") mfprintf(fd,"%s\n",strcat(Astr(i,:),sep)); end mclose(fd); // // Read the file and customize the separator csvRead ( filename , sep )
The following script shows how to remove lines with regexp argument
of the csvRead
function.
CSV = ["// tata"; .. "1,0,0,0"; .. "// titi"; .. "0,1,0,0"; .. " // toto"; .. "0,0,1,0"; .. " tutu // tata"]; filename = fullfile(TMPDIR , 'foo.csv'); mputl(CSV, filename); // Ignore all lines including "//" and return them as comments: [M, comments] = csvRead(filename, [], [], [], [], '!//!')
--> [M, comments] = csvRead(filename, [], [], [], [], '!//!') M = 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. comments = "// tata" "// titi" " // toto" " tutu // tata"
Empty field are managed by csvRead
csvWrite(['1','','3';'','','6'], TMPDIR + "/example.csv") csvRead(TMPDIR + "/example.csv", [], [], "string") csvRead(TMPDIR + "/example.csv", [], [], "double")
// Define a matrix of strings Astr = [ "1" "8" "15" "22" "29" "36" "43" "50" "2" "9" "16" "23" "30" "37" "44" "51" "3" "10" "17" "6+3*I" "31" "38" "45" "52" "4" "11" "18" "25" "32" "39" "46" "53" "5" "12" "19" "26" "33" "40" "47" "54" "6" "13" "20" "27" "34" "41" "48" "55" "+0" "-0" "Inf" "-Inf" "Nan" "1.D+308" "1.e-308" "1.e-323" ]; // Create a file with some data separated with commas filename = fullfile(TMPDIR , 'foo.csv'); sep = ","; fd = mopen(filename,'wt'); for i = 1 : size(Astr,"r") mfprintf(fd,"%s\n",strcat(Astr(i,:),sep)); end mclose(fd); // To see the file: edit(filename) // Read this file Bstr = csvRead ( filename ) // Create a file with a particular separator: here ";" filename = fullfile(TMPDIR , 'foo.csv'); sep = ";"; fd = mopen(filename,'wt'); for i = 1 : size(Astr,"r") mfprintf(fd,"%s\n",strcat(Astr(i,:),sep)); end mclose(fd); // // Read the file and customize the separator csvRead ( filename , sep )
In the following script, the file "filename" is read by blocks of 5000 rows. The algorithm stops when the number of rows actually read from the file differ from 5000, i.e. when the end of the file has been reached.
blocksize = 5000; C1 = 1; C2 = 3; iblock = 1 while %t R1 = (iblock-1) * blocksize + 1; R2 = blocksize + R1-1; irange = [R1 C1 R2 C2]; mprintf("\nBlock #%d, rows #%d to #%d\n",iblock,R1,R2); tic(); M=csvRead(filename , [] , [] , [] , [] , [] , irange ); t = toc(); nrows = size(M,"r"); ncols = size(M,"c"); if ( nrows > 0 ) then p = t/(nrows*ncols)*1.e6; mprintf(" Actual #rows=%d\n",nrows); mprintf(" T=%.3f (s)\n",t); mprintf(" T=%.1f (ms/cell)\n",p); end if ( nrows < blocksize ) then mprintf("... End of the file.\n"); break end iblock = iblock + 1; end
This produces:
Block #1, rows #1 to #5000 Actual #rows=5000 T = 3.135 (s) T = 209.0 (ms/cell) Block #2, rows #5001 to #10000 Actual #rows=5000 T = 3.139 (s) T = 209.3 (ms/cell) Block #3, rows #10001 to #15000 Actual #rows=5000 T = 3.151 (s) T = 210.1 (ms/cell) etc....
Example with range
CSV = ["1,0,0,0,0"; .. "0,1,0,0,0"; .. "0,0,1,0,0"; .. "4,4,1,2,0"; .. "4,63,1,2,0"; .. "4,63,1,4,233"; .. "42,3,23,2,23"; .. ]; filename = fullfile(TMPDIR , 'foo.csv'); mputl(CSV, filename); // Extract a subset of the csv file csvRead(filename, [], [], "double", [], [], [5 3 7 6])
Example with header
See also
- csvTextScan — Converts into a matrix texts representing separated values
- csvWrite — Write comma-separated value file
- csvDefault — Get or set defaults behavior for csv files.
- mgetl — reads lines from an text file
- read — Leitura de matrizes
- fscanfMat — reads a matrix from a text file
- Regular expressions: Syntax
History
Version | Description |
5.4.0 | Function introduced. Based on the 'csv_readwrite' module. The only difference in the behavior compared to read_csv is that csvRead will try to convert value to double by default when read_csv will return value as string. |
5.4.1 | If decimal is different of []
and conversion is set to string ,
the decimal conversion will be done. |
5.5 | "header" input argument added. |
Report an issue | ||
<< csvDefault | Planilhas | csvTextScan >> |