Scilab Online Help
2023.0.0 - English


strsplit

split a single string at some given positions or patterns

Syntax

chunks = strsplit(string)
chunks = strsplit(string, indices)

[chunks, matched_separators] = strsplit(string, separators)
[chunks, matched_separators] = strsplit(string, separators, limit)
[chunks, matched_separators] = strsplit(string, regexp)
[chunks, matched_separators] = strsplit(string, regexp, limit)

Arguments

string
a single character string to split. UTF8 extended characters are supported.

indices
vector of increasing indices, in the interval [1, length(string)-1].

separators
matrix of strings searched in the string and used as scissors. UTF8 extended characters are supported.

regexp
single string starting and ending with "/" and specifying a case-sensitive regular expression pattern used as splitting separator. No regexp option can be used after the trailing "/" delimiter. The regular expression may include UTF8 extended characters. The "/" and "\" characters used in the body of the regexp must be protected as "\/" and "\\". Example: "/k.{2}o/"

chunks
column of strings: the split chunks. When indices is used, chunks has length(indices)+1 elements.

matched_separators
column of strings, of size size(chunks,1)-1: the matched separators or expression patterns.

limit
integer > 0: maximum number of times that separators are searched for and used along the string. If the string includes more separator occurrences than limit, its unsplit tail is returned as the last chunk, chunks($).

Description

strsplit(string) splits string into all its individual characters.

strsplit(string, indices) splits string just after the character positions given in the indices vector. The character at each index is the last character of a returned chunk, and the next chunk starts at the following character.
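
As a minimal sketch of the index-based form, the snippet below cuts "Scilab" after its 2nd and 4th characters, then rebuilds the same chunks by hand with part(). The variable names (s, idx, bounds, manual) are illustrative only, and the expected values in the comments assume the behavior shown in the Examples section, where the string is cut after each index.

```scilab
s = "Scilab";
idx = [2 4];                     // cut after the 2nd and the 4th characters
chunks = strsplit(s, idx);       // expected: ["Sc" ; "il" ; "ab"]

// Same chunks rebuilt by hand with part(), for comparison
bounds = [0 idx length(s)];      // chunk k spans bounds(k)+1 : bounds(k+1)
manual = [];
for k = 1:size(bounds, "*")-1
    manual = [manual ; part(s, bounds(k)+1:bounds(k+1))];
end
```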

strsplit(string, separators) splits string wherever one of the separators strings matches. The matched separators are removed from the returned chunks. strsplit(string, "") is equivalent to strsplit(string).

strsplit(string, regexp) does the same, except that string is parsed for the given regular expression used as "generic separator", instead of for any "constant" separator among a limited separators set.
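
A short sketch of the regexp form may help here. It splits on runs of whitespace, then on a literal "/" that has to be protected inside the pattern. The input strings are made up for illustration, and the results in the comments are what the rules stated on this page lead one to expect.

```scilab
// Split on runs of whitespace: "\s+" matches one or more blanks
[c, s] = strsplit("alpha  beta   gamma", "/\s+/");
// c is expected to be ["alpha" ; "beta" ; "gamma"], s the matched runs of blanks

// A "/" inside the pattern must be protected as "\/"
p = strsplit("usr/local/bin", "/\//");
// p is expected to be ["usr" ; "local" ; "bin"]
```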

If string starts with a matching separator or expression, chunks(1) is set to "".

If string ends with a matching separator or expression, "" is appended as last chunks element.

If no matching separator or regexp is found in string, string is returned as is in chunks. This is notably the case for string="".
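
The three edge cases above can be sketched as follows. The inputs are hypothetical and the expected results in the comments are derived from the rules just stated.

```scilab
a = strsplit(",x,y", ",");    // leading separator  => ["" ; "x" ; "y"]
b = strsplit("x,y,", ",");    // trailing separator => ["x" ; "y" ; ""]
c = strsplit("xy", ",");      // no match           => "xy"
```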

Without the limit option, any string including n separators will be split into n+1 chunks.

strsplit(string, separators, limit) or strsplit(string, regexp, limit) will search for a matching separator or expression for a maximum of limit times. If then there are remaining matches in the unprocessed tail of string, this tail is returned as is in chunks($).
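
To illustrate the effect of limit, here is a small sketch on a made-up comma-separated string. With limit=2, only the first two separators are used and the rest of the string is expected to come back unsplit as the last chunk.

```scilab
full = strsplit("a,b,c,d", ",");        // all separators used: ["a" ; "b" ; "c" ; "d"]
[c, s] = strsplit("a,b,c,d", ",", 2);   // c = ["a" ; "b" ; "c,d"], s = ["," ; ","]
```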

[chunks, matched_separators] = strsplit(string,…) returns the column of the matched separators or expressions, in addition to chunks. Then strcat([chunks' ; [matched_separators' ""]]) should be equal to string.
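
The reassembly identity works because strcat() concatenates the elements of a matrix column by column, so stacking the chunks over the separators interleaves them in their original order. A minimal sketch, with an illustrative input string and intermediate variable m:

```scilab
t = "x=1;y=2,z=3";
[c, s] = strsplit(t, [";" ","]);
m = [c' ; [s' ""]];     // 2 rows: chunks on row 1, separators padded with "" on row 2
rebuilt = strcat(m);    // read column by column: chunk, separator, chunk, separator, ...
```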

Comparison between strsplit() and tokens():

strsplit()                        tokens()
can work with indices             works only with separators
works with regexp                 does not accept regexp
works with any separator          is restricted to 1-character separators
keeps all empty chunks            removes them
can limit the number of splits    always splits all
slower                            faster
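
The difference in the handling of empty chunks can be seen on a string with consecutive and trailing separators. This is only a sketch on a made-up input. The expected results follow from the comparison above.

```scilab
str = "a,,b,";
kept    = strsplit(str, ",");   // empty chunks are kept:    ["a" ; "" ; "b" ; ""]
dropped = tokens(str, ",");     // empty chunks are removed: ["a" ; "b"]
```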

Examples

Split at given indices:

strsplit("Scilab")'
strsplit("αβδεϵζηθικλμνξοπρστυφϕχψω", [1 6 11])
--> strsplit("Scilab")'
 ans  =
  "S"  "c"  "i"  "l"  "a"  "b"

--> strsplit("αβδεϵζηθικλμνξοπρστυφϕχψω", [1 6 11])
 ans  =
  "α"
  "βδεϵζ"
  "ηθικλ"
  "μνξοπρστυφϕχψω"

Split at matching separators:

strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "aa")   // the string starts with the separator => leading "" chunk

// Consecutive separators are not squeezed:
strsplit("abbcccdde", "c")'

// With several possible separators:
t = "aabcabbcbaaacacaabbcbccaaabcbc";
[c, s] = strsplit(t, ["aa" "bb"]);
c', s'
strcat([c';[s' ""]]) == t

// Let's limit the number of splits to 4 => 4 chunks + unprocessed tail:
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", ["aa" "bb"], 4)'

// Splitting a string ending with a separator yields a final "":
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "cbc")'
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "aa") // the string starts with the separator => leading "" chunk
 ans  =
  ""
  "bcabbcb"
  "acac"
  "bbcbcc"
  "abcbc"

--> // Consecutive separators are not squeezed:
--> strsplit("abbcccdde", "c")'
 ans  =
  "abb"  ""  ""  "dde"


--> // With several possible separators:
--> t = "aabcabbcbaaacacaabbcbccaaabcbc";
--> [c, s] = strsplit(t, ["aa" "bb"]);
--> c', s'
 ans  =
  ""  "bca"  "cb"  "acac"  ""  "cbcc"  "abcbc"
 ans  =
  "aa"  "bb"  "aa"  "aa"  "bb"  "aa"

--> strcat([c';[s' ""]]) == t
 ans  =
  T

--> // Let's limit the number of splits to 4 => 4 chunks + unprocessed tail:
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", ["aa" "bb"], 4)'
 ans  =
  ""  "bca"  "cb"  "acac"  "bbcbccaaabcbc"


--> // Splitting a string ending with a separator yields a final "":
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "cbc")'
 ans  =
  "aabcabbcbaaacacaabb"  "caaab"  ""

Use a regular expression as scissors:

[c, s] = strsplit("C:\Windows\System32\OpenSSH\",  "/\\|:/");
c', s'
[c, s] = strsplit("abcdef8ghijkl3mnopqr6stuvw7xyz", "/\d+/", 2);
c', s'
--> [c, s] = strsplit("C:\Windows\System32\OpenSSH\",  "/\\|:/");
--> c', s'
 ans  =
  "C"  ""  "Windows"  "System32"  "OpenSSH"  ""
 ans  =
  ":"  "\"  "\"  "\"  "\"


--> [c, s] = strsplit("abcdef8ghijkl3mnopqr6stuvw7xyz", "/\d+/", 2);
--> c', s'
 ans  =
  "abcdef"  "ghijkl"  "mnopqr6stuvw7xyz"
 ans  =
  "8"  "3"
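
Split a record and convert its fields to numbers. This is a hedged sketch of a common follow-up step, not part of the original examples: the record string is hypothetical, and strtod() is used for the conversion.

```scilab
record = "1.5;2;3.25";
fields = strsplit(record, ";");   // column of strings: ["1.5" ; "2" ; "3.25"]
values = strtod(fields);          // column of doubles
total  = sum(values);             // 1.5 + 2 + 3.25 = 6.75
```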

See also

  • tokens — Splits a string using separators and gives its chunks
  • strindex — search position of a character string in another string
  • part — Extraction of characters from strings
  • regexp — find a substring that matches the regular expression string
  • strcat — concatenates character strings

Copyright (c) 2022-2024 (Dassault Systèmes)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors
Last updated: Mon Mar 27 11:52:45 GMT 2023