strsplit

Splits a single string at given positions or at matching patterns

Syntax

chunks = strsplit(string)
chunks = strsplit(string, indices)

[chunks, matched_separators] = strsplit(string, separators)
[chunks, matched_separators] = strsplit(string, separators, limit)
[chunks, matched_separators] = strsplit(string, regexp)
[chunks, matched_separators] = strsplit(string, regexp, limit)

Arguments

string: a single character string to split. UTF-8 extended characters are supported.
indices: vector of increasing indices, in the interval [1, length(string)-1].
separators: matrix of strings searched for in string and used as scissors. UTF-8 extended characters are supported.
regexp: single string starting and ending with "/" that specifies a case-sensitive regular expression pattern used as the splitting separator. No regexp option can be used after the trailing "/" delimiter. The regular expression may include UTF-8 extended characters. The "/" and "\" characters used in the body of the regexp must be protected as "\/" and "\\". Example: "/k.{2}o/"
chunks: column of strings holding the split chunks. When indices is used, it has length(indices)+1 elements.
matched_separators: column of strings of size size(chunks,1)-1, holding the matched separators or expression patterns.
limit: integer > 0: maximum number of times that separators are searched for and used along string. If string contains more separator occurrences than limit, its unsplit tail is returned as the last chunk, chunks($).

Description

strsplit(string) splits string into all its individual characters.

strsplit(string, indices) splits string after the character positions given in the indices vector. The characters at these indices are the tails (last characters) of the returned chunks.
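
As a minimal sketch of the index-based form: judging from the indices example in the Examples section, each index marks the last character of a chunk. The input string and the index values below are illustrative only.

```scilab
// Illustrative string; any single string would do.
s = "abcdef";

// Split after the 2nd and after the 4th character.
chunks = strsplit(s, [2 4]);
disp(chunks')          // expected chunks: "ab"  "cd"  "ef"

// Without indices, every character becomes its own chunk.
chars = strsplit(s);
disp(size(chars))      // 6 rows, 1 column
```

Since the largest allowed index is length(string)-1, the last chunk always holds at least one character.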

strsplit(string, separators) splits string wherever one of the separators strings matches. The matched separators are removed from the returned chunks. strsplit(string, "") is equivalent to strsplit(string).

strsplit(string, regexp) does the same, except that string is searched for matches of the given regular expression, which acts as a generic separator, instead of for constant separators taken from a fixed set.
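
The difference between a constant separator and a regular expression can be sketched as follows. The input string is made up for illustration, and the pattern "/\s+/" (one or more whitespace characters) is just one possible choice.

```scilab
// Illustrative input with runs of blanks of varying length.
s = "alpha  beta   gamma";

// A constant separator splits at every single blank,
// so empty chunks appear between consecutive blanks.
c1 = strsplit(s, " ");
disp(size(c1, 1))      // 5 blanks => 6 chunks

// A regular expression can treat each run of blanks as one separator.
[c2, seps] = strsplit(s, "/\s+/");
disp(c2')              // expected chunks: "alpha"  "beta"  "gamma"
```

Here seps holds the blank runs actually matched, which may differ in length from one match to the next.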

If string starts with a matching separator or expression, chunks(1) is set to "".

If string ends with a matching separator or expression, "" is appended as the last element of chunks.

If no matching separator or regexp is found in string, string is returned unchanged in chunks. This is notably the case for string="".
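
These edge cases can be checked with a short sketch. The strings and the separators "," and ";" are illustrative choices.

```scilab
// Leading and trailing separators each produce an empty chunk.
c = strsplit(",a,b,", ",");
disp(size(c, 1))       // 3 separators => 4 chunks: "", "a", "b", ""

// No match: the input comes back unchanged as a single chunk.
n = strsplit("abc", ";");
disp(n)
```

Code that must ignore empty fields therefore has to filter them out itself, for instance with c(c <> "").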

Without the limit option, any string including n separators will be split into n+1 chunks.

strsplit(string, separators, limit) or strsplit(string, regexp, limit) searches for a matching separator or expression at most limit times. If matches remain in the unprocessed tail of string, this tail is returned unchanged in chunks($).
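
One possible use of limit, sketched under the assumption of a "key=value" line whose value may itself contain the separator (the string below is made up):

```scilab
// Illustrative "key=value" line; the value contains "=" as well.
s = "k1=v1=v2=v3";

// Split at the first "=" only: 1 search => 2 chunks.
[c, seps] = strsplit(s, "=", 1);
key   = c(1);          // "k1"
value = c(2);          // "v1=v2=v3", the unprocessed tail

// Interleaving chunks and matched separators rebuilds the input.
rebuilt = strcat([c' ; [seps' ""]]);
disp(rebuilt == s)
```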

[chunks, matched_separators] = strsplit(string,…) returns the column of the matched separators or expressions, in addition to chunks. Then strcat([chunks' ; [matched_separators' ""]]) should be equal to string.

Comparison between strsplit() and tokens():

strsplit()	tokens()
can work with indices	works only with separators
works with regexp	does not accept regexp
works with any separator	is restricted to 1-character separators
keeps all empty chunks	removes them
can limit the number of splits	always splits everything
slower	faster
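
The "keeps all empty chunks" row can be illustrated with a small sketch; the input string is made up, and "," is used because tokens() accepts single-character delimiters.

```scilab
// Illustrative input with an empty field and a trailing separator.
s = "a,,b,";

// tokens() drops empty fields.
t = tokens(s, ",");
disp(t')               // expected: "a"  "b"

// strsplit() keeps them, so field positions are preserved.
c = strsplit(s, ",");
disp(size(c, 1))       // 3 separators => 4 chunks: "a", "", "b", ""
```

strsplit() is thus the safer choice when the position of a field matters, as in delimited records with optional columns.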

Examples

Split into characters or at given indices:

strsplit("Scilab")'
strsplit("αβδεϵζηθικλμνξοπρστυφϕχψω", [1 6 11])

--> strsplit("Scilab")'
 ans  =
  "S"  "c"  "i"  "l"  "a"  "b"

--> strsplit("αβδεϵζηθικλμνξοπρστυφϕχψω", [1 6 11])
 ans  =
  "α"
  "βδεϵζ"
  "ηθικλ"
  "μνξοπρστυφϕχψω"

Split at matching separators:

strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "aa")   // the string starts with the separator => leading "" chunk

// Consecutive separators are not squeezed:
strsplit("abbcccdde", "c")'

// With several possible separators:
t = "aabcabbcbaaacacaabbcbccaaabcbc";
[c, s] = strsplit(t, ["aa" "bb"]);
c', s'
strcat([c';[s' ""]]) == t

// Let's limit the number of splits to 4, => 4 chunks + unprocessed tail:
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", ["aa" "bb"], 4)'

// Splitting a string ending with a separator yields a final "":
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "cbc")'

--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "aa") // the string starts with the separator => leading "" chunk
 ans  =
  ""
  "bcabbcb"
  "acac"
  "bbcbcc"
  "abcbc"

--> // Consecutive separators are not squeezed:
--> strsplit("abbcccdde", "c")'
 ans  =
  "abb"  ""  ""  "dde"


--> // With several possible separators:
--> t = "aabcabbcbaaacacaabbcbccaaabcbc";
--> [c, s] = strsplit(t, ["aa" "bb"]);
--> c', s'
 ans  =
  ""  "bca"  "cb"  "acac"  ""  "cbcc"  "abcbc"
 ans  =
  "aa"  "bb"  "aa"  "aa"  "bb"  "aa"

--> strcat([c';[s' ""]]) == t
 ans  =
  T

--> // Let's limit the number of splits to 4, => 4 chunks + unprocessed tail:
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", ["aa" "bb"], 4)'
 ans  =
  ""  "bca"  "cb"  "acac"  "bbcbccaaabcbc"


--> // Splitting a string ending with a separator yields a final "":
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "cbc")'
 ans  =
  "aabcabbcbaaacacaabb"  "caaab"  ""

Use a regular expression as scissors:

[c, s] = strsplit("C:\Windows\System32\OpenSSH\",  "/\\|:/");
c', s'
[c, s] = strsplit("abcdef8ghijkl3mnopqr6stuvw7xyz", "/\d+/", 2);
c', s'

--> [c, s] = strsplit("C:\Windows\System32\OpenSSH\",  "/\\|:/");
--> c', s'
 ans  =
  "C"  ""  "Windows"  "System32"  "OpenSSH"  ""
 ans  =
  ":"  "\"  "\"  "\"  "\"


--> [c, s] = strsplit("abcdef8ghijkl3mnopqr6stuvw7xyz", "/\d+/", 2);
--> c', s'
 ans  =
  "abcdef"  "ghijkl"  "mnopqr6stuvw7xyz"
 ans  =
  "8"  "3"
