Scilab Online Help
2024.1.0 - 日本語


strsplit

split a single string at some given positions or patterns

Syntax

chunks = strsplit(string)
chunks = strsplit(string, indices)

[chunks, matched_separators] = strsplit(string, separators)
[chunks, matched_separators] = strsplit(string, separators, limit)
[chunks, matched_separators] = strsplit(string, regexp)
[chunks, matched_separators] = strsplit(string, regexp, limit)

Arguments

string
a single character string to split. UTF8 extended characters are supported.

indices
vector of increasing indices, in the interval [1, length(string)-1].

separators
matrix of strings searched in the string and used as scissors. UTF8 extended characters are supported.

regexp
single string starting and ending with "/" and specifying a case-sensitive regular expression pattern used as splitting separator. No regexp option can be used after the trailing "/" delimiter. The regular expression may include UTF8 extended characters. The "/" and "\" characters used in the body of the regexp must be protected as "\/" and "\\". Example: "/k.{2}o/"

chunks
column of strings: the split chunks. With the indices syntax, it has length(indices)+1 elements.

matched_separators
column of strings, of size size(chunks,1)-1 : matched separators or expression patterns.

limit
integer > 0: maximum number of times that separators are searched for and used along the string. If the string contains more separator occurrences than limit, its unsplit tail is returned as the last chunk, chunks($).

Description

strsplit(string) splits string into all its individual characters.

strsplit(string, indices) splits string just after each character position given in the indices vector. The character at each index is the last character of its chunk.
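As a minimal sketch of the indices syntax, the following cuts a fixed-width field layout. The variable names and the date-stamp layout are hypothetical, chosen only for illustration.

```scilab
// Hypothetical fixed-width date stamp: year (4 chars), month (2), day (2)
stamp = "20240617";

// Cut after the 4th and after the 6th character
fields = strsplit(stamp, [4 6]);

disp(fields')   // expected chunks: "2024"  "06"  "17"
```

Two indices give three chunks, in line with the length(indices)+1 rule.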

strsplit(string, separators) splits string at every occurrence of any of the separators strings. The matched separators are removed from the chunks. strsplit(string, "") is equivalent to strsplit(string).

strsplit(string, regexp) does the same, except that string is parsed for the given regular expression used as "generic separator", instead of for any "constant" separator among a limited separators set.
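A small sketch of the regexp syntax follows. The input strings are made up for illustration. The second call assumes that a "/" in the pattern body is written "\/", as required for the regexp argument.

```scilab
// Hypothetical input: comma-separated items with irregular spacing
txt = "a, b,c,   d";

// One pattern covers "a comma followed by any amount of whitespace"
[items, seps] = strsplit(txt, "/,\s*/");
disp(items')   // expected chunks: "a"  "b"  "c"  "d"
disp(seps')    // the separators actually matched, spacing included

// A "/" inside the pattern must be protected as "\/"
parts = strsplit("usr/local/bin", "/\//");
disp(parts')   // should give: "usr"  "local"  "bin"
```

With constant separators, each spacing variant (",", ", ", ",   ") would have to be listed explicitly.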

If string starts with a matching separator or expression, chunks(1) is set to "".

If string ends with a matching separator or expression, "" is appended as last chunks element.

If no matching separator or regexp is found in string, string is returned unchanged in chunks. This is notably the case for string="".
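The three boundary rules above can be checked with a short sketch. The inputs are arbitrary illustrative strings, and the final filtering step is only one possible way to discard empty chunks.

```scilab
// Leading and trailing separators each produce an empty chunk
c = strsplit(",a,b,", ",");
disp(size(c, 1))   // 4 chunks: "", "a", "b", ""

// No separator found: the input comes back unchanged, as a single chunk
u = strsplit("abc", ",");
disp(u)            // "abc"

// Optional: drop the empty chunks afterwards
nonEmpty = c(c <> "");
disp(nonEmpty')    // "a"  "b"
```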

Without the limit option, any string including n separators will be split into n+1 chunks.

strsplit(string, separators, limit) or strsplit(string, regexp, limit) will search for a matching separator or expression for a maximum of limit times. If then there are remaining matches in the unprocessed tail of string, this tail is returned as is in chunks($).
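One possible use of limit, sketched with a hypothetical "key=value" entry whose value itself contains the separator:

```scilab
// Hypothetical entry: only the first "=" separates the key from the value
entry = "opts=a=1,b=2";

// limit = 1: a single split, the rest of the string is returned as is
kv = strsplit(entry, "=", 1);

disp(kv')   // expected chunks: "opts"  "a=1,b=2"
```

Without the limit, the same call would split at all three "=" signs and return four chunks.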

[chunks, matched_separators] = strsplit(string,…) returns the column of the matched separators or expressions, in addition to chunks. Then strcat([chunks' ; [matched_separators' ""]]) should be equal to string.
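The reconstruction identity can be sketched as follows, on a made-up input. The names txt, c, s and rebuilt are illustrative only.

```scilab
txt = "key=value;flag";
[c, s] = strsplit(txt, ["=" ";"]);

// Interleave chunks and separators column by column.
// The "" pads the separators row to the number of chunks.
rebuilt = strcat([c' ; [s' ""]]);
disp(rebuilt == txt)    // %T

// The chunks alone can also be rejoined with another separator
disp(strcat(c, "|"))    // "key|value|flag"
```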

Comparison between strsplit() and tokens():

strsplit()                        tokens()
can work with indices             works only with separators
works with regexp                 does not accept regexp
works with any separator          is restricted to 1-character separators
keeps all empty chunks            removes them
can limit the number of splits    always splits all
slower                            faster
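The difference in the handling of empty chunks can be seen on a small made-up input with two consecutive separators:

```scilab
txt = "a,,b";

// strsplit() keeps the empty chunk between the two commas
kept = strsplit(txt, ",");
disp(kept')      // "a"  ""  "b"

// tokens() drops it
dropped = tokens(txt, ",");
disp(dropped')   // "a"  "b"
```

When positions matter, for instance for empty fields in a delimited record, strsplit() is the one that preserves them.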

Examples

Split at given indices:

strsplit("Scilab")'
strsplit("αβδεϵζηθικλμνξοπρστυφϕχψω", [1 6 11])
--> strsplit("Scilab")'
 ans  =
  "S"  "c"  "i"  "l"  "a"  "b"

--> strsplit("αβδεϵζηθικλμνξοπρστυφϕχψω", [1 6 11])
 ans  =
  "α"
  "βδεϵζ"
  "ηθικλ"
  "μνξοπρστυφϕχψω"

Split at matching separators:

strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "aa")   // the string starts with the separator => leading "" chunk

// Consecutive separators are not squeezed:
strsplit("abbcccdde", "c")'

// With several possible separators:
t = "aabcabbcbaaacacaabbcbccaaabcbc";
[c, s] = strsplit(t, ["aa" "bb"]);
c', s'
strcat([c';[s' ""]]) == t

// Let's limit the number of splits to 4, => 4 chunks + unprocessed tail:
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", ["aa" "bb"], 4)'

// Splitting a string ending with a separator yields a final "":
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "cbc")'
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "aa") // the string starts with the separator => leading "" chunk
 ans  =
  ""
  "bcabbcb"
  "acac"
  "bbcbcc"
  "abcbc"

--> // Consecutive separators are not squeezed:
--> strsplit("abbcccdde", "c")'
 ans  =
  "abb"  ""  ""  "dde"


--> // With several possible separators:
--> t = "aabcabbcbaaacacaabbcbccaaabcbc";
--> [c, s] = strsplit(t, ["aa" "bb"]);
--> c', s'
 ans  =
  ""  "bca"  "cb"  "acac"  ""  "cbcc"  "abcbc"
 ans  =
  "aa"  "bb"  "aa"  "aa"  "bb"  "aa"

--> strcat([c';[s' ""]]) == t
 ans  =
  T

--> // Let's limit the number of splits to 4, => 4 chunks + unprocessed tail:
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", ["aa" "bb"], 4)'
 ans  =
  ""  "bca"  "cb"  "acac"  "bbcbccaaabcbc"


--> // Splitting a string ending with a separator yields a final "":
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "cbc")'
 ans  =
  "aabcabbcbaaacacaabb"  "caaab"  ""

Use a regular expression as scissors:

[c, s] = strsplit("C:\Windows\System32\OpenSSH\",  "/\\|:/");
c', s'
[c, s] = strsplit("abcdef8ghijkl3mnopqr6stuvw7xyz", "/\d+/", 2);
c', s'
--> [c, s] = strsplit("C:\Windows\System32\OpenSSH\",  "/\\|:/");
--> c', s'
 ans  =
  "C"  ""  "Windows"  "System32"  "OpenSSH"  ""
 ans  =
  ":"  "\"  "\"  "\"  "\"


--> [c, s] = strsplit("abcdef8ghijkl3mnopqr6stuvw7xyz", "/\d+/", 2);
--> c', s'
 ans  =
  "abcdef"  "ghijkl"  "mnopqr6stuvw7xyz"
 ans  =
  "8"  "3"

See also

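  • tokens — splits a text into chunks, using separators
  • strindex — searches for the position of a given string within another string
  • part — extraction of characters from strings
  • regexp — in a string, locates (and extracts) substrings matching a regular expression
  • strcat — concatenates strings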

Copyright (c) 2022-2024 (Dassault Systèmes)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors
Last updated:
Mon Jun 17 17:54:19 CEST 2024