strsplit
split a single string at some given positions or patterns
Syntax
chunks = strsplit(string)
chunks = strsplit(string, indices)
[chunks, matched_separators] = strsplit(string, separators)
[chunks, matched_separators] = strsplit(string, separators, limit)
[chunks, matched_separators] = strsplit(string, regexp)
[chunks, matched_separators] = strsplit(string, regexp, limit)
Arguments
- string
  A single character string to split. UTF-8 extended characters are supported.
- indices
  Vector of increasing indices, in the interval [1, length(string)-1].
- separators
  Matrix of strings searched for in string and used as scissors. UTF-8 extended characters are supported.
- regexp
  Single string starting and ending with "/" and specifying a case-sensitive regular expression pattern used as the splitting separator. No regexp option can be used after the trailing "/" delimiter. The regular expression may include UTF-8 extended characters. The "/" and "\" characters used in the body of the regexp must be protected as "\/" and "\\". Example: "/k.{2}o/"
- chunks
  Column of strings, with length(indices)+1 elements: the split chunks.
- matched_separators
  Column of strings, of size size(chunks,1)-1: the matched separators or expression patterns.
- limit
  Integer > 0: maximum number of times that separators are searched for and used along string. If string contains more separator occurrences than limit, its unsplit tail is returned as the last chunk, in chunks($).
Description
strsplit(string) splits string into all its individual characters.
strsplit(string, indices) splits string at the character positions given in the indices vector. Each split occurs just after the character at the given index, so the characters at these indices are the tails of the returned chunks.
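As a minimal sketch of index-based splitting, the snippet below compares strsplit() with chunks rebuilt by hand using part(). It assumes that each index marks the last character of a chunk, as the output of the first example in the Examples section suggests. The string, the indices, and the variable names are arbitrary illustrations, not part of the original page.

```scilab
// Illustrative values (hypothetical, chosen for this sketch)
str = "abcdefghij";
idx = [3 7];

// Column of length(idx)+1 chunks
chunks = strsplit(str, idx);

// Rebuild the chunks by hand, assuming a chunk ends at each index
starts = [1, idx + 1];
stops  = [idx, length(str)];
manual = emptystr(size(starts, "*"), 1);
for k = 1:size(starts, "*")
    manual(k) = part(str, starts(k):stops(k));
end

disp(chunks');
disp(manual');

// Concatenating the chunks should give back the original string
disp(strcat(chunks) == str);
```

Since no separator is removed in this mode, concatenating the chunks with strcat() is a quick sanity check on the chosen indices.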
strsplit(string, separators) splits string at the positions following any match of one of the separators strings. The separators that are detected and used are removed from the tails of the chunks. strsplit(string, "") is equivalent to strsplit(string).
strsplit(string, regexp) does the same, except that string is parsed for the given regular expression, used as a "generic separator", instead of for any "constant" separator among a limited set of separators.
If string starts with a matching separator or expression, chunks(1) is set to "".
If string ends with a matching separator or expression, "" is appended as the last element of chunks.
If no matching separator or regexp is found in string, string is returned as is in chunks. This is notably the case for string="".
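The three boundary rules above can be illustrated with a short sketch. The input strings and variable names are hypothetical, and the commented results are those expected from the rules as stated, not captured from a session.

```scilab
// The string both starts and ends with the separator ","
[c, s] = strsplit(",a,b,", ",");
disp(c');   // expected from the rules above: "" "a" "b" ""
disp(s');   // expected: three matched "," separators

// No separator occurs in the string: it should come back unchanged
whole = strsplit("abc", ",");
disp(whole);
```

The empty chunks are kept rather than dropped, so that chunks and matched_separators together still describe the whole input string.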
Without the limit option, any string including n separators is split into n+1 chunks.
strsplit(string, separators, limit) or strsplit(string, regexp, limit) searches for a matching separator or expression at most limit times. If there are remaining matches in the unprocessed tail of string, this tail is returned as is in chunks($).
[chunks, matched_separators] = strsplit(string,…) returns the column of the matched separators or expressions, in addition to chunks. Then strcat([chunks' ; [matched_separators' ""]]) should be equal to string.
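As a sketch of this round trip with a regular expression and a limit, the snippet below reuses the input string of the last example in the Examples section. The variable names are illustrative only.

```scilab
str = "abcdef8ghijkl3mnopqr6stuvw7xyz";

// Split at runs of digits, at most twice; the remaining tail stays unsplit
[c, s] = strsplit(str, "/\d+/", 2);

// Interleave chunks and matched separators. The trailing "" pads the
// separators row so that both rows have the same number of columns.
rebuilt = strcat([c' ; [s' ""]]);

disp(rebuilt == str);
```

strcat() concatenates the matrix elements column by column, which is why stacking chunks above separators restores the original order.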
Examples
Split at given indices:
strsplit("Scilab")'
strsplit("αβδεϵζηθικλμνξοπρστυφϕχψω", [1 6 11])

--> strsplit("Scilab")'
 ans  =
  "S"  "c"  "i"  "l"  "a"  "b"

--> strsplit("αβδεϵζηθικλμνξοπρστυφϕχψω", [1 6 11])
 ans  =
  "α"
  "βδεϵζ"
  "ηθικλ"
  "μνξοπρστυφϕχψω"
Split at matching separators:
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "aa") // the string starts with the separator => heading "" chunk

// Consecutive separators are not squeezed:
strsplit("abbcccdde", "c")'

// With several possible separators:
t = "aabcabbcbaaacacaabbcbccaaabcbc";
[c, s] = strsplit(t, ["aa" "bb"]);
c', s'
strcat([c';[s' ""]]) == t

// Let's limit the number of splits to 4 => 4 chunks + unprocessed tail:
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", ["aa" "bb"], 4)'

// Splitting a string ending with a separator yields a final "":
strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "cbc")'
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "aa") // the string starts with the separator => heading "" chunk
 ans  =
  ""
  "bcabbcb"
  "acac"
  "bbcbcc"
  "abcbc"

--> // Consecutive separators are not squeezed:
--> strsplit("abbcccdde", "c")'
 ans  =
  "abb"  ""  ""  "dde"

--> // With several possible separators:
--> t = "aabcabbcbaaacacaabbcbccaaabcbc";
--> [c, s] = strsplit(t, ["aa" "bb"]);
--> c', s'
 ans  =
  ""  "bca"  "cb"  "acac"  ""  "cbcc"  "abcbc"
 ans  =
  "aa"  "bb"  "aa"  "aa"  "bb"  "aa"

--> strcat([c';[s' ""]]) == t
 ans  =
  T

--> // Let's limit the number of splits to 4 => 4 chunks + unprocessed tail:
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", ["aa" "bb"], 4)'
 ans  =
  ""  "bca"  "cb"  "acac"  "bbcbccaaabcbc"

--> // Splitting a string ending with a separator yields a final "":
--> strsplit("aabcabbcbaaacacaabbcbccaaabcbc", "cbc")'
 ans  =
  "aabcabbcbaaacacaabb"  "caaab"  ""
Use a regular expression as scissors:
[c, s] = strsplit("C:\Windows\System32\OpenSSH\", "/\\|:/");
c', s'

[c, s] = strsplit("abcdef8ghijkl3mnopqr6stuvw7xyz", "/\d+/", 2);
c', s'

--> [c, s] = strsplit("C:\Windows\System32\OpenSSH\", "/\\|:/");
--> c', s'
 ans  =
  "C"  ""  "Windows"  "System32"  "OpenSSH"  ""
 ans  =
  ":"  "\"  "\"  "\"  "\"

--> [c, s] = strsplit("abcdef8ghijkl3mnopqr6stuvw7xyz", "/\d+/", 2);
--> c', s'
 ans  =
  "abcdef"  "ghijkl"  "mnopqr6stuvw7xyz"
 ans  =
  "8"  "3"