tokens

Splits a text into chunks using separators

Syntax

Chunks = tokens(text)
Chunks = tokens(text, separators)

Arguments

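text: the single text to split. It may contain extended UTF-8 international characters.
separators: a character or a vector of characters used as token delimiters. Default value = [" ", ascii(9)], ascii(9) being the horizontal tab.
Chunks: column vector of the tokens found.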

Description

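tokens(…) searches the string text for the separators and splits it into chunks (tokens) at each of them. The returned chunks do not contain the separators. Consecutive separators are merged and treated as a single delimiter, so they produce no empty chunk.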
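
The merging of consecutive separators can be illustrated with a minimal sketch. The input string and the variable names s and chunks are chosen here for illustration only; they do not come from the original page.

```scilab
// Sketch: "," and ";" both act as separators.
// The two consecutive commas are merged, so no empty chunk appears.
s = "a,,b;c";
chunks = tokens(s, [",", ";"]);

disp(chunks)              // column vector holding the chunks
disp(size(chunks, "*"))   // number of chunks found
```

Since merged separators never yield empty chunks, tokens() may not be the right fit when empty fields carry meaning (for instance in delimited data with missing values); strsplit, listed under See also, is worth checking in that case.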

Examples

tokens("The given   text")

tokens("SCI/demos/scicos", "/")'

tokens("Επιστήμη και καινοτομία", ["α"," "])'

nbsp = ascii(160); // non-breakable space
t = "the" + nbsp + "given   text"
tokens(t)

--> tokens('The given   text')
 ans  =
  "The"
  "given"
  "text"


--> tokens('SCI/demos/scicos', '/')'
 ans  =
  "SCI"  "demos"  "scicos"


--> tokens("Επιστήμη και καινοτομία", ["α"," "])'
 ans  =
  "Επιστήμη"  "κ"  "ι"  "κ"  "ινοτομί"


--> nbsp = ascii(160); // non-breakable space
--> t = "the" + nbsp + "given   text"
 t  =
  "the given   text"

--> tokens(t)
 ans  =
  "the given"
  "text"

See also

strsplit — split a single string at some given positions or patterns
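regexp — finds (and extracts) the substrings of a string that match a regular expression
strindex — searches for the position of a given string within another string
tokenpos — returns the positions of the tokens in a string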

Copyright (c) 2022-2026 (Dassault Systèmes S.E.)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors

Last updated:
Mon Jun 17 17:54:19 CEST 2024