regexp

Locates (and extracts) the substrings of a string that match a regular expression

Syntax

[start, final, match, foundString] = regexp(input, pattern)
[start, final, match, foundString] = regexp(input, pattern, "once")

Arguments

input: a string.
pattern: a character string containing a regular expression, written between delimiters (for example '/ab*/').
start: the starting index of each substring of input that matches the regular expression string pattern.
final: the ending index of each substring of input that matches the regular expression string pattern.
match: the text of each substring of input that matches pattern.
foundString: the text captured by each parenthesized subpattern.
"once" | "o" flag: stop after the first match of the pattern.

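A minimal sketch of how the outputs relate to each other, reusing the input string from the Examples section. The variable names (txt, s, e, m, s1, e1, m1) are illustrative only.

```scilab
txt = "xabyabbbz";

// Every match of "a" followed by zero or more "b".
[s, e, m] = regexp(txt, "/ab*/");
disp(s);   // starting indices: the matches begin at positions 2 and 5
disp(e);   // ending indices: the matches end at positions 3 and 8
disp(m);   // matched text: "ab" and "abbb"

// Same pattern with the "o" flag: only the first match is reported.
[s1, e1, m1] = regexp(txt, "/ab*/", "o");
disp(m1);  // "ab"
```
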
Description

Regular expressions, often abbreviated as "regex" or "regexp", are powerful tools used in programming and text processing for pattern matching within strings. They provide a concise and flexible means of identifying and manipulating text, such as particular characters, words, or patterns of characters.

A regular expression is a sequence of characters that forms a search pattern, which can be used to search, edit, or manipulate text. Patterns are built from the following elements:

Metacharacters: These are special characters that have a particular meaning within a regex:
    . (dot): matches any single character except a newline.
    * (asterisk): matches zero or more occurrences of the preceding element.
    + (plus): matches one or more occurrences of the preceding element.
    ? (question mark): matches zero or one occurrence of the preceding element.
    | (pipe): acts as a logical OR operator.
    ^ (caret): matches the beginning of the input (or of each line in multiline mode).
    $ (dollar sign): matches the end of the input (or of each line in multiline mode).
Character Classes: These allow you to match any one of a set of characters. For example, [abc] will match any one of the characters a, b, or c.
Quantifiers: These specify how many instances of a character, group, or character class must be present in the input for a match to be found. Examples include {n}, {n,}, and {n,m}.
Groups and Capturing: Parentheses () are used to create groups within a regex. These groups can be used to capture the text matched by the group for further processing.
Escaping: If you need to match a character that is a metacharacter, you can escape it with a backslash \. For example, \. will match a literal dot.
Anchors: These specify the position in the text where a match must occur. Common anchors are ^ for the start and $ for the end of the input (or of a line in multiline mode).
Modifiers: These are options that change how the regex engine interprets the pattern. For example, an i placed after the closing delimiter, as in '/abc/i', enables case-insensitive matching. regexp returns every match by default; the "once" flag restricts it to the first match.
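
The elements listed above can be combined in a single pattern. The following is a sketch on a made-up input string; txt and the output variable names are illustrative and not part of the API.

```scilab
txt = "version 12.04 released";

// [0-9]    character class: any digit
// {1,3}    quantifier: between one and three repetitions
// \.       escaped metacharacter: a literal dot
// ( ... )  groups: each one captures part of the match
[s, e, m, g] = regexp(txt, "/([0-9]{1,3})\.([0-9]{2})/");
disp(m);     // "12.04", found at positions 9 to 13
disp(g(1));  // "12"
disp(g(2));  // "04"

// Alternation combined with the case-insensitive modifier i.
[s2, e2, m2] = regexp("ABC", "/ab|cd/i");
disp(m2);    // "AB"
```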

For the full syntax specification, see the regular expressions supported by PCRE2.

Examples

regexp('xabyabbbz','/ab*/','o')
regexp('a!','/((((((((((a))))))))))\041/')
regexp('ABCC','/^abc$/i')
regexp('ABC','/ab|cd/i')
[a,b,c]=regexp('XABYABBBZ','/ab*/i')

piString="3.14"
[a,b,c,piStringSplit]=regexp(piString,"/(\d+)\.(\d+)/")
disp(piStringSplit(1))
disp(piStringSplit(2))

[a,b,c,d]=regexp('xabyabbbz','/ab(.*)b(.*)/')
size(d)

// get host name from URL
myURL="https://www.scilab.org/download/";
// the scheme may be http or https; d receives the host name
[a,b,c,d]=regexp(myURL,'@^(?:https?://)?([^/]+)@i')

str='foobar: 2012';
// Using named subpatterns
[a,b,c,d]=regexp(str,'/(?P<name>\w+): (?P<digit>\d+)/')
d(1)=="foobar"
d(2)=="2012"

History

Version	Description
2026.0.0	PCRE2 is now used as the regular expression engine.
5.4.0	A new output argument, foundString, has been added to retrieve subpattern matches.


Copyright (c) 2022-2025 (Dassault Systèmes S.E.)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors

Last updated:
Thu Oct 16 09:08:45 CEST 2025