tcl_regexpmatch(3)
NAME
Tcl_RegExpMatch, Tcl_RegExpCompile, Tcl_RegExpExec, Tcl_RegExpRange,
Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj, Tcl_RegExpExecObj,
Tcl_RegExpGetInfo - Pattern matching with regular expressions
SYNOPSIS
#include <tcl.h>

int
Tcl_RegExpMatchObj(interp, strObj, patObj)

int
Tcl_RegExpMatch(interp, string, pattern)

Tcl_RegExp
Tcl_RegExpCompile(interp, pattern)

int
Tcl_RegExpExec(interp, regexp, string, start)

Tcl_RegExpRange(regexp, index, startPtr, endPtr)

Tcl_RegExp
Tcl_GetRegExpFromObj(interp, patObj, cflags)

int
Tcl_RegExpExecObj(interp, regexp, objPtr, offset, nmatches, eflags)

Tcl_RegExpGetInfo(regexp, infoPtr)
ARGUMENTS
- Tcl_Interp *interp (in)
 Tcl interpreter to use for error reporting. The interpreter
 may be NULL if no error reporting is desired.
- Tcl_Obj *strObj (in/out)
 Refers to the object from which to get the string to search.
 The internal representation of the object may be converted
 to a form that can be efficiently searched.
- Tcl_Obj *patObj (in/out)
 Refers to the object from which to get a regular expression.
 The compiled regular expression is cached in the object.
- char *string (in)
 String to check for a match with a regular expression.
- CONST char *pattern (in)
 String in the form of a regular expression pattern.
- Tcl_RegExp regexp (in)
 Compiled regular expression. Must have been returned
 previously by Tcl_GetRegExpFromObj or Tcl_RegExpCompile.
- char *start (in)
 If string is just a portion of some other string, this
 argument identifies the beginning of the larger string. If
 it isn't the same as string, then no ^ matches will be
 allowed.
- int index (in)
 Specifies which range is desired: 0 means the range of the
 entire match, 1 or greater means the range that matched a
 parenthesized sub-expression.
- CONST char **startPtr (out)
 The address of the first character in the range is stored
 here, or NULL if there is no such range.
- CONST char **endPtr (out)
 The address of the character just after the last one in the
 range is stored here, or NULL if there is no such range.
- int cflags (in)
 OR-ed combination of compilation flags. See below for more
 information.
- Tcl_Obj *objPtr (in/out)
 An object which contains the string to check for a match
 with a regular expression.
- int offset (in)
 The character offset into the string where matching should
 begin. The value of the offset has no impact on ^ matches.
 This behavior is controlled by eflags.
- int nmatches (in)
 The number of matching subexpressions that should be
 remembered for later use. If this value is 0, then no
 subexpression match information will be computed. If the
 value is -1, then all of the matching subexpressions will be
 remembered. Any other value will be taken as the maximum
 number of subexpressions to remember.
- int eflags (in)
 OR-ed combination of the values TCL_REG_NOTBOL and
 TCL_REG_NOTEOL. See below for more information.
- Tcl_RegExpInfo *infoPtr (out)
 The address of the location where information about a
 previous match should be stored by Tcl_RegExpGetInfo.
DESCRIPTION
Tcl_RegExpMatch determines whether its string argument
matches pattern, where pattern is interpreted as a regular
expression using the rules in the re_syntax reference
page. If there is a match then Tcl_RegExpMatch returns 1.
If there is no match then Tcl_RegExpMatch returns 0. If
an error occurs in the matching process (e.g. pattern is
not a valid regular expression) then Tcl_RegExpMatch
returns -1 and leaves an error message in the interpreter
result. Tcl_RegExpMatchObj is similar to Tcl_RegExpMatch
except it operates on the Tcl objects strObj and patObj
instead of UTF strings. Tcl_RegExpMatchObj is generally
more efficient than Tcl_RegExpMatch, so it is the
preferred interface.
Tcl_RegExpCompile, Tcl_RegExpExec, and Tcl_RegExpRange
provide lower-level access to the regular expression
pattern matcher. Tcl_RegExpCompile compiles a regular
expression string into the internal form used for
efficient pattern matching. The return value is a token for
this compiled form, which can be used in subsequent calls
to Tcl_RegExpExec or Tcl_RegExpRange. If an error occurs
while compiling the regular expression then
Tcl_RegExpCompile returns NULL and leaves an error message
in the interpreter result. Note: the return value from
Tcl_RegExpCompile is only valid up to the next call to
Tcl_RegExpCompile; it is not safe to retain these values
for long periods of time.
Tcl_RegExpExec executes the regular expression pattern
matcher. It returns 1 if string contains a range of
characters that match regexp, 0 if no match is found, and -1
if an error occurs. In the case of an error,
Tcl_RegExpExec leaves an error message in the interpreter
result. When searching a string for multiple matches of a
pattern, it is important to distinguish between the start
of the original string and the start of the current search.
For example, when searching for the second occurrence of a
match, the string argument might point to the character
just after the first match; however, it is important for
the pattern matcher to know that this is not the start of
the entire string, so that it doesn't allow ^ atoms in the
pattern to match. The start argument provides this
information by pointing to the start of the overall string
containing string. Start will be less than or equal to
string; if it is less than string then no ^ matches will
be allowed.
Tcl_RegExpRange may be invoked after Tcl_RegExpExec
returns; it provides detailed information about what
ranges of the string matched what parts of the pattern.
Tcl_RegExpRange returns a pair of pointers in *startPtr
and *endPtr that identify a range of characters in the
source string for the most recent call to Tcl_RegExpExec.
Index indicates which of several ranges is desired: if
index is 0, information is returned about the overall
range of characters that matched the entire pattern;
otherwise, information is returned about the range of
characters that matched the index'th parenthesized
subexpression within the pattern. If there is no range
corresponding to index then NULL is stored in *startPtr and
*endPtr.
Tcl_GetRegExpFromObj, Tcl_RegExpExecObj, and
Tcl_RegExpGetInfo are object interfaces that provide the
most direct control of Henry Spencer's regular expression
library. Users who need to modify compilation and
execution options directly should use these interfaces
instead of calling the internal regexp functions. These
interfaces handle the details of UTF to Unicode
translations and also provide improved performance through
caching in the pattern and string objects.
Tcl_GetRegExpFromObj attempts to return a compiled regular
expression from the patObj. If the object does not
already contain a compiled regular expression it will
attempt to create one from the string in the object and
assign it to the internal representation of the patObj.
The return value, of type Tcl_RegExp, is a token for this
compiled form, which can be used in subsequent calls to
Tcl_RegExpExecObj or Tcl_RegExpGetInfo. If an error occurs
while compiling the regular expression then
Tcl_GetRegExpFromObj returns NULL and leaves an error
message in the interpreter result. The regular expression
token can be used as long as the internal representation
of patObj refers to the compiled form. The cflags argument
is a bitwise OR of zero or more of the following flags
that control the compilation of patObj:
- TCL_REG_ADVANCED
    Compile advanced regular expressions (`AREs').
 This mode corresponds to the normal regular
 expression syntax accepted by the Tcl regexp and
 regsub commands.
- TCL_REG_EXTENDED
    Compile extended regular expressions (`EREs').
 This mode corresponds to the regular expression
 syntax recognized by Tcl 8.0 and earlier versions.
- TCL_REG_BASIC
    Compile basic regular expressions (`BREs'). This
 mode corresponds to the regular expression syntax
 recognized by common Unix utilities like sed and
 grep. This is the default if no flags are specified.
- TCL_REG_EXPANDED
    Compile the regular expression (basic, extended,
 or advanced) using an expanded syntax that allows
 comments and whitespace. In this mode, white space that
 is neither backslashed nor inside a bracket expression,
 and #-to-end-of-line comments, are ignored.
- TCL_REG_QUOTE
    Compile a literal string, with all characters
 treated as ordinary characters.
- TCL_REG_NOCASE
    Compile for matching that ignores upper/lower
 case distinctions.
- TCL_REG_NEWLINE
    Compile for newline-sensitive matching. By
 default, newline is a completely ordinary character
 with no special meaning in either regular
 expressions or strings. With this flag, `[^'
 bracket expressions and `.' never match newline,
 `^' matches an empty string after any newline in
 addition to its normal function, and `$' matches
 an empty string before any newline in addition to
 its normal function. TCL_REG_NEWLINE is the bitwise OR
 of TCL_REG_NLSTOP and TCL_REG_NLANCH.
- TCL_REG_NLSTOP
    Compile for partial newline-sensitive matching,
 with the behavior of `[^' bracket expressions and
 `.' affected, but not the behavior of `^' and
 `$'. In this mode, `[^' bracket expressions and
 `.' never match newline.
- TCL_REG_NLANCH
    Compile for inverse partial newline-sensitive
 matching, with the behavior of `^' and `$'
 (the ``anchors'') affected, but not the behavior
 of `[^' bracket expressions and `.'. In this
 mode `^' matches an empty string after any newline
 in addition to its normal function, and `$'
 matches an empty string before any newline in
 addition to its normal function.
- TCL_REG_NOSUB
    Compile for matching that reports only success or
 failure, not what was matched. This reduces compile
 overhead and may improve performance. Subsequent
 calls to Tcl_RegExpGetInfo or Tcl_RegExpRange will not
 report any match information.
- TCL_REG_CANMATCH
    Compile for matching that reports the potential
 to complete a partial match given more text (see
 below).
Only one of TCL_REG_EXTENDED, TCL_REG_ADVANCED,
TCL_REG_BASIC, and TCL_REG_QUOTE may be specified.
Tcl_RegExpExecObj executes the regular expression pattern
matcher. It returns 1 if objPtr contains a range of
characters that match regexp, 0 if no match is found, and -1
if an error occurs. In the case of an error,
Tcl_RegExpExecObj leaves an error message in the
interpreter result. The nmatches value indicates to the
matcher how many subexpressions are of interest. If
nmatches is 0, then no subexpression match information is
recorded, which may allow the matcher to make various
optimizations. If the value is -1, then all of the
subexpressions in the pattern are remembered. If the value
is a positive integer, then only that number of
subexpressions will be remembered. Matching begins at the
specified Unicode character index given by offset. Unlike
Tcl_RegExpExec, the behavior of anchors is not affected by
the offset value. Instead the behavior of the anchors is
explicitly controlled by the eflags argument, which is a
bitwise OR of zero or more of the following flags:
- TCL_REG_NOTBOL
    The starting character will not be treated as the
 beginning of a line or the beginning of the
 string, so `^' will not match there. Note that
 this flag has no effect on how `\A' matches.
- TCL_REG_NOTEOL
    The last character in the string will not be
 treated as the end of a line or the end of the
 string, so `$' will not match there. Note that
 this flag has no effect on how `\Z' matches.
Tcl_RegExpGetInfo retrieves information about the last
match performed with a given regular expression regexp.
The infoPtr argument contains a pointer to a structure
that is defined as follows:
    
    typedef struct Tcl_RegExpInfo {
        int nsubs;
        Tcl_RegExpIndices *matches;
        long extendStart;
    } Tcl_RegExpInfo;
The nsubs field contains a count of the number of
parenthesized subexpressions within the regular expression.
If the TCL_REG_NOSUB flag was used, then this value will be
zero. The matches field points to an array of nsubs+1
values that indicate the bounds of each subexpression
matched. The first element in the array refers to the
range matched by the entire regular expression, and
subsequent elements refer to the parenthesized
subexpressions in the order that they appear in the
pattern. Each element is a structure that is defined as
follows:
    typedef struct Tcl_RegExpIndices {
        long start;
        long end;
    } Tcl_RegExpIndices;
The start and end values are Unicode character indices
relative to the offset location within objPtr where
matching began. The start index identifies the first
character of the matched subexpression. The end index
identifies the first character after the matched
subexpression. If the subexpression matched the empty
string, then start and end will be equal. If the
subexpression did not participate in the match, then start
and end will be set to -1.
The extendStart field in Tcl_RegExpInfo is only set if the
TCL_REG_CANMATCH flag was used. It indicates the first
character in the string where a match could occur. If a
match was found, this will be the same as the beginning of
the current match. If no match was found, then it
indicates the earliest point at which a match might occur
if additional text is appended to the string. If no match
is possible even with further text, this field will be set
to -1.
SEE ALSO
re_syntax(n)
KEYWORDS
match, pattern, regular expression, string, subexpression,
Tcl_RegExpIndices, Tcl_RegExpInfo