A regular expression (regex) is a powerful way of specifying a complex search.
MySQL uses regular Henry Spencer's inplementation of regular
expressions. And that is aimed to conform to POSIX
1003.2. MySQL uses the extended version.
This is a simplistic reference that skips the details. To get more exact
information, see Henry Spencer's regex(7)
manual page that is
included in the source distribution. C Contributors to MySQL.
A regular expression describes a set of strings. The simplest regexp is
one that has no special characters in it. For example, the regexp
hello
matches hello
and nothing else.
Nontrivial regular expressions use certain special constructs so that
they can match more than one string. For example, the regexp
hello|word
matches either the string hello
or the string
word
.
As a more complex example, the regexp B[an]*s
matches any of the
strings Bananas
, Baaaaas
, Bs
and any other string
starting with a B
, ending with an s
, and containing any
number of a
or n
characters in between.
A regular expression may use any of the following special
characters/constructs:
^
-
Match the beginning of a string.
mysql> select "fo\nfo" REGEXP "^fo$"; -> 0
mysql> select "fofo" REGEXP "^fo"; -> 1
$
-
Match the end of a string.
mysql> select "fo\no" REGEXP "^fo\no$"; -> 1
mysql> select "fo\no" REGEXP "^fo$"; -> 0
.
-
Match any character (including newline).
mysql> select "fofo" REGEXP "^f.*"; -> 1
mysql> select "fo\nfo" REGEXP "^f.*"; -> 1
a*
-
Match any sequence of zero or more
a
characters.
mysql> select "Ban" REGEXP "^Ba*n"; -> 1
mysql> select "Baaan" REGEXP "^Ba*n"; -> 1
mysql> select "Bn" REGEXP "^Ba*n"; -> 1
a+
-
Match any sequence of one or more
a
characters.
mysql> select "Ban" REGEXP "^Ba+n"; -> 1
mysql> select "Bn" REGEXP "^Ba+n"; -> 0
a?
-
Match either zero or one
a
character.
mysql> select "Bn" REGEXP "^Ba?n"; -> 1
mysql> select "Ban" REGEXP "^Ba?n"; -> 1
mysql> select "Baan" REGEXP "^Ba?n"; -> 0
de|abc
-
Match either of the sequences
de
or abc
.
mysql> select "pi" REGEXP "pi|apa"; -> 1
mysql> select "axe" REGEXP "pi|apa"; -> 0
mysql> select "apa" REGEXP "pi|apa"; -> 1
mysql> select "apa" REGEXP "^(pi|apa)$"; -> 1
mysql> select "pi" REGEXP "^(pi|apa)$"; -> 1
mysql> select "pix" REGEXP "^(pi|apa)$"; -> 0
(abc)*
-
Match zero or more instances of the sequence
abc
.
mysql> select "pi" REGEXP "^(pi)*$"; -> 1
mysql> select "pip" REGEXP "^(pi)*$"; -> 0
mysql> select "pipi" REGEXP "^(pi)*$"; -> 1
{1}
-
{2,3}
-
The is a more general way of writing regexps that match many
occurrences of the previous atom.
a*
-
Can be written as
a{0,}
.
a+
-
Can be written as
a{1,}
.
a?
-
Can be written as
a{0,1}
.
To be more precise, an atom followed by a bound containing one integer
i
and no comma matches a sequence of exactly i
matches of
the atom. An atom followed by a bound containing one integer i
and a comma matches a sequence of i
or more matches of the atom.
An atom followed by a bound containing two integers i
and
j
matches a sequence of i
through j
(inclusive)
matches of the atom.
Both arguments must 0 >= value <= RE_DUP_MAX (default 255)
.
If there are two arguments, the second must be greater than or equal to the
first.
[a-dX]
-
[^a-dX]
-
Matches
any character which is (or is not, if ^ is used) either
a
, b
,
c
, d
or X
. To include a literal ]
character,
it must immediately follow the opening bracket [
. To include a
literal -
character, it must be written first or last. So
[0-9]
matches any decimal digit. Any character that does not have
a defined meaning inside a []
pair has no special meaning and
matches only itself.
mysql> select "aXbc" REGEXP "[a-dXYZ]"; -> 1
mysql> select "aXbc" REGEXP "^[a-dXYZ]$"; -> 0
mysql> select "aXbc" REGEXP "^[a-dXYZ]+$"; -> 1
mysql> select "aXbc" REGEXP "^[^a-dXYZ]+$"; -> 0
mysql> select "gheis" REGEXP "^[^a-dXYZ]+$"; -> 1
mysql> select "gheisa" REGEXP "^[^a-dXYZ]+$"; -> 0
[[.characters.]]
-
The sequence of characters of that collating element. The sequence is a
single element of the bracket expression's list. A bracket expression
containing a multi-character collating element can thus match more than
one character, e.g., if the collating sequence includes a
ch
collating element, then the regular expression [[.ch.]]*c
matches the
first five characters of chchcc
.
[=character_class=]
-
An equivalence class, standing for the sequences of characters of all
collating elements equivalent to that one, including itself.
For example, if
o
and (+)
are the members of an
equivalence class, then [[=o=]]
, [[=(+)=]]
, and
[o(+)]
are all synonymous. An equivalence class may not be an
endpoint of a range.
[:character_class:]
-
Within a bracket expression, the name of a character class enclosed in
[:
and :]
stands for the list of all characters belonging
to that class. Standard character class names are:
alnum | digit | punct
|
alpha | graph | space
|
blank | lower | upper
|
cntrl | print | xdigit
|
These stand for the character classes defined in the ctype(3)
manual
page. A locale may provide others. A character class may not be used as an
endpoint of a range.
mysql> select "justalnums" REGEXP "[[:alnum:]]+"; -> 1
mysql> select "!!" REGEXP "[[:alnum:]]+"; -> 0
[[:<:]]
-
[[:>:]]
-
These match the null string at the beginning and end of a word
respectively. A word is defined as a sequence of word characters which
is neither preceded nor followed by word characters. A word character is
an alnum character (as defined by
ctype(3)
) or an underscore
(_
).
mysql> select "a word a" REGEXP "[[:<:]]word[[:>:]]"; -> 1
mysql> select "a xword a" REGEXP "[[:<:]]word[[:>:]]"; -> 0
mysql> select "weeknights" REGEXP "^(wee|week)(knights|nights)$"; -> 1