
Rules for forming (basic) RE:

1. A non-special character is a RE matching that character.
Example:
a
6
%

Special characters are:
"." period
"*" asterisk
"[" left square bracket
"\" backslash
"^" caret
"$" dollar sign

Note: To use a special character as a "normal" character, precede it with a backslash.
Example:
Main\.m3
first\.java
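As a rough illustration, Python's re module treats an escaped period the same way. Python is used here only as a convenient test bed; its syntax is closer to extended RE than to the basic RE described on this page.

```python
import re

# Backslash-escape the period so it matches only a literal '.'.
pattern = re.compile(r"first\.java")

print(bool(pattern.fullmatch("first.java")))  # True
print(bool(pattern.fullmatch("firstXjava")))  # False

# re.escape() inserts the backslashes automatically (Python 3.7+ escapes
# only characters that are special in a regex).
print(re.escape("Main.m3"))  # Main\.m3
```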

2. If A is a RE and B is a RE, then AB is also a RE. AB is called the concatenation of the REs A and B: it first matches A and then B.
Example: The RE cs matches the string "cs" because it first matches c and then s.

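3. (A) matches anything that A matches. Parentheses are used for grouping, that is, to override the default precedence.

Note: ( and ) are not listed as special characters, so in basic RE they must be preceded by a backslash to act as grouping: (A) is written as \(A\).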

4. A|B matches anything that matches A or matches B.
Example:
a) a|b
b) (ab)|(cd)

As before, since | is not listed as a special character, in basic RE it must be preceded by a backslash to have its special meaning. And so,
a) is written as: a\|b
b) is written as: \(ab\)\|\(cd\)
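The same two patterns can be tried in Python's re module, with one caveat: Python uses extended syntax, in which |, ( and ) are special as written, so the backslashes required in basic RE are dropped. The last line shows grouping changing what | applies to.

```python
import re

# Extended syntax: a\|b and \(ab\)\|\(cd\) become a|b and (ab)|(cd).
alt = re.compile(r"a|b")
grouped = re.compile(r"(ab)|(cd)")

for s in ["a", "b", "ab", "cd", "ac"]:
    print(s, bool(alt.fullmatch(s)), bool(grouped.fullmatch(s)))

# Grouping limits the alternation: a(b|c)d matches "abd" or "acd".
print(bool(re.fullmatch(r"a(b|c)d", "acd")))  # True
```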

5. A*, where A is a RE, matches zero or more repetitions of A.
Example:

a* will match:
"" - an empty string
a - one repetition of a
aaaaaaa - more than one repetition of a, etc.

w(ab)* will match:
w
wab
wababab etc.
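A quick check of both examples with Python's re module (an illustration only; Python writes the group as w(ab)*, where a basic RE tool would need w\(ab\)*). fullmatch is used so that the whole string must match.

```python
import re

star = re.compile(r"a*")
group_star = re.compile(r"w(ab)*")

# a* accepts the empty string and any run of a's.
for s in ["", "a", "aaaaaaa", "b"]:
    print(repr(s), bool(star.fullmatch(s)))

# w(ab)* needs a leading w followed by zero or more copies of "ab".
for s in ["w", "wab", "wababab", "waba"]:
    print(s, bool(group_star.fullmatch(s)))
```

With re.search instead of fullmatch, a* would succeed on any string at all, for example "xyz", because zero repetitions produce an empty match at position 0.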

6. '.' (a single period) matches any character except the newline character \n.
Example:
Main.m3 will match:
MainAm3
Main7m3 etc.

Recall that if Main.m3 itself should be matched exactly, the RE has to be written as Main\.m3.
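A small sketch of rule 6 in Python's re module, used here for illustration; by default its '.' also excludes the newline character.

```python
import re

dot = re.compile(r"Main.m3")

# '.' stands for any single character, including a literal '.'...
for s in ["MainAm3", "Main7m3", "Main.m3"]:
    print(s, bool(dot.fullmatch(s)))

# ...except the newline character.
print(bool(dot.fullmatch("Main\nm3")))  # False
```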

[ab] - character classes
[ab] means (a|b)
[abcdefg] means (a|b|c|d|e|f|g)
Note that the range shorthand [a-g] is also allowed. If the character '-' itself is meant to be one of the characters in [ ], it has to be placed first or last in the list. For example, [a-g-] will match a|b|c|d|e|f|g|-.
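The hyphen rule and the [ab] = (a|b) equivalence can be checked in Python's re module (used only as an illustration), which follows the same convention for a trailing '-'.

```python
import re

# [a-g-]: the range a..g plus a literal '-', placed last so it is not a range.
cls = re.compile(r"[a-g-]")
for ch in ["a", "d", "g", "-", "h", "z"]:
    print(ch, bool(cls.fullmatch(ch)))

# [ab] accepts exactly the same single characters as (a|b).
for ch in "abc":
    print(ch, bool(re.fullmatch(r"[ab]", ch)), bool(re.fullmatch(r"(a|b)", ch)))
```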

Examples:
[A-Za-z][A-Za-z0-9_]* matches any Modula-3 identifier or keyword
[A-Z][A-Z]* matches any sequence of at least one capital letter
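Both example patterns can be run over a line of text with Python's re.findall, which returns every non-overlapping match; the sample strings below are made up for illustration.

```python
import re

# A letter first, then any mix of letters, digits and underscores.
ident = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
line = "VAR count_2: INTEGER := 10;"
print(ident.findall(line))  # ['VAR', 'count_2', 'INTEGER']

# [A-Z][A-Z]* requires at least one capital letter.
caps = re.compile(r"[A-Z][A-Z]*")
print(caps.findall("BEGIN x := y END"))  # ['BEGIN', 'END']
```

The number 10 is skipped because the first character of a match must be a letter.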

7. [^ab] - inverted character classes
Examples: [^ab] matches any single character except a or b.
[^a-q] will match t or u or z, but not a or d.
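A minimal check of the [^a-q] example in Python's re module. An inverted class still stands for exactly one character, so it does not match the empty string.

```python
import re

not_a_to_q = re.compile(r"[^a-q]")

# Characters outside the range a..q are accepted, those inside are rejected.
for ch in ["t", "u", "z", "a", "d"]:
    print(ch, bool(not_a_to_q.fullmatch(ch)))

# An inverted class still consumes exactly one character.
print(bool(not_a_to_q.fullmatch("")))  # False
```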

8. ^A, where A is a RE, matches anything that matches A, but only if it occurs at the beginning of the line.

Example:
^[A-Z] matches any capital letter at the beginning of the line.
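Line-oriented tools such as grep apply the RE to one line at a time. Python's re module works on whole strings, so '^' matches only at the very start unless the re.MULTILINE flag is given; the sample text is made up.

```python
import re

text = "Apple pie\nbanana\nCherry tart"

# Without re.MULTILINE, '^' anchors only at the start of the whole string.
print(re.findall(r"^[A-Z]", text))                # ['A']
# With re.MULTILINE it anchors at the start of every line, as grep does.
print(re.findall(r"^[A-Z]", text, re.MULTILINE))  # ['A', 'C']
```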

9. A$ matches anything that matches A, but only if it occurs at the end of a line.

Example: The following RE finds all lines that contain only the word END, possibly with spaces before and after it:
"^ *END *$"

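The END example can be checked line by line in Python's re module, roughly the way grep processes a file; the sample lines are invented for illustration.

```python
import re

only_end = re.compile(r"^ *END *$")

lines = ["BEGIN", "  END  ", "END;", "END", "THE END"]
# Keep the lines consisting of END with optional spaces on either side.
matching = [line for line in lines if only_end.search(line)]
print(matching)  # ['  END  ', 'END']
```

"END;" is rejected because ';' stands between END and the end of the line, and "THE END" because other text precedes END.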
Some additional rules not supported by all UNIX applications:

10. A+ matches one or more repetitions of A.
Note: A+ is the same as AA*, and many UNIX applications do not support '+' at all. Consult the man page of the particular application to see what is and is not supported.

11. A? matches zero or one repetition of A.
Note: A? is the same as (A|ε), where ε denotes the empty string.
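The two equivalences can be spot-checked in Python's re module. The hypothetical helper same_matches compares two patterns on a handful of sample strings, which is a sanity check rather than a proof; the empty alternative in (a|) plays the role of ε.

```python
import re

def same_matches(p, q, samples):
    """True if patterns p and q fully match exactly the same sample strings."""
    return all(bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s)) for s in samples)

samples = ["", "a", "aa", "aaa", "b"]
print(same_matches(r"a+", r"aa*", samples))   # True: A+ behaves like AA*
print(same_matches(r"a?", r"(a|)", samples))  # True: A? behaves like (A|empty)
```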

Examples:
a1p[1-4]\.txt - test cases for assignment 1
^[^.] - any character but '.' at the beginning of the line
H[aeiou]llo - second letter is a vowel
bugs* - bug, bugs, bugssss
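The examples above can be turned into a small self-check in Python's re module; the accept and reject strings are invented test data. ^[^.] is tested separately with re.search, since it only constrains the first character.

```python
import re

# Each pattern is paired with strings it should accept and strings it should reject.
cases = {
    r"a1p[1-4]\.txt": (["a1p1.txt", "a1p4.txt"], ["a1p5.txt", "a1p1xtxt"]),
    r"H[aeiou]llo": (["Hello", "Hallo"], ["Hxllo", "hello"]),
    r"bugs*": (["bug", "bugs", "bugssss"], ["bu", "bugz"]),
}

results = {}
for pattern, (accept, reject) in cases.items():
    results[pattern] = (all(re.fullmatch(pattern, s) for s in accept)
                        and not any(re.fullmatch(pattern, s) for s in reject))
    print(pattern, "OK" if results[pattern] else "MISMATCH")

# ^[^.] only looks at the first character: anything except '.'.
no_leading_dot = [name for name in ["README", ".profile", "x.txt"]
                  if re.search(r"^[^.]", name)]
print(no_leading_dot)  # ['README', 'x.txt']
```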

Multiple matches:
{n,m} must occur at least n times, but no more than m times.
{n,} at least n times
{,m} no more than m times
{n} exactly n times
* 0 or more times (same as {0,})
+ 1 or more times (same as {1,})
? 0 or 1 time (same as {0,1})
Examples:
quu*x = qu+x = qu{1,}x
qx|qux = qu?x = qu{0,1}x
etc.
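The equalities above can be confirmed on a few sample strings in Python's re module, which accepts the same {n,m} notation; tools using basic RE syntax, such as grep without -E, write the braces with backslashes instead, as \{n,m\}. The helper accepted is hypothetical.

```python
import re

samples = ["qx", "qux", "quux", "quuux"]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

# At least one u: all three spellings agree.
print(accepted(r"quu*x"), accepted(r"qu+x"), accepted(r"qu{1,}x"))
# Zero or one u: all three spellings agree.
print(accepted(r"qx|qux"), accepted(r"qu?x"), accepted(r"qu{0,1}x"))
# Bounded repetition: two or three u's.
print(accepted(r"qu{2,3}x"))  # ['quux', 'quuux']
```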


Instructional Support Group 2008-08-05