1. A non-special character is a RE matching that character.
Example:
a
6
%
Special characters are:
"." period
"*" asterisk
"[" left square bracket
"\" backslash
"^" caret
"$" dollar sign
Note:
A special character that should match itself literally must be preceded by a backslash.
Example:
Main\.m3
first\.java
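The escaping rule can be sketched in Python, used here only as a convenient stand-in for a UNIX tool (an assumption; the notes do not name a language). The helper `escape_literal` is hypothetical and covers only the six special characters listed above; Python's own `re` syntax treats a few more characters as special.

```python
import re

# The six special characters listed in the notes: . * [ \ ^ $
SPECIAL = ".*[\\^$"

def escape_literal(text):
    """Prefix every listed special character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

pattern = escape_literal("first.java")
print(pattern)                                           # first\.java
print(re.fullmatch(pattern, "first.java") is not None)   # True
print(re.fullmatch(pattern, "firstXjava") is not None)   # False
```

Without the backslash, the period in first.java would also accept firstXjava.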
2. If token A is an RE and token B is an RE, then token AB is also an RE.
Token AB is called a concatenation of REs A and B.
The RE AB first matches A and then B.
Example: RE cs matches the string "cs" because it first
matches c and then s.
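3. \(A\) matches anything that matches
token A (used for precedence relations or grouping).
Notes:
Since ( and ) are not listed as special characters,
\ is used to give them their special meaning.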
4. A\|B matches anything that matches token A or token B.
Example:
a) a|b
b) ab|cd
As before, since | is not listed as a special character,
\ is used to give it its special meaning.
And so,
a) will be written as: a\|b
b) will be written as: ab\|cd
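A minimal sketch of grouping and alternation, assuming Python's `re` module as the test bed. Python spells these operators without the backslash: the notes' \(A\) becomes (A) and A\|B becomes A|B. The pattern `x(ab|cd)y` is a made-up illustration of how grouping limits the scope of the alternation.

```python
import re

either_letter = re.compile(r"a|b")       # a\|b in the notes' notation
either_pair = re.compile(r"ab|cd")       # ab\|cd in the notes' notation
grouped = re.compile(r"x(ab|cd)y")       # x\(ab\|cd\)y in the notes' notation

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

print(full(either_letter, "b"))   # True
print(full(either_pair, "cd"))    # True
print(full(either_pair, "ad"))    # False: concatenation binds tighter than |
print(full(grouped, "xcdy"))      # True
```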
5. A*, where token A is a RE, matches zero or more repetitions of A.
Example:
a* will match:
"" - an empty string
a - one repetition of a
aaaaaaa - more than one repetition of a etc.
w\(ab\)* will match:
w
wab
wababab etc.
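The two star examples can be checked with a short sketch, again assuming Python's `re` module, where the group is written `(ab)` rather than \(ab\).

```python
import re

star_a = re.compile(r"a*")
star_group = re.compile(r"w(ab)*")   # Python spelling of w\(ab\)*

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

print([full(star_a, s) for s in ["", "a", "aaaaaaa", "ab"]])
# [True, True, True, False]
print([full(star_group, s) for s in ["w", "wab", "wababab", "waba"]])
# [True, True, True, False]
```

The last string in each list fails because the star repeats only the token directly before it: a in the first pattern, the whole group ab in the second.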
6. '.' (single period) matches any character except newline
\n.
Example:
Main.m3 will match:
MainAm3
Main7m3 etc.
Recall that if Main.m3 should be matched exactly,
the appropriate RE has to be written as Main\.m3.
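A small sketch of the period rule, assuming Python's `re` module with its default flags (under which the period also refuses to match a newline).

```python
import re

any_char = re.compile(r"Main.m3")    # period matches any single character
exact = re.compile(r"Main\.m3")      # escaped period matches only '.'

print(any_char.fullmatch("MainAm3") is not None)    # True
print(any_char.fullmatch("Main7m3") is not None)    # True
print(any_char.fullmatch("Main\nm3") is not None)   # False: newline is excluded
print(exact.fullmatch("MainAm3") is not None)       # False
print(exact.fullmatch("Main.m3") is not None)       # True
```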
[ab] - character classes
[ab]
means
a\|b
[abcdefg]
means
a\|b\|c\|d\|e\|f\|g
Note that a shortcut like [a-g] is admissible.
If the character '-' is itself one of the characters in [],
'-' has to be put as the first or last character in the list.
For example:
[a-g-] will match a\|b\|c\|d\|e\|f\|g\|-.
Examples:
[A-Za-z][A-Za-z0-9_]*   matches any Modula-3
                        identifier or keyword
[A-Z][A-Z]*             matches any sequence
                        of at least one capital letter
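The character-class examples can be tried out as follows. This is a sketch assuming Python's `re` module and assuming the patterns read `[A-Za-z][A-Za-z0-9_]*`, `[A-Z][A-Z]*` and `[a-g-]`; the test strings are invented for illustration.

```python
import re

identifier = re.compile(r"[A-Za-z][A-Za-z0-9_]*")   # letter, then letters/digits/_
capitals = re.compile(r"[A-Z][A-Z]*")               # at least one capital letter
with_hyphen = re.compile(r"[a-g-]")                 # trailing '-' is a literal hyphen

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

print(full(identifier, "line_count2"))   # True
print(full(identifier, "2fast"))         # False: must start with a letter
print(full(capitals, "END"))             # True
print(full(capitals, ""))                # False: needs at least one letter
print(full(with_hyphen, "-"))            # True
print(full(with_hyphen, "h"))            # False: h is outside a-g
```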
7. [^ab] - inverted character classes
Examples:
[^ab] matches any character except
a or b
[^a-q] will match t or
u or z but not a or d.
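A quick check of the inverted class, assuming Python's `re` module and the patterns `[^ab]` and `[^a-q]`.

```python
import re

not_a_or_b = re.compile(r"[^ab]")
not_a_to_q = re.compile(r"[^a-q]")

# Keep only the characters that the inverted class accepts.
print([ch for ch in "abcx" if not_a_or_b.fullmatch(ch)])    # ['c', 'x']
print([ch for ch in "tuzad" if not_a_to_q.fullmatch(ch)])   # ['t', 'u', 'z']
```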
8. ^A - where A is a RE, matches anything that matches A,
but only if it occurs at the beginning of the line.
Example:
^[A-Z]
matches any capital letter
at the beginning of the line.
9. A$ matches anything that matches A, but only
if it occurs at the end of a line.
Example: The following RE finds all lines that contain only the word END,
perhaps with spaces before and after it:
"^ *END *$"
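The two anchors can be sketched together, assuming Python's `re` module and assuming the patterns read `^[A-Z]` and `^ *END *$`. Each string in `lines` stands for one line of input; the sample lines are invented.

```python
import re

starts_with_capital = re.compile(r"^[A-Z]")
only_end = re.compile(r"^ *END *$")

print(starts_with_capital.search("Hello") is not None)         # True
print(starts_with_capital.search("hello World") is not None)   # False

lines = ["END", "  END  ", "END;", "BEGIN END", "   END"]
# search() scans the line, but the anchors pin the match to both ends.
print([line for line in lines if only_end.search(line)])
# ['END', '  END  ', '   END']
```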
Some additional rules not supported by all UNIX applications:
10. A+ matches one or more repetitions of A.
Note: A+ is the same as AA*
and many UNIX applications do not support '+' at all.
Consult the man pages of the particular application for what is or is not supported.
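The equivalence of A+ and AA* can be spot-checked on a few strings. This sketch assumes Python's `re` module, which does support '+'; the sample strings are arbitrary.

```python
import re

plus = re.compile(r"a+")
spelled_out = re.compile(r"aa*")   # one a, then zero or more further a's

samples = ["", "a", "aaa", "b", "ab"]
plus_results = [plus.fullmatch(s) is not None for s in samples]
spelled_results = [spelled_out.fullmatch(s) is not None for s in samples]

print(plus_results)                      # [False, True, True, False, False]
print(plus_results == spelled_results)   # True
```

For a tool without '+', writing the pattern as AA* gives the same matches.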
11. A? matches zero or one repetition of A.
Note: A? is the same as A or the empty string.
Examples:
a1p[1-4]\.txt   test cases for the assignment 1
^[^.]           any character but '.' at
                the beginning of the line.
H[aeiou]llo     second letter is a vowel
bugs*           bug, bugs, bugssss
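These examples can be run as a sketch in Python's `re` module, assuming the four patterns read `a1p[1-4]\.txt`, `^[^.]`, `H[aeiou]llo` and `bugs*`. The file names and words used as input are invented.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

print(full(r"a1p[1-4]\.txt", "a1p3.txt"))             # True
print(full(r"a1p[1-4]\.txt", "a1p5.txt"))             # False: 5 is outside 1-4
print(re.search(r"^[^.]", "readme") is not None)      # True
print(re.search(r"^[^.]", ".profile") is not None)    # False: line starts with '.'
print(full(r"H[aeiou]llo", "Hello"))                  # True
print(full(r"H[aeiou]llo", "Hxllo"))                  # False
print([full(r"bugs*", s) for s in ["bug", "bugs", "bugssss", "bu"]])
# [True, True, True, False]
```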
Multiple matches:
{n,m}  must occur at least n times, but no more than m times
{n,}   at least n times
{,m}   no more than m times
{n}    exactly n times
*      0 or more times; same as {0,}
+      1 or more times; same as {1,}
?      0 or 1 time; same as {0,1}
Examples:
quux    qu+x    qu{1,}x
ququx   qu?x    qu{0,1}x
etc.
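The repetition counts can be compared with their shorthand forms in a short sketch. It assumes Python's `re` module, which writes the braces without backslashes and without spaces inside them; the sample strings are invented.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ["qx", "qux", "quux", "quuux"]
print([full(r"qu+x", s) for s in samples])       # [False, True, True, True]
print([full(r"qu{1,}x", s) for s in samples])    # [False, True, True, True]
print([full(r"qu?x", s) for s in samples])       # [True, True, False, False]
print([full(r"qu{0,1}x", s) for s in samples])   # [True, True, False, False]

print(full(r"a{2,3}", "aaaa"))   # False: more than 3 repetitions
print(full(r"a{,2}", ""))        # True: zero repetitions are allowed
print(full(r"a{3}", "aaa"))      # True

# When searching inside a longer string, qu?x finds the part "qux" of "ququx".
print(re.search(r"qu?x", "ququx").group())   # qux
```

Whole-string matching and searching inside a line give different answers for the same pattern, which is why the last call succeeds on ququx even though `qu?x` allows at most one u.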