1. A non-special character is a RE matching that character.
Example:
a
6
%
Special characters are:
"." period
"*" asterisk
"[" left square bracket
"\" backslash
"^" caret
"$" dollar sign
Note:
A special character used as a "normal" character must be preceded by a backslash.
Example:
Main\.m3
first\.java
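The escaping rule can be tried out with a minimal sketch. It assumes Python's standard `re` module as a stand-in for a UNIX tool (for ordinary characters and backslash escapes its syntax agrees with the rule above); the helper name `matches_whole` is hypothetical.

```python
import re

def matches_whole(pattern, text):
    """Return True if the RE `pattern` matches the whole string `text`."""
    return re.fullmatch(pattern, text) is not None

# A non-special character matches itself.
plain_ok = matches_whole("a", "a")

# A backslash turns the special characters '.' and '*' into normal ones.
escaped_period = matches_whole(r"first\.java", "first.java")
period_rejects = matches_whole(r"first\.java", "firstXjava")
escaped_star = matches_whole(r"2\*3", "2*3")

print(plain_ok, escaped_period, period_rejects, escaped_star)
# prints: True True False True
```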
2. If token A is a RE and token B is a RE, then token AB is also a RE.
Token AB is called the concatenation of REs A and B.
The RE first matches A, then B.
Example: RE cs matches the string "cs" because it first
matches c and then s.
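3. (A) matches anything that matches
token A (used for grouping or to set precedence).
Note: ( and ) are not listed as special characters, so each is preceded
by \ to give it this special meaning: the RE is written \(A\).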
4. A|B matches anything that matches token A or token B.
Example:
a) a|b
b) (ab)|(cd)
As before, since | is not listed as a special character,
\ is used to give it its special meaning.
And so,
a) will be written as a\|b
b) will be written as \(ab\)\|\(cd\)
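Grouping and alternation can be sketched as follows, assuming Python's standard `re` module rather than a UNIX tool. One difference should be kept in mind: Python writes these operators as plain ( ) and |, with no backslash in front of them.

```python
import re

# Python spells grouping and alternation without backslashes,
# so "ab or cd" is written (ab)|(cd) here.
either = re.compile(r"(ab)|(cd)")

# fullmatch succeeds only if the whole string is matched.
results = {s: either.fullmatch(s) is not None for s in ["ab", "cd", "ad", "abcd"]}
print(results)
# prints: {'ab': True, 'cd': True, 'ad': False, 'abcd': False}
```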
5. A*, where token A is a RE, matches zero or more repetitions of A
Example:
a* will match:
"" - an empty string
a - one repetition of a
aaaaaaa - more than one repetition of a etc.
w(ab)* will match:
w
wab
wababab etc.
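The two examples above can be checked with a short sketch, again assuming Python's `re` module (where the group is written with plain parentheses). The sample strings are made up for illustration.

```python
import re

star = re.compile(r"a*")         # zero or more repetitions of a
grouped = re.compile(r"w(ab)*")  # w followed by zero or more repetitions of ab

# Keep only the strings that are matched in full.
star_hits = [s for s in ["", "a", "aaaaaaa", "b"] if star.fullmatch(s)]
grouped_hits = [s for s in ["w", "wab", "wababab", "wa"] if grouped.fullmatch(s)]

print(star_hits)     # prints: ['', 'a', 'aaaaaaa']
print(grouped_hits)  # prints: ['w', 'wab', 'wababab']
```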
6. '.' (single period) matches any character except newline \n.
Example:
Main.m3 will match:
MainAm3
Main7m3 etc.
Recall that if Main.m3 should be matched exactly,
the appropriate RE has to be written as Main\.m3.
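A small sketch of the difference between the period and the escaped period, assuming Python's `re` module with its default flags (under which '.' does not match a newline). The file names are illustrative only.

```python
import re

dot = re.compile(r"Main.m3")       # '.' stands for any one character except newline
literal = re.compile(r"Main\.m3")  # '\.' stands for a period only

names = ["Main.m3", "MainAm3", "Main7m3", "Main\nm3"]
dot_hits = [n for n in names if dot.fullmatch(n)]
literal_hits = [n for n in names if literal.fullmatch(n)]

print(dot_hits)      # prints: ['Main.m3', 'MainAm3', 'Main7m3']
print(literal_hits)  # prints: ['Main.m3']
```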
[ab] - character classes
[ab] means a|b
[abcdefg] means a|b|c|d|e|f|g
Note that a shortcut like [a-g] is admissible.
If the character '-' itself is one of the characters in [],
'-' has to be put as the first or last character in the list.
For example:
[a-g-] will match a|b|c|d|e|f|g|-.
Examples:
[A-Za-z][A-Za-z0-9_]* matches any Modula-3
identifier or keyword
[A-Z][A-Z]* matches any sequence
of at least one capital letter
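The character-class examples can be exercised with the sketch below, assuming Python's `re` module, which uses the same bracket syntax. The test strings are invented for illustration.

```python
import re

identifier = re.compile(r"[A-Za-z][A-Za-z0-9_]*")  # letter, then letters, digits or _
capitals = re.compile(r"[A-Z][A-Z]*")              # at least one capital letter
with_hyphen = re.compile(r"[a-g-]")                # a through g, or a literal '-'

id_hits = [s for s in ["count", "x_1", "BEGIN", "1st", "_tmp"] if identifier.fullmatch(s)]
cap_hits = [s for s in ["END", "E", "End", ""] if capitals.fullmatch(s)]
hyphen_hits = [c for c in "ag-hz" if with_hyphen.fullmatch(c)]

print(id_hits)      # prints: ['count', 'x_1', 'BEGIN']
print(cap_hits)     # prints: ['END', 'E']
print(hyphen_hits)  # prints: ['a', 'g', '-']
```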
7. [^ab] - inverted character classes
Examples:
[^ab] matches anything except a|b
[^a-q] will match t or
u or z but not a or d.
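A minimal check of the inverted class from the last example, assuming Python's `re` module (same syntax for inverted classes):

```python
import re

not_a_to_q = re.compile(r"[^a-q]")  # any single character outside the range a..q

# Test each of the characters named in the example above.
outside = [c for c in "adtuz" if not_a_to_q.fullmatch(c)]
print(outside)  # prints: ['t', 'u', 'z']
```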
8. ^A - where A is a RE, matches anything that matches A,
but only if it occurs at the beginning of the line.
Example:
^[A-Z] matches any capital letter
at the beginning of the line.
9. A$ matches anything that matches A but only
if it occurs at the end of a line
Example: The following RE finds all lines that contain only the word END,
perhaps with spaces before and after it:
"^ *END *$"
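Both anchors can be sketched on a few lines of text, assuming Python's `re` module. The sample text is invented, and it is split into lines first so that each line is tested on its own, much as a line-oriented UNIX tool would do.

```python
import re

text = "BEGIN\n  END  \nEND of file\nEND"
lines = text.split("\n")

# Lines that contain only the word END, with optional spaces around it.
only_end = re.compile(r"^ *END *$")
end_lines = [i for i, line in enumerate(lines) if only_end.search(line)]

# Lines that begin with a capital letter.
starts_capital = [line for line in lines if re.search(r"^[A-Z]", line)]

print(end_lines)       # prints: [1, 3]
print(starts_capital)  # prints: ['BEGIN', 'END of file', 'END']
```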
Some additional rules not supported by all UNIX applications:
10. A+ matches one or more repetitions of A.
Note: A+ is the same as AA*, and many UNIX applications do not support '+' at all.
Consult the man pages of the particular application for what is or is not supported.
11. A? matches zero or one repetition of A
Note: A? is the same as A{0,1}.
Examples:
a1p[1-4]\.txt    test cases for the assignment 1
^[^.]            any character but '.' at
                 the beginning of the line.
H[aeiou]llo      second letter is a vowel
bugs*            bug, bugs, bugssss
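These examples can be tried out with the sketch below, assuming Python's `re` module. The file names and words are made up to show one accepted and one rejected case per RE.

```python
import re

files = ["a1p1.txt", "a1p4.txt", "a1p5.txt", "a1p2xtxt"]
test_cases = [f for f in files if re.fullmatch(r"a1p[1-4]\.txt", f)]

# ^[^.] uses search because it only constrains the first character of the line.
not_dot_first = [s for s in ["main.c", ".profile"] if re.search(r"^[^.]", s)]

vowel_hits = [w for w in ["Hello", "Hallo", "Hxllo"] if re.fullmatch(r"H[aeiou]llo", w)]
bug_hits = [w for w in ["bug", "bugs", "bugssss", "bu"] if re.fullmatch(r"bugs*", w)]

print(test_cases)     # prints: ['a1p1.txt', 'a1p4.txt']
print(not_dot_first)  # prints: ['main.c']
print(vowel_hits)     # prints: ['Hello', 'Hallo']
print(bug_hits)       # prints: ['bug', 'bugs', 'bugssss']
```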
Multiple matches:
{n,m} must occur at least n times, but no more than m times.
{n,} at least n times
{ ,m} no more than m times
{n} exactly n times
*     0 or more times, same as {0, }
+     1 or more times, same as {1, }
?     0 or 1 time, same as {0,1}
Examples:
quu*x
qu+x
qu{1, }x
qu
qux
qu?x
qu{0,1}x
etc.
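The equivalences among the repetition operators can be checked with a short sketch, assuming Python's `re` module. The helper `hits` and the sample strings are hypothetical. The braces are written without the inner space, since Python's `re` expects that form; UNIX tools may differ, so consult the man pages.

```python
import re

samples = ["qx", "qux", "quux", "quuux", "qix"]

def hits(pattern):
    """Return the samples that `pattern` matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

# Three ways to say "one or more u".
one_or_more = [hits(r"quu*x"), hits(r"qu+x"), hits(r"qu{1,}x")]

# Two ways to say "zero or one u".
zero_or_one = [hits(r"qu?x"), hits(r"qu{0,1}x")]

print(one_or_more[0])  # prints: ['qux', 'quux', 'quuux']
print(zero_or_one[0])  # prints: ['qx', 'qux']
```

Each list in `one_or_more` is identical, as is each list in `zero_or_one`, which is what the "same as" column in the table above states.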