CS 241 - WLP4 Programming Language Specification

The WLP4 programming language contains a strict subset of the features of C++. A WLP4 source file contains a WLP4 program, which is a sequence of procedure definitions, ending with the main procedure wain.

Lexical Syntax

A WLP4 program is a sequence of tokens optionally separated by white space consisting of spaces, newlines, or comments. Every valid token is one of the following:

Tokens that contain letters are case-sensitive; for example, int is an INT token, while Int is not.

White space consists of any sequence of the following:

WLP4 programs are constructed by tokenizing (also called scanning or lexing) an ASCII string. To ensure a unique sequence of tokens is produced, the sequence of tokens is constructed by repeatedly choosing the longest prefix of the input that is either a token or white space. If the prefix is a token, it is added to the end of the WLP4 program token sequence. Then the prefix is discarded, and this process repeats with the remainder of the ASCII string input. This continues until either the end of the input is reached, or no prefix of the remaining input is a token or white space. In the latter case, the ASCII string is lexically invalid and does not represent a WLP4 program.
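
The longest-prefix procedure described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: it recognizes a deliberately tiny, hypothetical token set (identifiers, numbers, =, ==, <, <=) and treats only spaces and newlines as white space, omitting comments and the rest of the WLP4 tokens. The names Piece, longestPiece, and scan are invented for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One piece of the input: its length, and whether it is a token or white space.
struct Piece {
    std::size_t length;  // 0 means neither a token nor white space starts here
    bool isToken;
};

// Longest prefix of input[pos..] that is a token or white space,
// for the assumed mini token set (NOT the full WLP4 token list).
Piece longestPiece(const std::string& input, std::size_t pos) {
    std::size_t end = pos;
    const unsigned char first = static_cast<unsigned char>(input[pos]);
    if (first == ' ' || first == '\n') {  // white space: spaces and newlines only
        while (end < input.size() && (input[end] == ' ' || input[end] == '\n')) ++end;
        return {end - pos, false};
    }
    if (std::isalpha(first)) {  // identifier: a letter, then letters or digits
        while (end < input.size() &&
               std::isalnum(static_cast<unsigned char>(input[end]))) ++end;
        return {end - pos, true};
    }
    if (std::isdigit(first)) {  // number: a run of digits
        while (end < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[end]))) ++end;
        return {end - pos, true};
    }
    if (first == '=' || first == '<') {  // prefer "==" and "<=" over "=" and "<"
        const bool twoChars = pos + 1 < input.size() && input[pos + 1] == '=';
        return {twoChars ? std::size_t{2} : std::size_t{1}, true};
    }
    return {0, false};
}

// Fills `tokens` with the token lexemes of `input`.
// Returns false if the input is lexically invalid for this mini token set.
bool scan(const std::string& input, std::vector<std::string>& tokens) {
    tokens.clear();
    std::size_t pos = 0;
    while (pos < input.size()) {
        const Piece piece = longestPiece(input, pos);
        if (piece.length == 0) return false;  // no prefix is a token or white space
        if (piece.isToken) tokens.push_back(input.substr(pos, piece.length));
        pos += piece.length;  // discard the prefix and continue with the remainder
    }
    return true;
}
```

With this sketch, scanning the string a<==b produces the four lexemes a, <=, =, b: at the < the scanner takes the longer prefix <= rather than <, and the remaining = then stands alone.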

Context-Free Syntax

A context-free grammar for a valid WLP4 program is:

Context-Sensitive Syntax

Errors in context-sensitive syntax are referred to as semantic errors below. A program that contains a semantic error is not a valid WLP4 program and cannot be compiled.

Names & Identifiers

A procedure is any string derived from procedure or main. If it is derived from procedure, the name of the procedure is the lexeme of the ID in the grammar rule whose left-hand side is procedure. The name of the procedure derived from main is wain. A procedure is said to be declared from the first occurrence of its name in the string that makes up that procedure (i.e., once the name has been encountered in the procedure's header). The following semantic errors exist related to procedure declarations:

Thus, a procedure (other than wain) may call itself recursively, and a procedure may call procedures declared before itself, but a procedure may not call procedures declared after itself. Consequently, there is no mutual recursion in WLP4.
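
As a brief illustration of these ordering rules, the following sketch shows a procedure that calls itself and is in turn called by wain, which is declared after it. The procedure name fact is hypothetical, and the example assumes the usual WLP4 forms for declarations, if statements, and return.

```cpp
// fact is declared once its name appears in its header,
// so the recursive call inside its own body is allowed.
int fact(int n) {
  int result = 1;
  if (n > 1) {
    result = n * fact(n - 1);
  } else {}
  return result;
}

// wain comes last, so it may call fact, which is declared before it.
// If fact were instead defined after its caller, the call would be a semantic error.
int wain(int a, int b) {
  return fact(a) + b;
}
```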

The procedure wain may not call itself recursively. However, this is actually enforced by the context-free grammar (since wain is a keyword, not an identifier, and therefore cannot appear as the ID in the procedure call rules factor → ID LPAREN RPAREN and factor → ID LPAREN arglist RPAREN) and therefore is not considered a semantic error.

Any ID in a sequence derived from dcl within a procedure p is said to be declared in p. Any ID derived from factor or lvalue within p is said to be used in p. The name of the ID is the lexeme of the ID token. String comparisons between names are case-sensitive; for example, "FOO" and "foo" are distinct.

The following semantic errors exist related to declarations and uses of IDs in a procedure.

An ID may have the same name as a procedure. If an ID x is declared in a procedure p, all occurrences of x within p refer to the ID x, even if a procedure named x has been declared. The same is true in the special case that p = x: a declared ID may have the same name as the procedure that contains it; in this case, all occurrences of the name refer to the variable, not the procedure. This rule means there is an additional semantic error related to procedures and IDs:

For example, the following program does not contain any semantic errors. The declaration of the ID p as a parameter of the procedure p, and the use of the ID p in the return expression of procedure p, are not issues, because within the procedure p the ID p refers to the parameter variable. Within wain, the ID p refers to the procedure, since no ID named p is declared in wain, so the procedure call p(a) is valid.

int p(int p) { return p; }
int wain(int a, int b) { return p(a); }

On the other hand, the following program contains a semantic error. The only difference is that the name of the second parameter of wain has been changed from b to p. Therefore, there is now an ID p declared in wain, so the procedure call p(a) in the return expression of wain is not valid.

int p(int p) { return p; }
int wain(int a, int p) { return p(a); }
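
One way a compiler might implement this shadowing rule is to consult the variables declared in the current procedure before consulting the procedures declared so far. The sketch below is an assumption about implementation, not part of the language definition; Scope, Meaning, resolve, and callIsValid are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical bookkeeping for the procedure currently being checked.
struct Scope {
    std::set<std::string> variables;          // IDs declared in this procedure
    std::set<std::string> visibleProcedures;  // procedures declared so far
};

enum class Meaning { Variable, Procedure, Undeclared };

// A name refers to a variable if one is declared in the current procedure,
// even when a procedure with the same name is also visible.
Meaning resolve(const Scope& scope, const std::string& name) {
    if (scope.variables.count(name) > 0) return Meaning::Variable;
    if (scope.visibleProcedures.count(name) > 0) return Meaning::Procedure;
    return Meaning::Undeclared;
}

// A call such as name(...) is only valid when the name resolves to a procedure.
bool callIsValid(const Scope& scope, const std::string& name) {
    return resolve(scope, name) == Meaning::Procedure;
}
```

For the two programs above, a scope for wain with variables a and b accepts the call p(a), while a scope with variables a and p rejects it, because p now resolves to the parameter.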

Types

An ID whose name occurs in a sequence derived from dcl has a type, which is either int or int*:

Other IDs (particularly, the IDs corresponding to procedure names) do not have types and are said to be untyped.

Every procedure has a signature, which is a list of strings, each of which is either int or int*. The signature of a procedure is the sequence of strings int or int* that is derived from params. Note that this sequence may be empty. The signature indicates the number of arguments expected by the procedure, and the type of each argument.
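
A signature can be represented as an ordered list of types and compared against the types of the arguments at a call site. The following is a minimal sketch of that idea only; the exact conditions for a well-typed call are given by the type rules, and the names Type, Signature, and argumentsMatch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The two WLP4 types.
enum class Type { Int, IntStar };

// A signature is the sequence of parameter types; it may be empty.
using Signature = std::vector<Type>;

// True when the argument list has the expected length and
// each argument has the type given at that position in the signature.
bool argumentsMatch(const Signature& signature, const std::vector<Type>& arguments) {
    if (signature.size() != arguments.size()) return false;
    for (std::size_t i = 0; i < signature.size(); ++i) {
        if (signature[i] != arguments[i]) return false;
    }
    return true;
}
```

For instance, a procedure with parameters (int a, int* b) would have the signature int, int* in this representation, and a call passing two int arguments would not match it.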

Instances of the tokens NUM, NULL and the nonterminals factor, term, expr, and lvalue also have a type, which is either int or int*. The types of these tokens and nonterminals are determined by the following rules. If the conditions of a rule are not satisfied, the program contains a semantic error.

Additionally, all of the following conditions must be satisfied, or the program contains a semantic error.

Behaviour

Any WLP4 program that obeys the lexical, context-free, and context-sensitive syntax rules above is also a valid C++ program fragment. The behaviour of the WLP4 program is generally expected to be the same as the C++ program formed by inserting the WLP4 program at the indicated location in one of the following C++ program shells:

Note that putchar and getchar are provided by the C standard library, and their behaviour is as described by C.

There are some situations where the expected behaviour of a WLP4 program may differ from that of the C++ program shells above: