General Programming Style


The following programming style is REQUIRED for all programming questions that you write. Rather than being too strict, two or more ways of doing something are often shown. Requiring a particular style is common practice in business and it is necessary in this course to simplify the marking of assignments. It is very difficult for the markers to adjust to multiple coding styles when marking programs. If you are unable to reconcile yourself to these standards, you must discuss any deviation with your instructor to obtain permission for an alternative style.

General:

Remember that people have to read the code, which is an unpleasant chore at best so making things easy on the eye means more time can be spent seeing what a terrific job has been done. Vertical and horizontal whitespace is necessary (especially if you want the marker to annotate what/where things are wrong).

Meaningful Variable Names:

The name of a variable should describe the purpose of the value it contains. Do not use names like found, flag, done; found what? done what? Some names like i and j, for array indexes, and n, for the number of things in a group, are nearly meaningless, but are allowed because their use is so stereotypical. Names like row, col and numOfChars are better. In general, the greater the scope of a variable name, the more meaningful its name needs to be, and indirectly, the longer its name should be.

Indentation:

Proper, consistent indentation is required to emphasize program structure. It is also important to maintain a consistent format to aid in searching for the beginning and end of a control structure. Suggested indentation levels are fixed multiples of 2, 3 or 4. Using tabs for indentation is common. Most program editors automatically indent code and can set the tab width, e.g., vim, set shiftwidth=4 and learn about the >> and << commands.

Routine Documentation:

The best routine documentation is clear, understandable code. Comments should give an overview of the code or explain tricky portions of the code.

Routine documentation should describe what a routine does, not how it does it. Start with a one-sentence summary. For each parameter, describe what it is used for. Do the same for all global variables a routine uses: they are invisible parameters. State what assumptions a routine makes, and what it does if it detects an error.

Variables should be accompanied by a comment describing their purpose, if that is not obvious from the variable's name. A single-line comment is usually sufficient.

A group of statements that perform a non-trivial task should be preceded by a comment summarizing that task. Do not reiterate in English what is obvious from the code (e.g., "variable x is incremented").

Modularity:

When designing a program, break it down into its functional components. Try to isolate those things that may change independently into separate routines, each with a standard interface. The purpose of any routine should be describable in a single sentence (e.g., a standard deviation calculation or add the node pointed to by p to the list L). A routine should perform exactly one operation. This structure is necessary if you expect markers to understand the program (and if they cannot understand your program, they can only assume that you did not understand the material).

Have a clean interface to a module. Do not pass unnecessary information.

Command Line Options:

If your program requires options on the command line and they are improperly specified, or an error results (e.g., trying to open a non-existent file for reading), the program must print an error message and terminate. Do not try to read in the missing information from standard input as this causes problems for the markers.

Efficiency:

premature optimization is the root of all evil. [Knu74, p. 268]

Typically a program spends 90% of its execution executing 10% of its code. Further, it is not always obvious what code constitutes that 10%. If efficiency is of paramount importance, find the 10% and do what it takes to make it efficient. Writing obscure "efficient" code involving side-effects is a waste of time and makes maintenance difficult if it is not in the 10% region. Remember, the point of an assignment is not ultra-efficiency, it is conveying an understanding of the material through a clear, concise, and well-developed program.

C/C++ Programming Style

Comments:

A comment should be at the same indentation level as the code it describes. Note the comment styles for multi-line comments:

/*			/*
 * comment ...	or	  comment ...
 */			*/

These forms allow new lines to be added without changing existing lines in a program.

Single-line comments may also be written as follows:

statement;		// simple descriptive comment

Try to align all the comments at a particular column, for example:

statement;		// simple descriptive comment
statement;		// simple descriptive comment
statement;		// simple descriptive comment

which is easily done using tabs to move to a particular fixed column. You are strongly encouraged to use end-of-line comments in your code.

Case of Variables, Routines, Macros and Typedefs:

Unlike some systems, UNIX and C/C++ support upper and lower case letters. This facility should be used to advantage to make names more meaningful and easier to read. The de facto standard for C/C++ is:

Convention Used For Example
completely in upper-case macro names (#define), constants
const int GROUP_SIZE = 10;
#define RECURSIVE 1
#if RECURSIVE == 1
first letter capitalized structures, classes, and typedef
typedef list<int> Integer_List;
struct StudentRecord { ... };
first letter not capitalized variables, routines, method names
int intsortedArray[]
void sort(...) { ... }

Declarations:

C/C++ have a top-level scope in which routines of a source file are collected. This external scope level also allows declaration of macros, types and variables. External scope declarations are visible within all routines that textually appear after them in a file, and can be globally referenced in these routines. In general, the scope of a variable should be no greater than its point of usage. In this course, global macros and types are allowed, while global variables are strongly discouraged.

However, there is one situation where global variables can be used, that is, in simulating a module or package. Variables declared in the external scope are implicitly local to that text file. These variables correspond to the private variables of a module or package. Routines that are exported (non-static) can use these variables without requiring users to explicitly create them and pass them as parameters. Declarations in the external scope must always state their purpose, the rules for using them, and which routines use them. In C++, classes can generally be used to solve such problems without the use of global variables.

If a variable is only used within the scope of a control structure, make it local to the control structure.

Non-Local Local
int rows, cols;  
int **matrix; 
int i, j;

for ( ;; ) {

    // read matrix dimensions
    cin >> rows >> cols;

  if ( cin.fail() ) { break; } // no matrix ?

    // allocate, read, and print matrix
    matrix = new int *[rows]; 
    for ( i = 0; i < rows; i += 1 ) {
        matrix[i] = new int[cols];
        for ( j = 0; j < cols; j += 1 ) {
            cin  >> matrix[i][j];
            cout << matrix[i][j];
        } // for
        cout << endl;
    } // for

    // do something to the matrix

    // deallocate matrix
    for ( i = 0; i < rows; i += 1 ) {
        delete [] matrix[i];
    } // for
    delete [] matrix;
} // for




for ( ;; ) {
    int rows, cols;  
    // read matrix dimensions
    cin >> rows >> cols;

  if ( cin.fail() ) { break; } // no matrix ?

    // allocate, read, and print matrix
    int matrix[rows][cols]; 
    for ( int i = 0; i < rows; i += 1 ) {
 
       for ( int j = 0; j < cols; j += 1 ) {
            cin  >> matrix[i][j];
            cout << matrix[i][j];
        } // for
        cout << endl;
    } // for

    // do something to the matrix






} // for

Dynamic Allocation:

Dynamic allocation must only be used when a variable's storage must outlive the block in which it is allocated:

Type *rtn(...) {
    Type *tp = new Type; // MUST USE HEAP
    ...                  // initialize/compute using tp
    return tp;           // storage outlives block
} // tp deleted later

or if initialization forces dynamic allocation for an array of objects when each element has different values:

struct Complex {
    double re, im;
    Complex( double re = 0.0, double im = 0.0 ) : re(re), im(im) {}
};
int main() {
    Complex c1[3];  			// stack, elements set to {0.0, 0.0}
    Complex c2[3] = { {1.0,0.0}, {2.0,0.0}, {3.0,0.0} }; // stack, c++0x only
    Complex *c3[3];
    for ( int i = 0; i < 3; i += 1 ) {	// heap
	c3[i] = new Complex( i );
    }
    ...
    for ( int i = 0; i < 3; i += 1 ) {
	delete c3[i];
    }
}

Constants:

Use symbolic rather than literal constants in a program. (There are a trivial number of exceptions, e.g., 0, 1). While C/C++ provide preprocessor variables to define constants, e.g.:

#define PI 3.14159
#define PI_4 (3.14159 / 4.0)

a global declaration with a const specifier should be used instead, for example:

const float PI = 3.14159;
const float PI_4 = 3.14159 / 4.0;

Note a macro definition that is more complex than a constant should always be surrounded by parentheses to avoid side effects when expanded.

Macros:

Both C/C++ have an alternative means of defining macros by declaring functions inline.

        #define MAX( a, b ) ((a > b) ? a : b)
        inline int MAX( int a, int b ) { return a > b ? a : b; }

A parameter should not appear multiple times in the macro definition as this can result in multiple side-effects as the argument is evaluated multiple times. Macros are not routines!

Statements:

Place each statement (declarations, expressions, control-structures) on its own line to enhance the readability of your code.

Braces:

Braces must always be present. They are necessary so that new statements can be added without needing to modify existing statements. The exact position of braces is largely a religious issue. However, one point should be noted about putting each brace on a separate line. This style introduces extra whitespace, which does not necessarily make the program easier to read because information may be forced out of view. The extra vertical whitespace means that less information is shown, making the possibility of viewing connected pieces of code less likely, for example:

if ( a == b )
{
    a = 3;
    b = 5;
}
else
{
if ( a == b ) {
    a = 3;
    b = 5;
} else {
    a = 2;
    b = 7;
} // if

This extra visibility is critical during all aspects of program development. Therefore, it is recommended (but not required) that you forego extra vertical whitespace in favour of increased information display.

All closing braces must start on a separate line and, in general, it is a good habit to label them as to the type of control structure or routine they terminate. Labelling exceptions are "} else {" and "} while ( expr )", which are self documenting. Documenting closing braces allows identification of a terminating block when the start of the block is not visible. This can occur because of page boundaries in a listing or screen boundaries on a terminal. For example, if you need to insert a line after the end of the second if statement in the following nested control structures and this is all that is visible on your screen:

             }
        }
    }
}
             } // if
        } // while
    } // if
} // for

it is much easier (and takes less time) to find the desired location if the closing braces are labeled appropriately.

White Space in a Line:

Put extra blanks in expressions to make them easier to read, for example:

a = b * (c + d % 3 );	versus	a=b*(c+d%3);
Side-Effects:

A side-effect occurs when the value represented by a name in a particular scope is changed; hence, an effect at some later time (on the side) changes the binding of a name to a value. Assignment is the most obvious way to cause a side-effect, as in:

{
    int x = 3;	// first binding is not a side-effect
    ...		// during this part of the block, x is bound to 3
    x = 4;	// an operation that causes a side-effect
    ...		// during this part of the block, x is bound to 4
}

Functional programming [Rea89], represented by languages like FP and pure LISP, has no side effects, which means there is only initialization at declaration; imperative programming, represented by languages like Ada and C/C++, has side effects after a declaration. While imperative programming uses side effects, reducing the number of side effects and/or the locations where they occur can decrease the complexity and increase the maintainability of a program. (In a similar compromise, functional programming occasionally uses assignment to increase performance.) Therefore, it is good programming practice to reduce the number of side effects and define precisely the locations where side effects occur.

In this course, these locations are limited to the last operation of an expression through the assignment operator. In general terms, there can be a maximum of one side effect on the left margin of a line of C/C++ code. This rule results in the following restrictions in C/C++:

Unfortunately, C/C++ programmers have a tradition of using multiple side effects, particularly in expressions, usually to gain efficient execution because of poor compilers or deficiencies in C. Fortunately, compiler technology has improved to the point where it is no longer necessary to write such code. The following is a typical C/C++ expression with multiple side effects in an expression and its corresponding multi-line version:

int getc1(FILE *p) {
    return (--(p)->_cnt>=0 ? ((int)*(p)->_ptr++) : _filbuf(p));
} // getc1

int getc2(FILE *p) {
    int r;

    p->_cnt -= 1;
    if (p->_cnt >= 0) {
        r = (int)*(p)->_ptr;
        p->_ptr += 1;
    } else 
        r = _filbuf(p);
    return r;
} // getc2

While both routines may be considered to be complex, the latter is clearly easy for most C/C++ programmers to understand and modify. Further, the compiler used in this course generates almost identical code in both cases (1 instruction difference). Certain deficiencies of C/C++ may require multiple side effects in expressions, but are extremely rare.

The following discussion is further motivation for these restrictions.

=, ++ and -- Operators:

These operators cause a side effect by modifying or incrementing or decrementing the value of a variable, while also returning the value of the variable. Restricting the location of side effects means that they cannot be used in expressions. At best they can only appear in single statement form, as in:

i++
...
for ( i = 0; i < 10; i++ ) ...

The preferred style is to use operators like += and -=, since they provide the same capability and are more general, as in:

i += 1;
...
for ( i = 0; i < 10; i += 1 ) ...
operator= Operators:

The family of operators, operator=, should be used whenever possible as they are general and efficient. The generality comes from the ability to use any binary operator and have an arbitrarily complex right-hand side, as in:

a[i * j / 2]  *=  k[l * m + 3];   vs    a[i * j / 2]  =  a[i * j / 2] * k[l * m + 3];

Not duplicating the left-hand side of the expression on the right-hand side reduces error. Further, operator= is easier to extend, while specialty operators are not, as in changing from incrementing by 1 to incrementing by 3:

i += 1	changed to	i += 3;
i++	changed to	i++ ++ ++;  // this is illegal for other reasons

Contrary to what some C books suggest, there is no performance advantage in either the ++ or -- operator with modern C/C++ compilers and machines. For example, most of todays C/C++ compilers generate exactly the same code for:

i++;
i += 1;
i = i + 1;

Finally, both beginners and experienced C/C++ programmers often make mistakes with operators ++ and --. An experienced programmer on the net wrote:

A function had an output parameter which was a numeric counter, i.e., a pointer to an integer. I wrote the code to increment the counter as *count++, which does the wrong thing (it should be (*count)++), or, as I rewrote it, *count += 1.

Returning Multiple Values:

C/C++ does not allow a routine to return multiple values through the assignment operator, as in

[ x, y ] = f( a, b );   // not allowed in C/C++

Instead, this can be done by returning values through the argument-parameter mechanism. Therefore, the call must be rewritten as:

x = f( &y, a, b );    or    f( &x, &y, a, b );

where routine f has one or two output parameters, respectively.

Alternatively, a routine can return a structure, rather than a simple variable, thus allowing each component of the structure to be a return value. However, it is probably not a good idea to define a structure only to serve as the return value of a particular function.

Overusing this->:

Using this-> to differentiate class from local variables makes programs difficult to read and is better done using good variable names (the greater the scope, the longer a name).

struct Foo {
    int i, j, k;
    void mem( int i, int w, int z ) {
        this->i = this->j / w + z * this->k + i;  // difficult to read
        this->i = j / w + z * k + i;  // easier to read, only disambiguate i
        Foo::i = j / w + z * k + i;
    }
};

The object-identifier this is rarely needed, except to disambiguate class from local variables with the same name or to pass the object as an argument. Structure qualification is safer than this-> for inheritance:

struct B {
    int x;
};
struct D : public B {	// inheritance
    int x;		// added declaration changes meaning of this->x in mem
    void mem() {
        this->x;	// meaning changes: B::x changes to D::x
        D::x;		// meaning fixed: D::x
        B::x;		// meaning fixed: B::x
    }
};
Duplication of Code:

Eliminating side effects from expressions can force unnecessary duplication of code, as in:

while ( cin.get(c) != '\n' ) . . .

which could become:

cin.get(c);		// duplicate code
while ( c != '\n' ) {
    ...
    cin.get(c);		// duplicate code
} // while

Fortunately, the duplicated code in this case can be eliminated by using a multi-exit loop, as in:

                                                                           
for ( ;; ) {           
    cin.get(c);       
  if ( c == '\n' ) { break; }
    ...             
} // for            

You are strongly encouraged to use multi-exit loops to eliminate duplicate code [Buh85]. Further, the compiler for this course generates code of identical efficiency for the multi-exit loop and the first while loop with the side effect in its expression. However, all exit points should be highlighted by outdenting or with comments to make them easy to locate. (This document uses outdenting.)

Return Statement:

Multiple exits from a routine, like multiple exits from a loop, are useful, but care must be taken not to overuse either style. In many cases, a few nested if statements can eliminate multiple returns; alternatively, a test with a return at the beginning of a routine can eliminate unnecessary nesting in the body of the routine. Reducing the number of return statements is usually more important for a routine that returns a value, i.e., a function rather than a procedure, as it reduces the number of locations to change when the variable or expression that calculates the returned value is modified. More important than the number of returns or exits is well-designed logic.

Poor Logic Better Logic
int rtn( int x, int y ) {
    if ( x > 3 ) {
        x += 1;
        if ( x > y ) {
            x = y;
        } else {
            return x + 3;
        }
        return x + y;
    } else if ( x < 3 ) {
        return 3;
    } else {
        return y;
    }
}
int rtn( int x, int y ) {
  if ( x < 3 ) { return 3; }
  if ( x == 3 ) { return y; }
    x += 1;
    if ( x > y ) {
        x = y + y;
    } else {
        x += 3;
    }
    return x;
}




As with multi-exit loops, return points should be highlighted by outdenting or comments to make them easy to locate.

Example:

Figure 1 shows one possible way of incorporating the required programming style into a C++ program. You are not required to follow this example, but you must supply the same information and follow the other rules stated above.

Additional examples for I/O, strings, linked-lists, and other concepts in C/C++ are available.

/*********** SearchInsert ***********
     Purpose: An element is looked up in an unsorted list of items,
          if it does not appear in the list, it is added to the end of the list,
          if it exists in the list, its associated list counter is incremented.


     Returns: Position in list of key.


     Errors: List is full and cannot insert item. Program is terminated.


     Globals:
          MaxListSize - used for error checking
          Elem - type of a list element
************************************/

int SearchInsert(
     Elem list[ ],	// array of items to be searched and possibly modified
     int  &listSize,	// reference to number of items in list, MAY BE MODIFIED!
     int key		// item that is searched for in the list
) {
     int pos;

     // loop has two exits: search does not find key and key is found

     for ( pos = 0; ; pos += 1 ) {

       if ( pos >= listSize ) {			// if key not found, insert
               listSize += 1;
               if ( listSize > MaxListSize ) {          // list full ?
                     cerr << "ERROR: List is full." << endl;
                     exit(EXIT_FAILURE);	// TERMINATE PROGRAM
               } // if
               list[pos].count = 1;
               list[pos].data = key;
               break;
          } // exit

       if ( key == list[pos].data ) {		// if key found, increment counter
               list[pos].count += 1;
               break;
          } // exit
     } // for

     return pos;				// return position of key in list
} // SearchInsert

References

[Buh85]
P. A. Buhr. A Case for Teaching Multi-exit Loops to Beginning Programmers. SIGPLAN Notices, 20(11):14-22, November 1985.
[Knu74]
Donald E. Knuth. Structured Programming with go to Statements. ACM Computing Surveys, 6(4):261-301, December 1974.
[Rea89]
Chris Reade. Elements of Functional Programming. Addison Wesley, 1989.