Hints for A1

This file may be updated, so check back periodically

Quick Topic Links

Clarification of Lock Behaviour
OS/161 crash with no output
Weird/Garbled Output from sy2
Testing Your Assignment
Controlling the Number of Processors
More GDB tips

Clarification of Lock Behaviour

You need to enforce some constraints on how locks are used. One that is a common source of questions states roughly:

lock_release: A lock can not be released by a thread that is not holding the lock. This means that a thread can't release the lock before it acquires the lock and it can not release a lock that is held by another thread.

A common question is how to handle these types of problems. Some possible options are:

Do not release the lock but do not report an error. This is the least desirable option since a programmer calling lock release in an incorrect way will not be notified of the error.
Crash the kernel using panic().
Use KASSERT() to test for and catch the error.

Between panic() and KASSERT(), KASSERT is preferred, since lock_acquire() and lock_release() will be called within the kernel, so there is a high degree of confidence that they will be called correctly. If they are not there is a bug in the kernel and we want to be sure to fix the bug.

OS/161 crash with no output

When the OS/161 kernel starts to boot, there is a period of time during which the kernel cannot produce any output because the output device has not yet been initialized. If OS/161 crashes during this early phase, it will exit without producing any output, like this:

@cpu20.student[104]% sys161 kernel
sys161: System/161 release 1.13, compiled May 19 2007 21:48:06
sys161: 213925 cycles (88758k, 0u, 125167i)
sys161: 0 irqs 0 exns 0r/0w disk 0r/0w console 0r/0w/1m emufs 0r/0w net
sys161: Elapsed real time: 0.017304 seconds (12.3627 mhz)
sys161: Elapsed virtual time: 0.008557000 seconds (25 mhz)
@cpu20.student[105]%

In the example above, the only output is from the simulator program (sys161), which reports some execution statistics when the simulated machine shuts down. All of the usual boot messages from the OS/161 kernel are missing.

It is also true that the kernel creates and uses locks very early in the boot process. Thus, if you start working on locks and you find that the kernel starts failing without producing any output, as shown above, a likely explanation is that your lock code is responsible for an early crash of the kernel. Note that you cannot debug this problem by adding kprintfs to your kernel, because the failure is happening before the kernel can display the output from your kprintf. (In addition, kprintf actually uses a lock, so adding more kprintfs may affect the problem you are trying to debug, or may cause new problems.)

You need to use gdb to help you determine where the problem lies. A good thing to start with is to set breakpoints at your lock functions, and step through them when they are called to ensure that they are behaving as you expect. Another useful idea is to set a breakpoint at the panic function, which the kernel will invoke if an assertion fails.

Weird/Garbled output from sy2

If the output you are seeing appears to be a mess that looks something like the sample bit of output shown below, it is most likely because you have not implemented locks correctly. kprintf (which is used to print output when tests are run) requires lock_acquire and lock_release to be implemented correctly in order to prevent too much text from different threads from being completely intermingled. When locks don't work correctly the sy2 test prints out messages indicating failure but because the locks aren't implemented correctly the output becomes very hard to read because characters from the output from different threads are interleaved.

Starting lock test...
thread 0: Mismatch ont thhrteeads rtval2/3:te thMrsismetval1
aTtceehs t faaildoed
n t hreaadttheths2r: rd  eeMitsattahtrdev1 mtathr1t3dtaahhr7le 2hrdc: hea/dar2e :rMte0hd 1o 1ita:t hr Methreed ans4:htvea1add te5t aMlastdht:t  Mhi 2ir11d :hr et7shm1rs:is  maMtMsemr2et1hm1: dai Miis6m
tTaetar9:ththr etsasMiahhreathtch: e:chrv area:ddtsadcera sattaMi otdmm M  i22r madhd eh toaMi1hresa2s 5ce  tt9: M1Mhrrsn namaett:m:haoccd 2aiislem3et6 datahc a  dnhh f3ds8m12a0ae: tcdtrhMtMon   a: m8t a6tcas ct 2ee iciooi a /nud:h:ttMih:dc4sa1oshsnn lMestm  o e12s8 Mh:tvm nmi e3m tv:
nM s0 :h  aa 5ad st1tiattea Titvld
  oMl1:/ tch27McchMesa1:t: oMno tin seome:ihs isml/ evMnis mchu mtnasesstvotannMsai naMmtos ttv atma fsaat  ote l1toenmtctvalM2tslictiuns/si
ctTs athcat2i/vm2lhvsmmanmat tcn cheltc/steased 
uth eotsnt oh s1ehnmatt/o
aTlcveaets vn ott/s ualvctnem1shhlsvma ttevestont1a e s
/Tn otall2enaantc/lhvmtstteut 2n3o /st sl1vhn1
ale2sessmt/n/u ttctt/ua u T
/Tntt
 vnnomfavthevnmlomoeeu Tvfa n
uetveil sau
Tn n tsftaal m
saTTaeeotlm1e
sesmaieliltlseltvsd

Testing Your Assignment

Locks and Condition Variables

We will use some combination of sy2, uw1, and sy3 to test locks and condition variables. We will run these tests with different numbers of processors. (You can change the number of processors in the simulated machine by changing the sys161.conf file before starting sys161.)

Traffic Simulation

To test your traffic synchronization, we will run the sp3, command, which takes the form:

sp3 n k i t

n: threads (vehicles)
k: interations
i: inter-arrival time
t: seconds of service time

We will run tests with different numbers of processors to check whether your code works properly in each case. You can control the number of processors by editing the SYS/161 configuration file.

Each run of sp3 reports the total simulation time as well as the average waiting time for vehicles arriving from each direction. We will use these numbers to evaluate the fairness and efficiency of your solution:

Efficiency

To test efficiency, we will focus on tests in which the intersection is heavily loaded, i.e., tests in which the intersection is rarely empty. In particular, we will use test runs of the form

sp3 n k 0 t

In these tests, there are almost always n vehicles in the intersection, or waiting to get into the intersection. (Almost always, rather than always, because interarrival times are randomized by the simulation, and may not always be exactly zero.)

In an sp3 n k 0 t test, we will expect to see a total simulation time of 0.9nkt seconds or less when n is between 2 and 5. When n is 6 or greater, we expect to see a total simulation time of 5kt seconds or less. For example, for the test

sp3 4 15 0 1

we expect to see a total simulation time of 54 seconds or less.

Fairness

We will also use the same heavy-load tests to evaluate fairness. To measure fairness, we will compare per-direction average waiting times to the overall average waiting time (that is, the average of the per-direction average waiting times). We will consider waiting times to be fair if all of the per-direction average waiting times are within t (one service time) of the overall average waiting time.

We will also test your synchronization solution under light load, where the intersection is mostly idle. In these tests, we expect to see average waiting times at or near zero, depending on the specific test.

Controlling the Number of Processors

You can control the number of processors in the simulated machine by changing the SYS/161 configuration file, sys161.conf. Look for a line like this:

31	mainboard  ramsize=524288  cpus=1

The cpus=1 indicates that that simulated machine has a single CPU (processor). To simulate more CPUs, just use change the CPU count. For example, changing the line above to

31	mainboard  ramsize=524288  cpus=4

will give you a system with 4 CPUs.

So that you're not constantly editing the configuration file, you can create multiple different configuration files and pass the one you want to the simulator when it starts, using the -c option, e.g,

sys161 -c my-config-file.conf kernel

More GDB tips

This is an example gdb session to show what output to expect and how to up up the stack trace to look at a simple variable. This session was produced by Tim Brecht using a previous (older) version of OS/161 (so your code may not look the same). His comments are on the lines that start with "//". Each comment refers to the output that follows it.

This example assumes that there is a .gdbinit file like the one described in the A0 hints.

// Start cs350-gdb.
% cs350-gdb kernel
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=mips-elf"...
__start () at ../../arch/mips/mips/start.S:24
24         addiu sp, sp, -20
Current language:  auto; currently asm
Breakpoint 1 at 0x80010948: file ../../lib/kprintf.c, line 111.

// Continue execution. This stops and waits for input into os161.
// In the window running sys161 -w kernel after I type something like
// p testbin/program it runs and eventually panics.
(gdb) continue
Continuing.

Breakpoint 1, panic (fmt=0x8002fe6c "Assertion failed: %s, at %s:%d (%s)\n") at ../../lib/kprintf.c:111
111             if (evil==0) {
Current language:  auto; currently c

// I want to find out who called panic and why.
// Where shows the stack trace of who called what and all
// the parameters.
// panic is where we are currently.
// It was called from lock_destroy at line 181 of thread/synch.c
// It was called from proc_entry_init at line 601 of userprog/syscalls.c
// etc.
// proc_entry_init was the function that called lock_destroy.
(gdb) where
#0  panic (fmt=0x8002fe6c "Assertion failed: %s, at %s:%d (%s)\n") at ../../lib/kprintf.c:111
#1  0xffffffff8001740c in lock_destroy (lock=0x800c3e20) at ../../thread/synch.c:181
#2  0xffffffff80022b30 in proc_entry_init (pid=2) at ../../userprog/syscalls.c:601
#3  0xffffffff800258dc in sys_waitpid (pid=2, status=0x7fffff90, options=0, retval=0x800fcf18)
    at ../../userprog/syscalls.c:1367
#4  0xffffffff8000c6b8 in mips_syscall (tf=0x800fcf6c) at ../../arch/mips/mips/syscall.c:145
#5  0xffffffff8000cb68 in mips_trap (tf=0x800fcf6c) at ../../arch/mips/mips/trap.c:134

// Move up the stack.
(gdb) up
#1  0xffffffff8001740c in lock_destroy (lock=0x800c3e20) at ../../thread/synch.c:181
181             assert(lock->held == 0);

// Move up the stack.
(gdb) up
#2  0xffffffff80022b30 in proc_entry_init (pid=2) at ../../userprog/syscalls.c:601
601                     lock_destroy(p->ft_lock);

// Move up the stack.
(gdb) up
#3  0xffffffff800258dc in sys_waitpid (pid=2, status=0x7fffff90, options=0, retval=0x800fcf18)
    at ../../userprog/syscalls.c:1367
1367              proc_entry_init(pid);

// Print out some info about the current thread.
(gdb) print *curthread
$1 = {t_pcb = {pcb_switchstack = 2148519408, pcb_kstack = 2148519936, pcb_ininterrupt = 0, pcb_badfaultfunc = 0,
    pcb_copyjmp = {2148517880, 2147537544, 0, 0, 0, 0, 0, 0, 0, 0, 2148517880}},
  t_name = 0x800d2d00 "ourtests/conc-io", t_sleepaddr = 0x0, t_stack = 0x800fc000 "o?=\021o?=3", t_vmspace = 0x800f7360,
  t_cwd = 0x800d2f40, pid = 1}

// Print out the value of pid as it is inside of userprog/syscalls.c in the
// function sys_waitpid at the time of the call to proc_entry_init.
(gdb) print pid
$2 = 2

// Print out the value of pid in hex.
(gdb) print/x pid
$2 = 0x2

// x/i Examine (x) specified memory location as an instruction (i).
// Print out the instruction located at address 0xffffffff8000c6b8.
// Note that I got this address from the above stack trace.
// It shows you that this is address is 520 bytes inside of mips_syscall.
// This can be really helpful if OS/161 dies and sys161 just prints
// out the epc (which contains the address of the instruction
// that generated the execption.
(gdb) x/i 0xffffffff8000c6b8
0x8000c6b8 :  j       0x8000c6f0 

// x/i Examine (x) specified memory location as an instruction (i).
// Print out the instruction located at address 0xffffffff8001740c.
// Note that I got this address from the above stack trace.
// It shows you that this address is 148 bytes inside of lock_destroy.
(gdb) x/i 0xffffffff8001740c
0x8001740c :  lw      v0,32(s8)


The list command is useful for finding the line of source that contains the faulting instruction (note the *):

Something else that can be done (from a different version of the code)
(gdb) list *0x800197c4
0x800197c4 is in kmain (../../main/main.c:164).
159     kmain(char *arguments)
160     {
161         char *bad_ptr = NULL;
162         boot();
163         
164         *bad_ptr = 0;
165
166         menu(arguments);
167
168         /* Should not get here *