Hints for A1

This file will be updated during the term, so check back periodically


Quick Topic Links


Clarification of Lock Behaviour

You need to enforce some constraints on how locks are used. One that is a common source of questions states roughly: A common question is how to handle these types of problems. Some possible options are:
  1. Do not release the lock but do not report an error. This is the least desirable option since a programmer calling lock release in an incorrect way will not be notified of the error.
  2. Crash the kernel using panic().
  3. Use assert() to test for and catch the error.
Between panic() and assert(), assert is preferred, since lock_acquire() and lock_release() will be called within the kernel, so there is a high degree of confidence that they will be called correctly. If they are not there is a bug in the kernel and we want to be sure to fix the bug.

panic: Out of memory

OS/161 is not very sophisticated in its management of physical memory. Each new thread created by OS/161 consumes some physical memory, but OS/161 does not reclaim and reuse this memory when threads exit. As a result, OS/161 will quickly run out of physical memory and crash. The usual symptom is an "Out of memory" panic, which might look something like this:
OS/161 kernel [? for menu]: sy3
Starting CV test...
Threads should print out in reverse order.
ccccccccccscheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
scheduler: Dropping thread synchtest.
panic: cvtest: thread_fork failed: Out of memory
sys161: 1002103471 cycles (28377197k, 0u, 973726274i)
sys161: 9191 irqs 0 exns 0r/0w disk 21r/5855w console 0r/0w/1m emufs 0r/0w net
sys161: Elapsed real time: 40.115085 seconds (24.9807 mhz)
sys161: Elapsed virtual time: 40.084138840 seconds (25 mhz)
The best way to avoid this problem is to quit OS/161 after every test. For each test, start the simulator and let OS/161 boot, then run your test, then quit OS/161 down. Better yet, to save time and keystrokes, boot OS/161, run your test, and shut OS/161 down all with a single shell command line, like this
% sys161 kernel "sy3; q"

OS/161 crash with no output

When the OS/161 kernel starts to boot, there is a period of time during which the kernel cannot produce any output because the output device has not yet been initialized. If OS/161 crashes during this early phase, it will exit without producing any output, like this:
@cpu20.student[104]% sys161 kernel
sys161: System/161 release 1.13, compiled May 19 2007 21:48:06
sys161: 213925 cycles (88758k, 0u, 125167i)
sys161: 0 irqs 0 exns 0r/0w disk 0r/0w console 0r/0w/1m emufs 0r/0w net
sys161: Elapsed real time: 0.017304 seconds (12.3627 mhz)
sys161: Elapsed virtual time: 0.008557000 seconds (25 mhz)
@cpu20.student[105]% 
In the example above, the only output is from the simulator program (sys161), which reports some execution statistics when the simulated machine shuts down. All of the usual boot messages from the OS/161 kernel are missing.

It is also true that the kernel creates and uses locks very early in the boot process. Thus, if you start working on locks and you find that the kernel starts failing without producing any output, as shown above, a likely explanation is that your lock code is responsible for an early crash of the kernel. Note that you cannot debug this problem by adding kprintfs to your kernel, because the failure is happening before the kernel can display the output from your kprintf. (In addition, kprintf actually uses a lock, so adding more kprintfs may affect the problem you are trying to debug, or may cause new problems.)

You need to use gdb to help you determine where the problem lies. A good thing to start with is to set breakpoints at your lock functions, and step through them when they are called to ensure that they are behaving as you expect. Another useful idea is to set a breakpoint at the panic function, which the kernel will invoke if an assertion fails.


GDB tips

If you type < ctrl-g > in the sys161 window, the OS pauses and waits for a debugger to attach. This is useful if you don't have a debugger already attached and you want to see what's happening.

If you create a file called .gdbinit and put it in $HOME/cs350-os161/root directory (assuming that is where you are running cs350-gdb from) gdb will load commands from that file when it starts up. Here's an example of a simple, useful .gdbinit file:

dir ../os161-1.11/kern/compile/ASST1
target remote unix:.sockets/gdb
break panic
This way, you don't have to type the target command to connect to sys161 every time you run the debugger. This .gdbinit also sets a breakpoint at the kernel's panic function, so that gdb will grab control in case the kernel panics.

Below is an example gdb session to show what output to expect and how to up up the stack trace to look at a simple variable. This session was produced by Tim Brecht. His comments are on the lines that start with "//". Each comment refers to the output that follows it.

// Start cs350-gdb.
% cs350-gdb kernel
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=mips-elf"...
__start () at ../../arch/mips/mips/start.S:24
24         addiu sp, sp, -20
Current language:  auto; currently asm
Breakpoint 1 at 0x80010948: file ../../lib/kprintf.c, line 111.

// Continue execution. This stops and waits for input into os161.
// In the window running sys161 -w kernel after I type something like
// p testbin/program it runs and eventually panics.
(gdb) continue
Continuing.

Breakpoint 1, panic (fmt=0x8002fe6c "Assertion failed: %s, at %s:%d (%s)\n") at ../../lib/kprintf.c:111
111             if (evil==0) {
Current language:  auto; currently c

// I want to find out who called panic and why.
// Where shows the stack trace of who called what and all
// the parameters.
// panic is where we are currently.
// It was called from lock_destroy at line 181 of thread/synch.c
// It was called from proc_entry_init at line 601 of userprog/syscalls.c
// etc.
// proc_entry_init was the function that called lock_destroy.
(gdb) where
#0  panic (fmt=0x8002fe6c "Assertion failed: %s, at %s:%d (%s)\n") at ../../lib/kprintf.c:111
#1  0xffffffff8001740c in lock_destroy (lock=0x800c3e20) at ../../thread/synch.c:181
#2  0xffffffff80022b30 in proc_entry_init (pid=2) at ../../userprog/syscalls.c:601
#3  0xffffffff800258dc in sys_waitpid (pid=2, status=0x7fffff90, options=0, retval=0x800fcf18)
    at ../../userprog/syscalls.c:1367
#4  0xffffffff8000c6b8 in mips_syscall (tf=0x800fcf6c) at ../../arch/mips/mips/syscall.c:145
#5  0xffffffff8000cb68 in mips_trap (tf=0x800fcf6c) at ../../arch/mips/mips/trap.c:134

// Move up the stack frame.
(gdb) up
#1  0xffffffff8001740c in lock_destroy (lock=0x800c3e20) at ../../thread/synch.c:181
181             assert(lock->held == 0);

// Move up the stack frame.
(gdb) up
#2  0xffffffff80022b30 in proc_entry_init (pid=2) at ../../userprog/syscalls.c:601
601                     lock_destroy(p->ft_lock);

// Move up the stack frame.
(gdb) up
#3  0xffffffff800258dc in sys_waitpid (pid=2, status=0x7fffff90, options=0, retval=0x800fcf18)
    at ../../userprog/syscalls.c:1367
1367              proc_entry_init(pid);

// Print out some info about the current thread.
(gdb) print *curthread
$1 = {t_pcb = {pcb_switchstack = 2148519408, pcb_kstack = 2148519936, pcb_ininterrupt = 0, pcb_badfaultfunc = 0,
    pcb_copyjmp = {2148517880, 2147537544, 0, 0, 0, 0, 0, 0, 0, 0, 2148517880}},
  t_name = 0x800d2d00 "ourtests/conc-io", t_sleepaddr = 0x0, t_stack = 0x800fc000 "o?=\021o?=3", t_vmspace = 0x800f7360,
  t_cwd = 0x800d2f40, pid = 1}

// Print out the value of pid as it is inside of userprog/syscalls.c in the
// function sys_waitpid at the time of the call to proc_entry_init.
(gdb) print pid
$2 = 2

// Print out the value of pid in hex.
(gdb) print/x pid
$2 = 0x2

// x/i Examine (x) specified memory location as an instruction (i).
// Print out the instruction located at address 0xffffffff8000c6b8.
// Note that I got this address from the above stack trace.
// It shows you that this is address is 520 bytes inside of mips_syscall.
// This can be really helpful if OS/161 dies and sys161 just prints
// out the epc (which contains the address of the instruction
// that generated the execption.
(gdb) x/i 0xffffffff8000c6b8
0x8000c6b8 :  j       0x8000c6f0 

// x/i Examine (x) specified memory location as an instruction (i).
// Print out the instruction located at address 0xffffffff8001740c.
// Note that I got this address from the above stack trace.
// It shows you that this address is 148 bytes inside of lock_destroy.
(gdb) x/i 0xffffffff8001740c
0x8001740c :  lw      v0,32(s8)