+Xv6, a simple Unix-like teaching operating system
+
+
+
+
Xv6, a simple Unix-like teaching operating system
+
+Xv6 is a teaching operating system developed
+in the summer of 2006 for MIT's operating systems course,
+“6.828: Operating Systems Engineering.”
+We used it for 6.828 in Fall 2006 and Fall 2007
+and are using it this semester (Fall 2008).
+We hope that xv6 will be useful in other courses too.
+This page collects resources to aid the use of xv6
+in other courses.
+
+
History and Background
+For many years, MIT had no operating systems course.
+In the fall of 2002, Frans Kaashoek, Josh Cates, and Emil Sit
+created a new, experimental course (6.097)
+to teach operating systems engineering.
+In the course lectures, the class worked through Sixth Edition Unix (aka V6)
+using John Lions's famous commentary.
+In the lab assignments, students wrote most of an exokernel operating
+system, eventually named Jos, for the Intel x86.
+Exposing students to multiple systems–V6 and Jos–helped
+develop a sense of the spectrum of operating system designs.
+In the fall of 2003, the experimental 6.097 became the
+official course 6.828; the course has been offered each fall since then.
+
+V6 presented pedagogic challenges from the start.
+Students doubted the relevance of an obsolete 30-year-old operating system
+written in an obsolete programming language (pre-K&R C)
+running on obsolete hardware (the PDP-11).
+Students also struggled to learn the low-level details of two different
+architectures (the PDP-11 and the Intel x86) at the same time.
+By the summer of 2006, we had decided to replace V6
+with a new operating system, xv6, modeled on V6
+but written in ANSI C and running on multiprocessor
+Intel x86 machines.
+Xv6's use of the x86 makes it more relevant to
+students' experience than V6 was
+and unifies the course around a single architecture.
+Adding multiprocessor support also helps relevance
+and makes it easier to discuss threads and concurrency.
+(In a single processor operating system, concurrency–which only
+happens because of interrupts–is too easy to view as a special case.
+A multiprocessor operating system must attack the problem head on.)
+Finally, writing a new system allowed us to write cleaner versions
+of the rougher parts of V6, like the scheduler and file system.
+
+6.828 substituted xv6 for V6 in the fall of 2006.
+Based on that experience, we cleaned up rough patches
+of xv6 for the course in the fall of 2007.
+Since then, xv6 has stabilized, so we are making it
+available in the hopes that others will find it useful too.
+
+6.828 uses both xv6 and Jos.
+Courses taught at UCLA, NYU, and Stanford have used
+Jos without xv6; we believe other courses could use
+xv6 without Jos, though we are not aware of any that have.
+
+
Xv6 sources
+The latest xv6 is xv6-rev2.tar.gz.
+We distribute the sources in electronic form but also as
+a printed booklet with line numbers that keep everyone
+together during lectures. The booklet is available as
+xv6-rev2.pdf.
+
+xv6 compiles using the GNU C compiler,
+targeted at the x86 using ELF binaries.
+On BSD and Linux systems, you can use the native compilers;
+on OS X, which doesn't use ELF binaries,
+you must use a cross-compiler.
+Xv6 does boot on real hardware, but typically
+we run it using the Bochs emulator.
+Both the GCC cross compiler and Bochs
+can be found on the 6.828 tools page.
+
+
Lectures
+In 6.828, the lectures in the first half of the course
+introduce the PC hardware, the Intel x86, and then xv6.
+The lectures in the second half consider advanced topics
+using research papers; for some, xv6 serves as a useful
+base for making discussions concrete.
+This section describes a typical 6.828 lecture schedule,
+linking to lecture notes and homework.
+A course using only xv6 (not Jos) will need to adapt
+a few of the lectures, but we hope these are a useful
+starting point.
+
+
Lecture 1. Operating systems
+
+The first lecture introduces both the general topic of
+operating systems and the specific approach of 6.828.
+After defining “operating system,” the lecture
+examines the implementation of a Unix shell
+to look at the details of the traditional Unix system call interface.
+This is relevant to both xv6 and Jos: in the final
+Jos labs, students implement a Unix-like interface,
+culminating in a Unix shell.
+
+This lecture introduces the PC architecture, the 16- and 32-bit x86,
+the stack, and the GCC x86 calling conventions.
+It also introduces the pieces of a typical C tool chain–compiler,
+assembler, linker, loader–and the Bochs emulator.
+
+This lecture continues Lecture 1's discussion of what
+an operating system does.
+An operating system provides a “virtual computer”
+interface to user space programs.
+At a high level, the main job of the operating system
+is to implement that interface
+using the physical computer it runs on.
+
+The lecture discusses four approaches to that job:
+monolithic operating systems, microkernels,
+virtual machines, and exokernels.
+Exokernels might not be worth mentioning
+except that the Jos labs are built around one.
+
+Reading: Engler et al., Exokernel: An Operating System Architecture
+for Application-Level Resource Management
+
+This is the first lecture that uses xv6.
+It introduces the idea of address spaces and the
+details of the x86 segmentation hardware.
+It makes the discussion concrete by reading the xv6
+source code and watching xv6 execute using the Bochs simulator.
+
+This lecture continues the discussion of address spaces,
+examining the other x86 virtual memory mechanism: page tables.
+Xv6 does not use page tables, so there is no xv6 here.
+Instead, the lecture uses Jos as a concrete example.
+An xv6-only course might skip or shorten this discussion.
+
+Reading: x86 manual excerpts
+
+Homework: stuff about gdt
+XXX not appropriate; should be in Lecture 4
+
+How does a user program invoke the operating system kernel?
+How does the kernel return to the user program?
+What happens when a hardware device needs attention?
+This lecture explains the answer to these questions:
+interrupt and exception handling.
+
+It explains the x86 trap setup mechanisms and then
+examines their use in xv6's SETGATE (mmu.h),
+tvinit (trap.c), idtinit (trap.c), vectors.pl, and vectors.S.
+
+It then traces through a call to the system call open:
+init.c, usys.S, vector48 and alltraps (vectors.S), trap (trap.c),
+syscall (syscall.c),
+sys_open (sysfile.c), fetcharg, fetchint, argint, argptr, and argstr (syscall.c).
+
+The interrupt controller, briefly:
+pic_init and pic_enable (picirq.c).
+The timer and keyboard, briefly:
+timer_init (timer.c), console_init (console.c).
+Enabling and disabling of interrupts.
+
+This lecture introduces the problems of
+coordination and synchronization on a
+multiprocessor
+and then the solution of mutual exclusion locks.
+Atomic instructions, test-and-set locks,
+lock granularity, (the mistake of) recursive locks.
+
+Although xv6 user programs cannot share memory,
+the xv6 kernel itself is a program with multiple threads
+executing concurrently and sharing memory.
+Illustration: the xv6 scheduler's proc_table_lock (proc.c)
+and the spin lock implementation (spinlock.c).
+
Lecture 8. Threads, processes and context switching
+
+The last lecture introduced some of the issues
+in writing threaded programs, using xv6's processes
+as an example.
+This lecture introduces the issues in implementing
+threads, continuing to use xv6 as the example.
+
+The lecture defines a thread of computation as a register
+set and a stack. A process is an address space plus one
+or more threads of computation sharing that address space.
+Thus the xv6 kernel can be viewed as a single process
+with many threads (each user process) executing concurrently.
+
+This lecture introduces the idea of sequence coordination
+and then examines the particular solution illustrated by
+sleep and wakeup (proc.c).
+It introduces and refines a simple
+producer/consumer queue to illustrate the
+need for sleep and wakeup
+and then the sleep and wakeup
+implementations themselves.
+
+Homework: Explain how sleep and wakeup would break
+without proc_table_lock. Explain how devices would break
+without second lock argument to sleep.
+
+This is the first of three file system lectures.
+This lecture introduces the basic file system interface
+and then considers the on-disk layout of individual files
+and the free block bitmap.
+
+Reading: iread, iwrite, fileread, filewrite, wdir, mknod1, and
+ code related to these calls in fs.c, bio.c, ide.c, and file.c.
+
+Homework: Add print to bwrite to trace every disk write.
+Explain the disk writes caused by some simple shell commands.
+
+The last lecture discussed on-disk file system representation.
+This lecture covers the implementation of
+file system paths (namei in fs.c)
+and also discusses the security problems of a shared /tmp
+and symbolic links.
+
+Understanding exec (exec.c) is left as an exercise.
+
+This lecture is the first of the research paper-based lectures.
+It discusses the “soft updates” paper,
+using xv6 as a concrete example.
+
+
Feedback
+If you are interested in using xv6 or have used xv6 in a course,
+we would love to hear from you.
+If there's anything that we can do to make xv6 easier
+to adopt, we'd like to hear about it.
+We'd also be interested to hear what worked well and what didn't.
+
+You can reach all of us at 6.828-staff@pdos.csail.mit.edu.
+
+
+
+
diff --git a/web/index.txt b/web/index.txt
new file mode 100644
index 0000000..41d42a4
--- /dev/null
+++ b/web/index.txt
@@ -0,0 +1,335 @@
+** Xv6, a simple Unix-like teaching operating system
+Xv6 is a teaching operating system developed
+in the summer of 2006 for MIT's operating systems course,
+``6.828: Operating Systems Engineering.''
+We used it for 6.828 in Fall 2006 and Fall 2007
+and are using it this semester (Fall 2008).
+We hope that xv6 will be useful in other courses too.
+This page collects resources to aid the use of xv6
+in other courses.
+
+* History and Background
+
+For many years, MIT had no operating systems course.
+In the fall of 2002, Frans Kaashoek, Josh Cates, and Emil Sit
+created a new, experimental course (6.097)
+to teach operating systems engineering.
+In the course lectures, the class worked through Sixth Edition Unix (aka V6)
+using John Lions's famous commentary.
+In the lab assignments, students wrote most of an exokernel operating
+system, eventually named Jos, for the Intel x86.
+Exposing students to multiple systems--V6 and Jos--helped
+develop a sense of the spectrum of operating system designs.
+In the fall of 2003, the experimental 6.097 became the
+official course 6.828; the course has been offered each fall since then.
+
+V6 presented pedagogic challenges from the start.
+Students doubted the relevance of an obsolete 30-year-old operating system
+written in an obsolete programming language (pre-K&R C)
+running on obsolete hardware (the PDP-11).
+Students also struggled to learn the low-level details of two different
+architectures (the PDP-11 and the Intel x86) at the same time.
+By the summer of 2006, we had decided to replace V6
+with a new operating system, xv6, modeled on V6
+but written in ANSI C and running on multiprocessor
+Intel x86 machines.
+Xv6's use of the x86 makes it more relevant to
+students' experience than V6 was
+and unifies the course around a single architecture.
+Adding multiprocessor support also helps relevance
+and makes it easier to discuss threads and concurrency.
+(In a single processor operating system, concurrency--which only
+happens because of interrupts--is too easy to view as a special case.
+A multiprocessor operating system must attack the problem head on.)
+Finally, writing a new system allowed us to write cleaner versions
+of the rougher parts of V6, like the scheduler and file system.
+
+6.828 substituted xv6 for V6 in the fall of 2006.
+Based on that experience, we cleaned up rough patches
+of xv6 for the course in the fall of 2007.
+Since then, xv6 has stabilized, so we are making it
+available in the hopes that others will find it useful too.
+
+6.828 uses both xv6 and Jos.
+Courses taught at UCLA, NYU, and Stanford have used
+Jos without xv6; we believe other courses could use
+xv6 without Jos, though we are not aware of any that have.
+
+
+* Xv6 sources
+
+The latest xv6 is [xv6-rev2.tar.gz].
+We distribute the sources in electronic form but also as
+a printed booklet with line numbers that keep everyone
+together during lectures. The booklet is available as
+[xv6-rev2.pdf].
+
+xv6 compiles using the GNU C compiler,
+targeted at the x86 using ELF binaries.
+On BSD and Linux systems, you can use the native compilers;
+on OS X, which doesn't use ELF binaries,
+you must use a cross-compiler.
+Xv6 does boot on real hardware, but typically
+we run it using the Bochs emulator.
+Both the GCC cross compiler and Bochs
+can be found on the [../../2007/tools.html | 6.828 tools page].
+
+
+* Lectures
+
+In 6.828, the lectures in the first half of the course
+introduce the PC hardware, the Intel x86, and then xv6.
+The lectures in the second half consider advanced topics
+using research papers; for some, xv6 serves as a useful
+base for making discussions concrete.
+This section describes a typical 6.828 lecture schedule,
+linking to lecture notes and homework.
+A course using only xv6 (not Jos) will need to adapt
+a few of the lectures, but we hope these are a useful
+starting point.
+
+
+Lecture 1. Operating systems
+
+The first lecture introduces both the general topic of
+operating systems and the specific approach of 6.828.
+After defining ``operating system,'' the lecture
+examines the implementation of a Unix shell
+to look at the details of the traditional Unix system call interface.
+This is relevant to both xv6 and Jos: in the final
+Jos labs, students implement a Unix-like interface,
+culminating in a Unix shell.
+
+[l1.html | lecture notes]
+
+
+Lecture 2. PC hardware and x86 programming
+
+This lecture introduces the PC architecture, the 16- and 32-bit x86,
+the stack, and the GCC x86 calling conventions.
+It also introduces the pieces of a typical C tool chain--compiler,
+assembler, linker, loader--and the Bochs emulator.
+
+Reading: PC Assembly Language
+
+Homework: familiarize with Bochs
+
+[l2.html | lecture notes]
+[x86-intro.html | homework]
+
+
+Lecture 3. Operating system organization
+
+This lecture continues Lecture 1's discussion of what
+an operating system does.
+An operating system provides a ``virtual computer''
+interface to user space programs.
+At a high level, the main job of the operating system
+is to implement that interface
+using the physical computer it runs on.
+
+The lecture discusses four approaches to that job:
+monolithic operating systems, microkernels,
+virtual machines, and exokernels.
+Exokernels might not be worth mentioning
+except that the Jos labs are built around one.
+
+Reading: Engler et al., Exokernel: An Operating System Architecture
+for Application-Level Resource Management
+
+[l3.html | lecture notes]
+
+
+Lecture 4. Address spaces using segmentation
+
+This is the first lecture that uses xv6.
+It introduces the idea of address spaces and the
+details of the x86 segmentation hardware.
+It makes the discussion concrete by reading the xv6
+source code and watching xv6 execute using the Bochs simulator.
+
+Reading: x86 MMU handout,
+xv6: bootasm.S, bootother.S, bootmain.c, main.c, init.c, and setupsegs in proc.c.
+
+Homework: Bochs stack introduction
+
+[l4.html | lecture notes]
+[xv6-intro.html | homework]
+
+
+Lecture 5. Address spaces using page tables
+
+This lecture continues the discussion of address spaces,
+examining the other x86 virtual memory mechanism: page tables.
+Xv6 does not use page tables, so there is no xv6 here.
+Instead, the lecture uses Jos as a concrete example.
+An xv6-only course might skip or shorten this discussion.
+
+Reading: x86 manual excerpts
+
+Homework: stuff about gdt
+XXX not appropriate; should be in Lecture 4
+
+[l5.html | lecture notes]
+
+
+Lecture 6. Interrupts and exceptions
+
+How does a user program invoke the operating system kernel?
+How does the kernel return to the user program?
+What happens when a hardware device needs attention?
+This lecture explains the answer to these questions:
+interrupt and exception handling.
+
+It explains the x86 trap setup mechanisms and then
+examines their use in xv6's SETGATE (mmu.h),
+tvinit (trap.c), idtinit (trap.c), vectors.pl, and vectors.S.
+
+It then traces through a call to the system call open:
+init.c, usys.S, vector48 and alltraps (vectors.S), trap (trap.c),
+syscall (syscall.c),
+sys_open (sysfile.c), fetcharg, fetchint, argint, argptr, and argstr (syscall.c).
+
+The interrupt controller, briefly:
+pic_init and pic_enable (picirq.c).
+The timer and keyboard, briefly:
+timer_init (timer.c), console_init (console.c).
+Enabling and disabling of interrupts.
+
+Reading: x86 manual excerpts,
+xv6: trapasm.S, trap.c, syscall.c, and usys.S.
+Skim lapic.c, ioapic.c, picirq.c.
+
+Homework: Explain the 35 words on the top of the
+stack at first invocation of syscall.
+
+[l-interrupt.html | lecture notes]
+[x86-intr.html | homework]
+
+
+Lecture 7. Multiprocessors and locking
+
+This lecture introduces the problems of
+coordination and synchronization on a
+multiprocessor
+and then the solution of mutual exclusion locks.
+Atomic instructions, test-and-set locks,
+lock granularity, (the mistake of) recursive locks.
+
+Although xv6 user programs cannot share memory,
+the xv6 kernel itself is a program with multiple threads
+executing concurrently and sharing memory.
+Illustration: the xv6 scheduler's proc_table_lock (proc.c)
+and the spin lock implementation (spinlock.c).
+
+Reading: xv6: spinlock.c. Skim mp.c.
+
+Homework: Interaction between locking and interrupts.
+Try not disabling interrupts in the disk driver and watch xv6 break.
+
+[l-lock.html | lecture notes]
+[xv6-lock.html | homework]
+
+
+Lecture 8. Threads, processes and context switching
+
+The last lecture introduced some of the issues
+in writing threaded programs, using xv6's processes
+as an example.
+This lecture introduces the issues in implementing
+threads, continuing to use xv6 as the example.
+
+The lecture defines a thread of computation as a register
+set and a stack. A process is an address space plus one
+or more threads of computation sharing that address space.
+Thus the xv6 kernel can be viewed as a single process
+with many threads (each user process) executing concurrently.
+
+Illustrations: thread switching (swtch.S), scheduler (proc.c), sys_fork (sysproc.c)
+
+Reading: proc.c, swtch.S, sys_fork (sysproc.c)
+
+Homework: trace through stack switching.
+
+[l-threads.html | lecture notes (need to be updated to use swtch)]
+[xv6-sched.html | homework]
+
+
+Lecture 9. Processes and coordination
+
+This lecture introduces the idea of sequence coordination
+and then examines the particular solution illustrated by
+sleep and wakeup (proc.c).
+It introduces and refines a simple
+producer/consumer queue to illustrate the
+need for sleep and wakeup
+and then the sleep and wakeup
+implementations themselves.
+
+Reading: proc.c, sys_exec, sys_sbrk, sys_wait, sys_exec, sys_kill (sysproc.c).
+
+Homework: Explain how sleep and wakeup would break
+without proc_table_lock. Explain how devices would break
+without second lock argument to sleep.
+
+[l-coordination.html | lecture notes]
+[xv6-sleep.html | homework]
+
+
+Lecture 10. Files and disk I/O
+
+This is the first of three file system lectures.
+This lecture introduces the basic file system interface
+and then considers the on-disk layout of individual files
+and the free block bitmap.
+
+Reading: iread, iwrite, fileread, filewrite, wdir, mknod1, and
+ code related to these calls in fs.c, bio.c, ide.c, and file.c.
+
+Homework: Add print to bwrite to trace every disk write.
+Explain the disk writes caused by some simple shell commands.
+
+[l-fs.html | lecture notes]
+[xv6-disk.html | homework]
+
+
+Lecture 11. Naming
+
+The last lecture discussed on-disk file system representation.
+This lecture covers the implementation of
+file system paths (namei in fs.c)
+and also discusses the security problems of a shared /tmp
+and symbolic links.
+
+Understanding exec (exec.c) is left as an exercise.
+
+Reading: namei in fs.c, sysfile.c, file.c.
+
+Homework: Explain how to implement symbolic links in xv6.
+
+[l-name.html | lecture notes]
+[xv6-names.html | homework]
+
+
+Lecture 12. High-performance file systems
+
+This lecture is the first of the research paper-based lectures.
+It discusses the ``soft updates'' paper,
+using xv6 as a concrete example.
+
+
+* Feedback
+
+If you are interested in using xv6 or have used xv6 in a course,
+we would love to hear from you.
+If there's anything that we can do to make xv6 easier
+to adopt, we'd like to hear about it.
+We'd also be interested to hear what worked well and what didn't.
+
+Russ Cox (rsc@swtch.com)
+Frans Kaashoek (kaashoek@mit.edu)
+Robert Morris (rtm@mit.edu)
+
+You can reach all of us at 6.828-staff@pdos.csail.mit.edu.
+
+
diff --git a/web/l-bugs.html b/web/l-bugs.html
new file mode 100644
index 0000000..493372d
--- /dev/null
+++ b/web/l-bugs.html
@@ -0,0 +1,187 @@
+OS Bugs
+
+
+
+
+
+
OS Bugs
+
+
Required reading: Bugs as deviant behavior
+
+
Overview
+
+
+Operating systems must obey many rules for correctness and
+performance. Example rules:
+
+
+Do not call blocking functions with interrupts disabled or a spin
+lock held
+
+Check for NULL results
+
Do not allocate large stack variables
+
+Do not re-use already-allocated memory
+
Check user pointers before using them in kernel mode
+
Release acquired locks
+
+
+
+In addition, there are standard software engineering rules, like
+using function results in consistent ways.
+
+
+These rules are typically not checked by a compiler, even though
+they could be, in principle. The goal of the
+meta-level compilation project is to allow system implementors to
+write system-specific compiler extensions that check the source code
+for rule violations.
+
+
+The results are good: many new bugs found (500-1000) in Linux
+alone. The paper for today studies these bugs and attempts to draw
+lessons from them.
+
+
+Are kernel errors worse than user-level errors? That is, if we get
+the kernel correct, will system crashes go away?
+
+
Errors in JOS kernel
+
+
What are unstated invariants in the JOS?
+
+
Interrupts are disabled in kernel mode
+
Only env 1 has access to disk
+
All registers are saved & restored on context switch
+
Application code is never executed with CPL 0
+
Don't allocate an already-allocated physical page
+
Propagate error messages to user applications (e.g., out of
+resources)
+
Map pipe before fd
+
Unmap fd before pipe
+
A spawned program should have open only file descriptors 0, 1, and 2.
+
+Pass size sometimes in bytes and sometimes in blocks to a
+given file system function.
+
+User pointers should be run through TRUP before being used by the kernel
+
+
+
Could these errors have been caught by metacompilation? Would
+metacompilation have caught the pipe race condition? (Probably not,
+it happens in only one place.)
+
+
How confident are you that your code is correct? For example,
+are you sure interrupts are always disabled in kernel mode? How would
+you test?
+
+
Metacompilation
+
+
A system programmer writes the rule checkers in a high-level,
+state-machine language (metal). These checkers are dynamically linked
+into an extensible version of g++, xg++. Xg++ applies the rule
+checkers to every possible execution path of a function that is being
+compiled.
+
+
Some checkers produce false positives, because of limitations of
+both static analysis and the checkers, which mostly use local
+analysis.
+
+
How does the block checker work? The first pass is a rule
+that marks functions as potentially blocking. After processing a
+function, the checker emits the function's flow graph to a file
+(including annotations and functions called). The second pass takes
+the merged flow graph of all function calls, and produces a file with
+all functions that have a path in the control-flow-graph to a blocking
+function call. For the Linux kernel this results in 3,000 functions
+that potentially could call sleep. Yet another checker like
+check_interrupts checks if a function calls any of the 3,000 functions
+with interrupts disabled. Etc.
+
+
This paper
+
+
Writing rules is painful. First, you have to write them. Second,
+how do you decide what to check? Was it easy to enumerate all
+conventions for JOS?
+
+
Insight: infer programmer "beliefs" from code and cross-check
+for contradictions. If cli is always followed by sti,
+except in one case, perhaps something is wrong. This simplifies
+life because we can write generic checkers instead of checkers
+that specifically check for sti, and perhaps we get lucky
+and find other temporal ordering conventions.
+
+
+Do we know which case is wrong? The 999 times or the 1 time that
+sti is absent? (No, this method cannot figure out what the correct
+sequence is, but it can flag that something is weird, which in practice
+is useful.) The method just detects inconsistencies.
+
+
+Is every inconsistency an error? No, some inconsistencies don't
+indicate an error. If a call to function f is often followed
+by a call to function g, does that imply that f should always be
+followed by g? (No!)
+
+
Solution: MUST beliefs and MAYBE beliefs. MUST beliefs are
+invariants that must hold; any inconsistency indicates an error. If a
+pointer is dereferenced, then the programmer MUST believe that the
+pointer is pointing to something that can be dereferenced (i.e., the
+pointer is definitely not zero). MUST beliefs can be checked using
+"internal inconsistencies".
+
+
+As an aside, can null-pointer dereferences be detected at runtime?
+(Sure, unmap the page at address zero.) Why is metacompilation still
+valuable? (At runtime you will find only the null pointers that your
+test code dereferenced; not all possible dereferences of null
+pointers.) An even more convincing example for metacompilation is
+tracking user pointers that the kernel dereferences. (Is this a MUST
+belief?)
+
+
+MAYBE beliefs are invariants that are suggested by the code, but
+they may be coincidences. MAYBE beliefs are ranked by statistical
+analysis, and perhaps augmented with input about function names
+(e.g., alloc and free are important). Is it computationally feasible
+to check every MAYBE belief? Could there be much noise?
+
+
What errors won't this approach catch?
+
+
Paper discussion
+
+
This paper is best discussed by studying every code fragment. Most
+code fragments are pieces of code from Linux distributions; these
+mistakes are real!
+
+
Section 3.1. what is the error? how does metacompilation catch
+it?
+
+
Figure 1. what is the error? is there one?
+
+
Code fragments from 6.1. what is the error? how does metacompilation catch
+it?
+
+
Figure 3. what is the error? how does metacompilation catch
+it?
+
+
Section 8.3. what is the error? how does metacompilation catch
+it?
+
+
+
diff --git a/web/l-coordination.html b/web/l-coordination.html
new file mode 100644
index 0000000..b2f9f0d
--- /dev/null
+++ b/web/l-coordination.html
@@ -0,0 +1,354 @@
+
L9
+
+
+
+
+
+
Coordination and more processes
+
+
Required reading: remainder of proc.c, sys_exec, sys_sbrk,
+ sys_wait, sys_exit, and sys_kill.
+
+
Overview
+
+
Big picture: more programs than processors. How to share the
+ limited number of processors among the programs? Last lecture
+ covered basic mechanism: threads and the distinction between process
+ and thread. Today expand: how to coordinate the interactions
+ between threads explicitly, and some operations on processes.
+
+
+Sequence coordination. This is a different type of coordination
+ from mutual-exclusion coordination (whose goal is to make actions
+ atomic so that threads don't interfere). The goal of
+ sequence coordination is for threads to coordinate the sequences in
+ which they run.
+
+
For example, a thread may want to wait until another thread
+ terminates. One way to do so is to have the thread run periodically,
+ let it check if the other thread terminated, and if not give up the
+ processor again. This is wasteful, especially if there are many
+ threads.
+
+
With primitives for sequence coordination one can do better. The
+ thread could tell the thread manager that it is waiting for an event
+ (e.g., another thread terminating). When the other thread
+ terminates, it explicitly wakes up the waiting thread. This is more
+ work for the programmer, but more efficient.
+
+
Sequence coordination often interacts with mutual-exclusion
+ coordination, as we will see below.
+
+
+The operating system literature has a rich set of primitives for
+ sequence coordination. We study a very simple version of condition
+ variables in xv6: sleep and wakeup, with a single lock.
+
+
xv6 code examples
+
+
Sleep and wakeup - usage
+
+Let's consider implementing a producer/consumer queue
+(like a pipe) that can be used to hold a single non-null char pointer:
+
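+A first cut might poll (a sketch; the names send and recv and the
+single-slot representation are illustrative, not xv6's pipe code):
+
+char *ptr = 0;   // the one-element queue; 0 means empty
+
+void send(char *p) {
+  while(ptr != 0)
+    ;            // spin until the slot is free
+  ptr = p;
+}
+
+char *recv(void) {
+  char *p;
+  while((p = ptr) == 0)
+    ;            // spin until a pointer shows up
+  ptr = 0;
+  return p;
+}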
+
+Easy and correct, at least assuming there is at most one
+reader and at most one writer at a time.
+
+
Unfortunately, the while loops are inefficient.
+Instead of polling, it would be great if there were
+primitives saying ``wait for some event to happen''
+and ``this event happened''.
+That's what sleep and wakeup do.
+
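+A sketch of the queue rewritten with sleep and wakeup, plus a spin
+lock so that senders and receivers don't race (illustrative code,
+not xv6's actual pipe implementation; sleep(chan, lk) releases lk
+while sleeping and reacquires it before returning):
+
+struct q {
+  struct spinlock lk;
+  char *ptr;
+};
+
+void send(struct q *q, char *p) {
+  acquire(&q->lk);
+  if(q->ptr != 0)
+    sleep(q, &q->lk);   // wait for the slot to empty
+  q->ptr = p;
+  wakeup(q);            // wake a sleeping recv
+  release(&q->lk);
+}
+
+char *recv(struct q *q) {
+  char *p;
+  acquire(&q->lk);
+  if(q->ptr == 0)
+    sleep(q, &q->lk);   // wait for a pointer to arrive
+  p = q->ptr;
+  q->ptr = 0;
+  wakeup(q);            // wake a sleeping send
+  release(&q->lk);
+  return p;
+}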
+
+
+This is okay, and now safer for multiple readers and writers,
+except that wakeup wakes up everyone who is asleep on chan,
+not just one guy.
+So some of the guys who wake up from sleep might not
+be cleared to read or write from the queue. Have to go back to looping:
+
+
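+The same sketch with if replaced by while, so each thread re-checks
+the condition after every wakeup:
+
+void send(struct q *q, char *p) {
+  acquire(&q->lk);
+  while(q->ptr != 0)
+    sleep(q, &q->lk);   // re-check after waking
+  q->ptr = p;
+  wakeup(q);
+  release(&q->lk);
+}
+
+char *recv(struct q *q) {
+  char *p;
+  acquire(&q->lk);
+  while((p = q->ptr) == 0)
+    sleep(q, &q->lk);
+  q->ptr = 0;
+  wakeup(q);
+  release(&q->lk);
+  return p;
+}
+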
The problem is that now we're using lk to protect
+access to the p->chan and p->state variables
+but other routines besides sleep and wakeup
+(in particular, proc_kill) will need to use them and won't
+know which lock protects them.
+So instead of protecting them with lk, let's use proc_table_lock:
+
+
One could probably make things work with lk as above,
+but the relationship between data and locks would be
+more complicated with no real benefit. Xv6 takes the easy way out
+and says that elements in the proc structure are always protected
+by proc_table_lock.
+
+
Use example: exit and wait
+
+
If proc_wait decides there are children to be waited for,
+it calls sleep at line 2462.
+When a process exits, proc_exit scans the process table
+to find the parent and wakes it at 2408.
+
+
Which lock protects sleep and wakeup from missing each other?
+Proc_table_lock. Have to tweak sleep again to avoid double-acquire:
+
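+Roughly (a sketch of the idea; see sleep in proc.c for the real code,
+and read cp as the current process):
+
+void sleep(void *chan, struct spinlock *lk) {
+  // cp->chan and cp->state are protected by proc_table_lock,
+  // which the caller may already hold.
+  if(lk != &proc_table_lock){
+    acquire(&proc_table_lock);
+    release(lk);
+  }
+  cp->chan = chan;
+  cp->state = SLEEPING;
+  sched();               // run something else; wakeup sets us RUNNABLE
+  cp->chan = 0;
+  if(lk != &proc_table_lock){
+    release(&proc_table_lock);
+    acquire(lk);
+  }
+}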
+
Proc_kill marks a process as killed (line 2371).
+When the process finally exits the kernel to user space,
+or if a clock interrupt happens while it is in user space,
+it will be destroyed (line 2886, 2890, 2912).
+
+
Why wait until the process ends up in user space?
+
+
What if the process is stuck in sleep? It might take a long
+time to get back to user space.
+Don't want to have to wait for it, so make sleep wake up early
+(line 2373).
+
+
This means all callers of sleep should check
+whether they have been killed, but none do.
+Bug in xv6.
+
+
System call handlers
+
+
Sheet 32
+
+
Fork: discussed copyproc in earlier lectures.
+Sys_fork (line 3218) just calls copyproc
+and marks the new proc runnable.
+Does fork create a new process or a new thread?
+Is there any shared context?
+
+
Exec: we'll talk about exec later, when we talk about file systems.
+
+
Sbrk: Saw growproc earlier. Why setupsegs before returning?
diff --git a/web/l-fs.html b/web/l-fs.html
new file mode 100644
index 0000000..ed911fc
--- /dev/null
+++ b/web/l-fs.html
@@ -0,0 +1,222 @@
+
L10
+
+
+
+
+
+
File systems
+
+
Required reading: iread, iwrite, and wdir, and code related to
+ these calls in fs.c, bio.c, ide.c, file.c, and sysfile.c
+
+
Overview
+
+
The next 3 lectures are about file systems:
+
+
Basic file system implementation
+
Naming
+
Performance
+
+
+
+Users want to store their data durably, so that the data survives when
+the user turns off the computer. The primary media for doing so are:
+magnetic disks, flash memory, and tapes. We focus on magnetic disks
+(e.g., through the IDE interface in xv6).
+
+
To allow users to remember where they stored a file, they can
+assign a symbolic name to a file, which appears in a directory.
+
+
The data in a file can be organized in a structured way or not.
+The structured variant is often called a database. UNIX uses the
+unstructured variant: files are streams of bytes. Any particular
+structure is likely to be useful to only a small class of
+applications, and other applications will have to work hard to fit
+their data into one of the pre-defined structures. Besides, if you
+want structure, you can easily write a user-mode library program that
+imposes that format on any file. The end-to-end argument in action.
+(Databases have special requirements and support an important class of
+applications, and thus have a specialized plan.)
+
+
The API for a minimal file system consists of: open, read, write,
+seek, close, and stat. Dup duplicates a file descriptor. For example:
+
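+A sketch using the standard Unix flavor of these calls:
+
+  char buf[64];
+  int fd = open("x", O_CREAT | O_RDWR, 0666);
+  write(fd, "hello", 5);
+  lseek(fd, 0, SEEK_SET);            // rewind the file offset
+  int n = read(fd, buf, sizeof buf); // reads back "hello", n == 5
+  int fd2 = dup(fd);                 // fd2 shares fd's offset
+  close(fd);
+  close(fd2);
+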
Maintaining the file offset behind the read/write interface is an
+ interesting design decision. The alternative is that the state of a
+ read operation should be maintained by the process doing the reading
+ (i.e., that the pointer should be passed as an argument to read).
+ This argument is compelling in view of the UNIX fork() semantics,
+ which clone a process that shares the file descriptors of its
+ parent: a read by the parent on a shared file descriptor (e.g.,
+ stdin) changes the read pointer seen by the child. On the other
+ hand, the alternative would make it difficult to get "(date; ls) > x"
+ right.
+
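+For example (a sketch using standard Unix calls): the shared offset
+is what makes redirected output from a command sequence come out in
+order.
+
+  // Parent and child share one file object, hence one offset;
+  // this is also why "(date; ls) > x" works: both programs
+  // inherit fd 1, and the second write continues where the
+  // first left off.
+  int fd = open("x", O_CREAT | O_WRONLY, 0666);
+  if(fork() == 0){
+    write(fd, "child\n", 6);   // advances the shared offset
+    exit(0);
+  }
+  wait(0);
+  write(fd, "parent\n", 7);    // appends after "child\n"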
+
Unix API doesn't specify that the effects of write are immediately
+ on the disk before the write returns. Within certain bounds, it is up
+ to the implementation of the file system. Choices (which aren't
+ mutually exclusive) include:
+
+
At some point in the future, if the system stays up (e.g., after
+ 30 seconds);
+
Before the write returns;
+
Before close returns;
+
User specified (e.g., before fsync returns).
+
+
+
A design issue is the semantics of a file system operation that
+ requires multiple disk writes. In particular, what happens if the
+ logical update requires writing multiple disks blocks and the power
+ fails during the update? For example, creating a new file
+ requires allocating an inode (which requires updating the list of
+ free inodes on disk), writing a directory entry to record the
+ allocated i-node under the name of the new file (which may require
+ allocating a new block and updating the directory inode). If the
+ power fails during the operation, the list of free inodes and blocks
+ may be inconsistent with the blocks and inodes in use. Again, it is
+ up to the implementation of the file system to keep the on-disk data
+ structures consistent:
+
+
Don't worry about it much, but use a recovery program to bring
+ the file system back into a consistent state.
+
Journaling file system. Never let the file system get into an
+ inconsistent state.
+
+
+
+Another design issue is the semantics of concurrent writes to
+the same data item. What is the order of two updates that happen at
+the same time? For example, two processes open the same file and write
+to it. Modern Unix operating systems allow the application to lock a
+file to get exclusive access. If file locking is not used and if the
+file descriptor is shared, then the bytes of the two writes will get
+into the file in some order (this happens often for log files). If
+the file descriptor is not shared, the end result is not defined. For
+example, one write may overwrite the other one (e.g., if they are
+writing to the same part of the file.)
+
+
An implementation issue is performance, because writing to magnetic
+disk is relatively expensive compared to computing. Three primary ways
+to improve performance are: careful file system layout that induces
+few seeks, an in-memory cache of frequently-accessed blocks, and
+overlap I/O with computation so that file operations don't have to
+wait until their completion and so that the disk driver has more
+data to write, which allows disk scheduling. (We will talk about
+performance in detail later.)
+
+
xv6 code examples
+
+
xv6 implements a minimal Unix file system interface. xv6 doesn't
+pay attention to file system layout. It overlaps computation and I/O,
+but doesn't do any disk scheduling. Its cache is write-through, which
+simplifies keeping on-disk data structures consistent, but is bad for
+performance.
+
+
On disk files are represented by an inode (struct dinode in fs.h),
+and blocks. Small files have up to 12 block addresses in their inode;
+large files use the last address in the inode as the disk address
+of a block holding 128 further disk addresses (512/4). The size of a file is
+thus limited to 12*512 + 128*512 = 71,680 bytes. What would you change to
+support larger files? (Ans: e.g., double indirect blocks.)
+
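+The block lookup, as a sketch (read_block is a hypothetical helper;
+the real code is bmap in fs.c):
+
+uint file_block(struct dinode *ip, uint bn)
+{
+  if(bn < 12)
+    return ip->addrs[bn];                  // direct block
+  uint *ind = read_block(ip->addrs[12]);   // the indirect block
+  return ind[bn - 12];                     // one of 512/4 = 128 entries
+}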
+
Directories are files with a bit of structure to them. The file
+consists of records of type struct dirent. Each entry contains the
+name for a file (or directory) and its corresponding inode number.
+How many files can appear in a directory?
+
+
+In memory, files are represented by struct inode in fsvar.h. What is
+the role of the additional fields in struct inode?
+
+
What is xv6's disk layout? How does xv6 keep track of free blocks
+ and inodes? See balloc()/bfree() and ialloc()/ifree(). Is this
+ layout a good one for performance? What are other options?
+
+
+Let's assume that an application created a file x that
+ contains 512 bytes, and that the application now calls read(fd, buf,
+ 100), that is, it is requesting to read 100 bytes into buf.
+ Furthermore, let's assume that the inode for x is i. Let's pick
+ up what happens by investigating readi(), line 4483.
+
+
4488-4492: can iread be called on other objects than files? (Yes.
+ For example, read from the keyboard.) Everything is a file in Unix.
+
4495: what does bmap do?
+
+
4384: what block is being read?
+
+
4483: what does bread do? does bread always cause a read to disk?
+
+
4006: what does bget do? it implements a simple cache of
+ recently-read disk blocks.
+
+
How big is the cache? (see param.h)
+
3972: look if the requested block is in the cache by walking down
+ a circular list.
+
3977: we had a match.
+
3979: some other process has "locked" the block, wait until it
+ releases. The other process releases the block using brelse().
+Why lock a block?
+
+
Atomic read and update. For example, allocating an inode: read
+ block containing inode, mark it allocated, and write it back. This
+ operation must be atomic.
+
+
3982: it is ours now.
+
3987: it is not in the cache; we need to find a cache entry to
+ hold the block.
+
3987: what is the cache replacement strategy? (see also brelse())
+
3988: found an entry that we are going to use.
+
3989: mark it ours but don't mark it valid (there is no valid data
+ in the entry yet).
+
+
4007: if the block was in the cache and the entry has the block's
+ data, return.
+
+4010: if the block wasn't in the cache, read it from disk. Are
+ reads synchronous or asynchronous?
+
+
3836: a bounded buffer of outstanding disk requests.
+
3809: tell the disk to move arm and generate an interrupt.
+
+3851: go to sleep and let some other process run. Time sharing
+ in action.
+
3792: interrupt: arm is in the right position; wakeup requester.
+
3856: read block from disk.
+
3860: remove request from bounded buffer. wakeup processes that
+ are waiting for a slot.
+
3864: start next disk request, if any. xv6 can overlap I/O with
+computation.
+
+
+4011: mark the cache entry as holding the data.
+
+
4498: To where is the block copied? is dst a valid user address?
+
+
+
Now let's suppose that the process is writing 512 bytes at the end
+ of the file x. How many disk writes will happen?
+
+
4567: allocate a new block
+
+
4518: allocate a block: scan block map, and write entry
+
4523: How many disk operations if the process would have been appending
+ to a large file? (Answer: read indirect block, scan block map, write
+ block map.)
+
+
4572: read the block that the process will be writing, in case the
+ process writes only part of the block.
+
4574: write it. is it synchronous or asynchronous? (Ans:
+ synchronous but with timesharing.)
+
+
+
Lots of code to implement reading and writing of files. How about
+ directories?
+
+
+4722: scan the directory, reading directory blocks, to see if a
+ directory entry is unused (inum == 0).
+
4729: use it and update it.
+
4735: write the modified block.
+
+
Reading and writing of directories is trivial.
+
+
diff --git a/web/l-interrupt.html b/web/l-interrupt.html
new file mode 100644
index 0000000..363af5e
--- /dev/null
+++ b/web/l-interrupt.html
@@ -0,0 +1,174 @@
+
+
Lecture 6: Interrupts & Exceptions
+
+
+
Interrupts & Exceptions
+
+
+Required reading: xv6 trapasm.S, trap.c, syscall.c, usys.S.
+
+You will need to consult
+IA32 System
+Programming Guide chapter 5 (skip 5.7.1, 5.8.2, 5.12.2).
+
+
Overview
+
+
+Big picture: the kernel is a trusted third party that runs the machine.
+Only the kernel can execute privileged instructions (e.g.,
+changing MMU state).
+The processor enforces this protection through the ring bits
+in the code segment.
+If a user application needs to carry out a privileged operation
+or other kernel-only service,
+it must ask the kernel nicely.
+How can a user program change to the kernel address space?
+How can the kernel transfer to a user address space?
+What happens when a device attached to the computer
+needs attention?
+These are the topics for today's lecture.
+
+
+There are three kinds of events that must be handled
+by the kernel, not user programs:
+(1) a system call invoked by a user program,
+(2) an illegal instruction or other kind of bad processor state (memory fault, etc.),
+and
+(3) an interrupt from a hardware device.
+
+
+Although these three events are different, they all use the same
+mechanism to transfer control to the kernel.
+This mechanism consists of three steps that execute as one atomic unit:
+(a) change the processor to kernel mode;
+(b) save the old processor state somewhere (usually the kernel stack);
+and (c) change the processor state to the values set up as
+the “official kernel entry values.”
+The exact implementation of this mechanism differs
+from processor to processor, but the idea is the same.
+
+
+We'll work through examples of these today in lecture.
+You'll see all three in great detail in the labs as well.
+
+
+A note on terminology: sometimes we'll
+use interrupt (or trap) to mean both interrupts and exceptions.
+
+
+xv6 Sheet 28: tvinit and idtinit.
+Note setting of gate for T_SYSCALL
+
+
+xv6 Sheet 29: vectors.pl (also see generated vectors.S).
+
+
+System calls
+
+
+
+xv6 Sheet 16: init.c calls open("console").
+How is that implemented?
+
+
+xv6 usys.S (not in book).
+(No saving of registers. Why?)
+
+
+Breakpoint 0x1b:"open",
+step past int instruction into kernel.
+
+
+See handout Figure 9-4 [sic].
+
+
+xv6 Sheet 28: in vectors.S briefly, then in alltraps.
+Step through to call trap, examine registers and stack.
+How will the kernel find the argument to open?
+
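+A simplified sketch of the answer: the int instruction saved the user
+%esp in the trap frame, and open's arguments are still on the user
+stack just above the saved return address. Simplified from argint and
+fetchint in syscall.c, assuming contiguous user memory at cp->mem:
+
+int argint(int n, int *ip)
+{
+  uint addr = cp->tf->esp + 4 + 4*n; // skip the saved return address
+  if(addr + 4 > cp->sz)              // validate against process size
+    return -1;
+  *ip = *(int*)(cp->mem + addr);     // cp->mem: base of user memory
+  return 0;
+}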
+
+What happens if a user program divides by zero
+or accesses unmapped memory?
+Exception. Same path as system call until trap.
+
+
+What happens if kernel divides by zero or accesses unmapped memory?
+
+
+Interrupts
+
+
+
+Like system calls, except:
+devices generate them at any time,
+there are no arguments in CPU registers,
+nothing to return to,
+usually can't ignore them.
+
+
+How do they get generated?
+Device essentially phones up the
+interrupt controller and asks to talk to the CPU.
+Interrupt controller then buzzes the CPU and
+tells it, “keyboard on line 1.”
+Interrupt controller is essentially the CPU's
+administrative assistant,
+managing the phone lines on the CPU's behalf.
+
+
+Have to set up interrupt controller.
+
+
+(Briefly) xv6 Sheet 63: pic_init sets up the interrupt controller,
+irq_enable tells the interrupt controller to let the given
+interrupt through.
+
+
+(Briefly) xv6 Sheet 68: pit8253_init sets up the clock chip,
+telling it to interrupt on IRQ_TIMER 100 times/second.
+console_init sets up the keyboard, enabling IRQ_KBD.
+
+
+In Bochs, set breakpoint at 0x8:"vector0"
+and continue, loading kernel.
+Step through clock interrupt, look at
+stack, registers.
+
+
+Was the processor executing in kernel or user mode
+at the time of the clock interrupt?
+Why? (Have any user-space instructions executed at all?)
+
+
+Can the kernel get an interrupt at any time?
+Why or why not? cli and sti,
+irq_enable.
+
+
+
diff --git a/web/l-lock.html b/web/l-lock.html
new file mode 100644
index 0000000..eea8217
--- /dev/null
+++ b/web/l-lock.html
@@ -0,0 +1,322 @@
+
L7
+
+
+
+
+
+
Locking
+
+
Required reading: spinlock.c
+
+
Why coordinate?
+
+
Mutual-exclusion coordination is an important topic in operating
+systems, because many operating systems run on
+multiprocessors. Coordination techniques protect variables that are
+shared among multiple threads and updated concurrently. These
+techniques allow programmers to implement atomic sections so that one
+thread can safely update the shared variables without having to worry
+that another thread intervening. For example, processes in xv6 may
+run concurrently on different processors and in kernel-mode share
+kernel data structures. We must ensure that these updates happen
+correctly.
+
+
List and insert example:
+
+
+struct List {
+ int data;
+ struct List *next;
+};
+
+List *list = 0;
+
+insert(int data) {
+ List *l = new List;
+ l->data = data;
+ l->next = list; // A
+ list = l; // B
+}
+
+
+
What needs to be atomic? The two statements labeled A and B should
+always be executed together, as an indivisible fragment of code. If
+two processors execute A and B interleaved, then we end up with an
+incorrect list. To see that this is the case, draw out the list after
+the sequence A1 (statement A executed by processor 1), A2 (statement A
+executed by processor 2), B2, and B1.
+
+
+How could this erroneous sequence happen? The variable list
+lives in physical memory shared among multiple processors, connected
+by a bus. The accesses to the shared memory will be ordered in some
+total order by the bus/memory system. If the programmer doesn't
+coordinate the execution of the statements A and B, any order can
+happen, including the erroneous one.
+
+
The erroneous case is called a race condition. The problem with
+races is that they are difficult to reproduce. For example, if you
+put print statements in to debug the incorrect behavior, you might
+change the timing and the race might not happen anymore.
+
+
Atomic instructions
+
+
+The programmer must be able to express that A and B should be executed
+as a single atomic unit. We generally use a concept like locks
+to mark an atomic region, acquiring the lock at the beginning of the
+section and releasing it at the end:
+
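+For example (a sketch, reusing the list insert from above):
+
+Lock list_lock;
+
+insert(int data) {
+  List *l = new List;
+  l->data = data;
+
+  acquire(&list_lock);
+  l->next = list;    // A
+  list = l;          // B
+  release(&list_lock);
+}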
+
Acquire and release, of course, need to be atomic too, which can,
+for example, be done with an atomic hardware test-and-set-lock (TSL)
+instruction:
+
+
The semantics of TSL are:
+
+ R <- [mem] // load content of mem into register R
+ [mem] <- 1 // store 1 in mem.
+
+
+
+In a hardware implementation, the bus arbiter guarantees that both
+the load and store are executed without any other load/stores coming
+in between.
+
+
We can use locks to implement an atomic insert, or we can use
+TSL directly:
+
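+For example (a sketch):
+
+int list_busy;   // 1 while someone is updating the list
+
+insert(int data) {
+  List *l = new List;
+  l->data = data;
+
+  while(TSL(&list_busy) != 0)
+    ;              // spin: TSL atomically fetches the old value and stores 1
+  l->next = list;  // A
+  list = l;        // B
+  list_busy = 0;   // release
+}
+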
It is the programmer's job to make sure that locks are respected. If
+a programmer writes another function that manipulates the list, the
+programmer must make sure that the new function acquires and
+releases the appropriate locks. If the programmer doesn't, race
+conditions occur.
+
+
This code assumes that stores commit to memory in program order and
+that all stores by other processors started before insert got the lock
+are observable by this processor. That is, after the other processor
+released a lock, all the previous stores are committed to memory. If
+a processor executes instructions out of order, this assumption won't
+hold and we must, for example, use a barrier instruction that makes the
+assumption true.
+
+
+
Example: Locking on x86
+
+
Here is one way we can implement acquire and release using the x86
+xchgl instruction:
+
+
+struct Lock {
+ unsigned int locked;
+};
+
+acquire(Lock *lck) {
+ while(TSL(&(lck->locked)) != 0)
+ ;
+}
+
+release(Lock *lck) {
+ lck->locked = 0;
+}
+
+int
+TSL(int *addr)
+{
+ register int content = 1;
+ // xchgl content, *addr
+ // xchgl exchanges the values of its two operands, while
+ // locking the memory bus to exclude other operations.
+ asm volatile ("xchgl %0,%1" :
+ "=r" (content),
+ "=m" (*addr) :
+ "0" (content),
+ "m" (*addr));
+ return(content);
+}
+
+
+
the instruction "XCHG %eax, (content)" works as follows:
+
+
freeze other CPUs' memory activity
+
temp := content
+
content := %eax
+
%eax := temp
+
un-freeze other CPUs
+
+
+
+Steps 1 and 5 make XCHG special: it is "locked", using special signal
+ lines on the inter-CPU bus and bus arbitration.
+
+
This implementation doesn't scale to a large number of processors;
+ in a later lecture we will see how we could do better.
+
+
Lock granularity
+
+
Release/acquire is ideal for short atomic sections: increment a
+counter, search in i-node cache, allocate a free buffer.
+
+
What are spin locks not so great for? Long atomic sections may
+ waste waiters' CPU time, and it is a mistake to sleep while holding
+ spin locks. In xv6 we try to avoid long atomic sections by careful
+ coding (can you find an example?). xv6 doesn't release the processor when
+ holding a lock, but has an additional set of coordination primitives
+ (sleep and wakeup), which we will study later.
+
+
My list_lock protects all lists; inserts to different lists are
+ blocked. A lock per list would waste less time spinning, so you might
+ want "fine-grained" locks, one for every object. BUT acquire/release
+ are expensive (500 cycles on a 3 GHz machine) because they need to
+ talk off-chip.
+
+
Also, "correctness" is not that simple with fine-grained locks if
+ you need to maintain global invariants; e.g., "every buffer must be on
+ exactly one of free list and device list". Per-list locks are
+ irrelevant for this invariant. So you might want "large-grained" locks,
+ which reduce overhead but also reduce concurrency.
+
+
This tension is hard to get right. One often starts out with
+ "large-grained locks" and measures the performance of the system on
+ some workloads. When more concurrency is desired (to get better
+ performance), an implementor may switch to a more fine-grained
+ scheme. Operating system designers fiddle with this all the time.
+
+
Recursive locks and modularity
+
+
When designing a system we desire clean abstractions and good
+ modularity. We would like a caller not to have to know how a callee
+ implements a particular function. Locks make achieving modularity
+ more complicated. For example, what should we do when the caller holds a
+ lock, then calls a function that also needs the lock to perform
+ its job?
+
+
There are no transparent solutions that allow the caller and callee
+ to be unaware of which locks they use. One transparent, but
+ unsatisfactory option is recursive locks: If a callee asks for a
+ lock that its caller has, then we allow the callee to proceed.
+ Unfortunately, this solution is not ideal either.
+
+
Consider the following. If lock x protects the internals of some
+ struct foo, then if the caller acquires lock x, it knows that the
+ internals of foo are in a sane state and it can fiddle with them.
+ And then the caller must restore them to a sane state before releasing
+ lock x, but until then anything goes.
+
+
This assumption doesn't hold with recursive locking. After
+ acquiring lock x, the acquirer knows that either it is the first to
+ get this lock, in which case the internals are in a sane state, or
+ maybe some caller holds the lock and has messed up the internals and
+ didn't realize when calling the callee that it was going to try to
+ look at them too. So the fact that a function acquired the lock x
+ doesn't guarantee anything at all. In short, locks protect against
+ callers and callees just as much as they protect against other
+ threads.
+
+
Since transparent solutions aren't ideal, it is better to consider
+ locks part of the function specification. The programmer must
+ arrange that a caller doesn't invoke another function while holding
+ a lock that the callee also needs.
+
+
Locking in xv6
+
+
xv6 runs on a multiprocessor and is programmed to allow multiple
+threads of computation to run concurrently. In xv6 an interrupt might
+run on one processor and a process in kernel mode may run on another
+processor, sharing a kernel data structure with the interrupt routine.
+xv6 uses locks, implemented using an atomic instruction, to coordinate
+concurrent activities.
+
+
Let's check out why xv6 needs locks by following what happens when
+we start a second processor:
+
+
1516: mp_init (called from main0)
+
1606: mp_startthem (called from main0)
+
1302: mpmain
+
2208: scheduler.
+ Now we have several processors invoking the scheduler
+ function. xv6 better ensure that multiple processors don't run the
+ same process! does it?
+ Yes, if multiple schedulers run concurrently, only one will
+ acquire proc_table_lock, and proceed looking for a runnable
+ process. if it finds a process, it will mark it running, longjmps to
+ it, and the process will release proc_table_lock. the next instance
+ of scheduler will skip this entry, because it is marked running, and
+ look for another runnable process.
+
+
+
Why hold proc_table_lock during a context switch? It protects
+p->state; the process has to hold some lock to avoid a race with
+wakeup() and yield(), as we will see in the next lectures.
+
+
+Why not a lock per proc entry? It might be expensive in whole-table
+scans (in wait, wakeup, scheduler). proc_table_lock also
+protects some larger invariants, for example it might be hard to get
+proc_wait() right with just per entry locks. Right now the check to
+see if there are any exited children and the sleep are atomic -- but
+that would be hard with per entry locks. One could have both, but
+that would probably be neither clean nor fast.
+
+
+Of course, there is only one processor searching the proc table if
+acquire is implemented correctly. Let's check out acquire in
+spinlock.c:
+
+
1807: no recursive locks!
+
1811: why disable interrupts on the current processor? (if
+interrupt code itself tries to take a held lock, xv6 will deadlock;
+the panic will fire on 1808.)
+
+
can a process on a processor hold multiple locks?
+
+
1814: the (hopefully) atomic instruction.
+
+
see sheet 4, line 0468.
+
+
1819: make sure that stores issued on other processors before we
+got the lock are observed by this processor. these may be stores to
+the shared data structure that is protected by the lock.
+
+
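+For example, a checker that has inferred "cli is eventually followed
+by sti" would flag the early return here (a made-up fragment in the
+style of the paper's examples):
+
+int read_reg(struct dev *d) {
+  int v;
+  cli();            // disable interrupts
+  if(d->dead)
+    return -1;      // flagged: returns with interrupts still disabled
+  v = d->reg;
+  sti();            // re-enable interrupts
+  return v;
+}
+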
+
+
+
Locking in JOS
+
+
JOS is meant to run on single-CPU machines, and the plan can be
+simple. The simple plan is disabling/enabling interrupts in the
+kernel (the IF flag in the EFLAGS register). Thus, in the kernel,
+threads release the processor only when they want to and can ensure
+that they don't release the processor during a critical section.
+
+
In user mode, JOS runs with interrupts enabled, but Unix user
+applications don't share data structures. The data structures that
+must be protected, however, are the ones shared in the library
+operating system (e.g., pipes). In JOS we will use special-case
+solutions, as you will find out in lab 6. For example, to implement
+pipe we will assume there is one reader and one writer. The reader
+and writer never update each other's variables; they only read each
+other's variables. By programming carefully using this rule we can avoid
+races.
diff --git a/web/l-mkernel.html b/web/l-mkernel.html
new file mode 100644
index 0000000..2984796
--- /dev/null
+++ b/web/l-mkernel.html
@@ -0,0 +1,262 @@
+
Microkernel lecture
+
+
+
+
+
+
Microkernels
+
+
Required reading: Improving IPC by kernel design
+
+
Overview
+
+
This lecture looks at the microkernel organization. In a
+microkernel, services that a monolithic kernel implements in the
+kernel run as user-level programs. For example, the file
+system, UNIX process management, pager, and network protocols each run
+in a separate user-level address space. The microkernel itself
+supports only the services that are necessary to allow system services
+to run well in user space; a typical microkernel has at least support
+for creating address spaces, threads, and inter process communication.
+
+
The potential advantages of a microkernel are simplicity of the
+kernel (it is small), isolation of operating system components (each
+runs in its own user-level address space), and flexibility (we can
+have both a file server and a database server). One potential
+disadvantage is performance loss, because what requires a single
+system call in a monolithic kernel may require multiple system calls
+and context switches in a microkernel.
+
+
One way in which microkernels differ from each other is the exact
+kernel API they implement. For example, Mach (a system developed at
+CMU, which influenced a number of commercial operating systems) has
+the following system calls: processes (create, terminate, suspend,
+resume, priority, assign, info, threads), threads (fork, exit, join,
+detach, yield, self), ports and messages (a port is a unidirectional
+communication channel with a message queue and supporting primitives
+to send, destroy, etc.), and regions/memory objects (allocate,
+deallocate, map, copy, inherit, read, write).
+
+
Some microkernels are more "microkernel" than others. For example,
+some microkernels implement the pager in user space but the basic
+virtual memory abstractions in the kernel (e.g., Mach); others are
+more extreme and implement most of the virtual memory in user space
+(L4). Yet others are less extreme: many servers run in their own
+address space, but in kernel mode (Chorus).
+
+
All microkernels support multiple threads per address space. Xv6
+and, until recently, Unix don't; why? Because in Unix system services
+are typically implemented in the kernel, and those are the primary
+programs that need multiple threads to handle events concurrently
+(e.g., waiting for the disk while processing new I/O requests). In
+microkernels, these services are implemented in user-level address
+spaces, so they need a mechanism for handling operations concurrently.
+(Of course, one can argue that if fork is efficient enough, there is
+no need for threads.)
+
+
L3/L4
+
+
L3 is a predecessor of L4. L3 provides data persistence, DOS
+emulation, and an ELAN runtime system. L4 is a reimplementation of
+L3, but without the data persistence. L4KA is a project at
+sourceforge.net, and you can download the code for the latest
+incarnation of L4 from there.
+
+
L4 is a "second-generation" microkernel, with 7 calls: IPC (of
+which there are several types), id_nearest (find a thread with an ID
+close to the given ID), fpage_unmap (unmap pages; mapping is done as a
+side effect of IPC), thread_switch (hand the processor to a specified
+thread), lthread_ex_regs (manipulate thread registers),
+thread_schedule (set scheduling policies), and task_new (create a new
+address space with some default number of threads). These calls
+provide address spaces, tasks, threads, interprocess communication,
+and unique identifiers. An address space is a set of mappings.
+Multiple threads may share mappings, and a thread may grant mappings
+to another thread (through IPC). A task is the set of threads sharing
+an address space.
+
+
A thread is the execution abstraction; it belongs to an address
+space and has a UID, a register set, a page fault handler, and an
+exception handler. The UID of a thread is its task number plus the
+number of the thread within that task.
+
+
IPC passes data by value or by reference to another address space.
+It also provides for sequence coordination. It is used for
+communication between clients and servers, to pass interrupts to a
+user-level exception handler, and to pass page faults to an external
+pager. In L4, device drivers are implemented as user-level
+processes with the device mapped into their address space.
+Linux runs as a user-level process.
+
+
L4 provides quite a range of message types: inline-by-value,
+strings, and virtual memory mappings. The send and receive descriptors
+specify how many of each, if any.
+
+
In addition, there is a system call for timeouts and controlling
+thread scheduling.
+
+
L3/L4 paper discussion
+
+
+
+
This paper is about performance. What is a microsecond? Is 100
+usec bad? Is 5 usec so much better that we care? How many instructions
+does a 50-MHz x86 execute in 100 usec? (50 cycles per usec times 100
+usec is 5,000 cycles, so on the order of a few thousand instructions.)
+What can we compute with that number of instructions? How many disk
+operations in that time? How many interrupts can we take? (The
+livelock paper, which we cover in a few lectures, mentions 5,000
+network pkts per second, and each packet generates two interrupts.)
+
+
In performance calculations, what is the appropriate/better metric?
+Microseconds or cycles?
+
+
Goal: improve IPC performance by a factor of 10 through careful
+kernel design that is fully aware of the hardware it is running on.
+Principle: performance rules! Optimize for the common case. Because
+in L3 interrupts are propagated to user level using IPC, the system
+may have to support many IPCs per second (as many as the
+device can generate interrupts).
+
+
IPC consists of transferring control and transferring data. The
+minimal cost for transferring control is 127 cycles, plus 45 cycles
+for TLB misses (see table 3). What are the x86 instructions to enter
+and leave the kernel? (int, iret.) Why do they consume so much time?
+(They flush the pipeline.) Do modern processors perform these
+operations more efficiently? No, it is worse now: faster processors
+are optimized for straight-line code; traps and exceptions flush a
+deeper pipeline, and cache misses cost more cycles.
+
+
What are the 5 TLB misses? 1) B's thread control block; loading
+%cr3 flushes the TLB, so 2) the kernel text causes a miss; 3) iret
+accesses the stack, and 4+5) B's user text -- two pages, since B's
+user code looks at the message.
+
+
New system call: reply_and_receive. Effect: 2 system calls per
+RPC.
+
+
Complex messages: direct string, indirect strings, and memory
+objects.
+
+
Direct transfer by temporary mapping through a communication
+window. The communication window is mapped in B's address space and in
+A's kernel address space; why is this better than just mapping a page
+shared between A's and B's address spaces? 1) Multi-level security:
+sharing makes it hard to reason about information flow. 2) The
+receiver can't check message legality (it might change after the
+check). 3) When a server has many clients, it could run out of
+virtual address space, and the shared memory region must be
+established ahead of time. 4) It is not application friendly, since
+the data may already be at another address, i.e., applications would
+have to copy anyway -- possibly more copies.
+
+
Why not use the following approach: map the region copy-on-write
+(or read-only) in A's address space after the send, and read-only in
+B's address space? Then B may have to copy the data, or cannot
+receive data at its final destination.
+
+
On the x86 this is implemented by copying B's PDE into A's address
+space. Why two PDEs? (The maximum message size is 4 Mbyte, so the
+transfer is guaranteed to work if the message starts in the bottom
+4 Mbyte of an 8-Mbyte mapped region.) Why not just copy PTEs? That
+would be much more expensive.
+
+
What does it mean for the TLB to be "window clean"? Why do we
+care? It means the TLB contains no mappings within the communication
+window. We care because mapping is cheap (copy a PDE), but
+invalidation is not; the x86 only lets you invalidate one page at a
+time, or the whole TLB. Does TLB invalidation of the communication
+window turn out to be a problem? Not usually, because we have to load
+%cr3 during IPC anyway.
+
+
The thread control block holds registers, links to various
+ doubly-linked lists, the pgdir, the UID, etc. The lower part of a
+ thread UID contains the TCB number. One can also deduce the TCB
+ address from the stack by ANDing the SP with a bitmask (the SP comes
+ out of the TSS when just switching to the kernel).
+
+
The kernel stack is on the same page as the TCB. Why? 1) It
+minimizes TLB misses (since accessing the kernel stack will bring in
+the TCB); 2) it allows very efficient access to the TCB -- just mask
+off the lower 12 bits of %esp; 3) with VM, the lower 32 bits of the
+thread id can indicate which TCB; using one page per TCB means there
+is no need to check whether a thread is swapped out (simply don't map
+that TCB if it shouldn't be accessed).
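A user-space toy of the masking trick (one aligned page holds the TCB
+at the bottom and the stack above it; any stack address on the page
+recovers the TCB):
+
+    #include <stdint.h>
+    #include <stdio.h>
+    #include <stdlib.h>
+
+    struct tcb { int tid; /* registers, queue links, pgdir, ... */ };
+
+    int main(void) {
+        char *page = aligned_alloc(4096, 4096);   /* TCB + stack page */
+        struct tcb *t = (struct tcb *)page;
+        t->tid = 42;
+
+        char *sp = page + 4000;                   /* an address on the "stack" */
+        struct tcb *found = (struct tcb *)((uintptr_t)sp & ~(uintptr_t)0xFFF);
+        printf("tid = %d\n", found->tid);         /* prints 42 */
+        free(page);
+        return 0;
+    }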
+
+
Invariant on queues: queues always hold in-memory TCBs.
+
+
Wakeup queue: a set of 8 unordered wakeup lists (wakeup time mod 8),
+and a smart representation of time so that 32-bit integers can be used
+in the common case (base + offset in msec; bump the base and recompute
+all offsets every ~4 hours; the maximum timeout is ~24 days, 2^31 msec).
+
+
What is the problem addressed by lazy scheduling?
+The conventional approach to scheduling:
+
+ A sends message to B:
+ Move A from ready queue to waiting queue
+ Move B from waiting queue to ready queue
+ This requires 58 cycles, including 4 TLB misses. What are the TLB misses?
+ One each for the heads of the ready and waiting queues
+ One each for the previous queue element during the remove
+
+
Lazy scheduling:
+
+ Ready queue must contain all ready threads except current one
+ Might contain other threads that aren't actually ready, though
+ Each wakeup queue contains all threads waiting in that queue
+ Again, might contain other threads, too
+ Scheduler removes inappropriate queue entries when scanning
+ queue
+
+
+
Why does this help performance? There are only three situations in
+which a thread gives up the CPU but stays ready: the send syscall (as
+opposed to call), preemption, and hardware interrupts. So very often
+we can IPC into a thread without ever putting it on the ready list.
+
+
Direct process switch. This section just says you should use
+kernel threads instead of continuations.
+
+
Short messages via registers.
+
+
Avoiding unnecessary copies. Basically, one can send and receive
+ messages with the same vector. This makes forwarding efficient,
+ which is important for the Clans/Chiefs model.
+
+
Segment register optimization. Loading segment registers is
+ slow: the CPU has to access the GDT, etc. But the common case is
+ that user code doesn't change its segment registers. Observation: it
+ is faster to check a segment descriptor than to load it. So just
+ check that the segment registers are okay; only load them if user
+ code changed them.
+
+
Registers for parameter passing wherever possible: system calls
+and IPC.
+
+
Minimizing TLB misses. Try to cram as many things as possible onto
+the same page: the IPC kernel code, GDT, IDT, and TSS all on one page.
+Actually, maybe the whole tables can't fit, but put the important
+parts of the tables on the same page (perhaps the beginnings of the
+TSS, IDT, or GDT only?).
+
+
Coding tricks: short offsets, avoid jumps, avoid checks, pack
+ often-used data on same cache lines, lazily save/restore CPU state
+ like debug and FPU registers. Much of the kernel is written in
+ assembly!
+
+
What are the results? Figures 7 and 8 look good.
+
+
Is fast IPC enough to get good overall system performance? This
+paper doesn't make a statement either way; we have to read their 1997
+paper to find the answer to that question.
+
+
Is the principle of optimizing for performance right? In general,
+it is wrong to optimize for performance; other things matter more. Is
+IPC the one exception? Maybe, perhaps not. Was Liedtke fighting a
+losing battle against CPU makers? Should fast IPC be a hardware
+issue, or just an OS issue?
+
+
Required reading: namei(), and all other file system code.
+
+
Overview
+
+
To help users remember where they stored their data, most
+systems allow users to assign their own names to their data.
+Typically the data is organized in files, and users assign names to
+files. To deal with many files, users can organize their files in
+directories, in a hierarchical manner. Each name is a pathname, with
+the components separated by "/".
+
+
So that users don't have to type long absolute names (i.e., names
+starting with "/" in Unix), users can change their working directory
+and use relative names (i.e., names that don't start with "/").
+
+
User file namespace operations include create, mkdir, mv, ln
+(link), unlink, and chdir. (How is "mv a b" implemented in xv6?
+Answer: "link a b"; "unlink a"; see the sketch below.) To be able to
+name the current directory and the parent directory, every directory
+includes two entries, "." and "..". Files and directories can be
+reclaimed once users cannot name them anymore (i.e., after the last
+unlink).
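A minimal sketch of that answer using the Unix calls directly (error
+handling kept minimal):
+
+    #include <unistd.h>
+
+    /* "mv a b" as link + unlink */
+    int mv(const char *a, const char *b) {
+        if (link(a, b) < 0)    /* create the new name first */
+            return -1;
+        return unlink(a);      /* then remove the old one */
+    }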
+
+
Recall from the last lecture that all directory entries contain a
+name, followed by an inode number. The inode number names an inode of
+the file system. How can we merge file systems from different disks
+into a single name space?
+
+
A user grafts new file systems onto a name space using mount; umount
+removes a file system from the name space. (In DOS, a file system is
+named by its device letter.) Mount takes the root inode of the
+to-be-mounted file system and grafts it onto the inode of the name
+space entry where the file system is mounted (e.g., /mnt/disk1). The
+in-memory inode of /mnt/disk1 records the major and minor number of
+the file system mounted on it. When namei sees an inode on which a
+file system is mounted, it looks up the root inode of the mounted file
+system and proceeds with that inode.
+
+
Mount is not a durable operation; it doesn't survive power failures.
+After a power failure, the system administrator must remount the file
+system (often in a startup script that is run from init).
+
+
Links are convenient because with them users can create synonyms
+ for file names. But they create the potential for cycles in the
+ naming tree. For example, consider link("a/b/c", "a"). This makes c
+ a synonym for a. Such a cycle can complicate matters; for example:
+
+
If a user subsequently calls unlink("a"), then the user cannot
+ name the directory "b" and the link "c" anymore -- but how can the
+ file system decide that?
+
+
+
This problem can be solved by detecting cycles, and unreachable
+ files can be dealt with by computing which files are reachable from
+ "/" and reclaiming all the ones that aren't. Unix takes a simpler
+ approach: avoid cycles by disallowing users from creating links to
+ directories. If there are no cycles, then reference counts can be
+ used to see if a file is still referenced. The inode maintains a
+ field counting references (nlink in xv6's dinode). Link increases
+ the reference count, and unlink decreases it; when the count reaches
+ zero, the inode and disk blocks can be reclaimed, as the toy model
+ below illustrates.
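A self-contained toy of the counting rule (the names are illustrative,
+not xv6's code):
+
+    #include <stdio.h>
+
+    struct inode { int inum; int nlink; };
+
+    void do_link(struct inode *ip)   { ip->nlink++; }
+    void do_unlink(struct inode *ip) {
+        if (--ip->nlink == 0)
+            printf("inode %d reclaimed\n", ip->inum);  /* free inode+blocks */
+    }
+
+    int main(void) {
+        struct inode f = { 7, 1 };   /* file created with one name */
+        do_link(&f);                 /* "ln a b": nlink becomes 2 */
+        do_unlink(&f);               /* "rm a": nlink becomes 1, still named */
+        do_unlink(&f);               /* "rm b": nlink hits 0, reclaim */
+        return 0;
+    }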
+
+
How do we handle links across file systems (i.e., from one
+ mounted file system to another)? Since inode numbers are not unique
+ across file systems, we cannot create a hard link across file
+ systems; the directory entry contains only an inode number, not the
+ inode number plus the name of the disk on which the inode is
+ located. To handle this case, Unix provides a second type of link,
+ called soft (symbolic) links.
+
+
Soft links are a special file type (e.g., T_SYMLINK). If namei
+ encounters an inode of type T_SYMLINK, it resolves the name in
+ the symlink file to an inode and continues from there. With
+ symlinks one can create cycles, and they can point to non-existent
+ files.
+
+
The design of the name system can have security implications. For
+ example, if you test whether a name exists and then use the name, an
+ adversary can change the binding from name to object between the
+ test and the use. Such problems are called TOCTTOU
+ (time-of-check-to-time-of-use) bugs.
+
+
An example of TOCTTOU is as follows. Let's say root runs a script
+ every night to remove files in /tmp. This gets rid of the files
+ that editors may have left behind but that will never be used
+ again. An adversary can exploit this script as follows:
+
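(The script fragment itself is elided here; the following is a
+ hypothetical reconstruction of the check-then-use pair described
+ below:)
+
+    #include <sys/stat.h>
+    #include <unistd.h>
+
+    /* hypothetical reconstruction of the vulnerable cleanup step */
+    void cleanup_tmp_etc(void) {
+        struct stat st;
+        /* check: /tmp/etc is a real directory, not a symlink... */
+        if (lstat("/tmp/etc", &st) == 0 && !S_ISLNK(st.st_mode))
+            /* ...but it can become a symlink to /etc right here */
+            unlink("/tmp/etc/passwd");
+    }
+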
+Lstat checks that /tmp/etc is not a symbolic link, but by the time the
+script runs unlink, the attacker has had time to create a symbolic link
+in place of /tmp/etc, with a password file of the adversary's choice.
+
+
This problem could have been avoided if every user or process group
+ had its own private /tmp, or if access to the shared one was
+ mediated.
+
+
V6 code examples
+
+
namei (sheet 46) is the core of the Unix naming system. namei can
+ be called in several ways: NAMEI_LOOKUP (resolve a name to an inode
+ and lock the inode), NAMEI_CREATE (resolve a name, but lock the
+ parent inode), and NAMEI_DELETE (resolve a name, lock the parent
+ inode, and return the offset in the directory). The reason namei is
+ complicated is that we want to atomically test whether a name exists
+ and remove/create it if it does; otherwise, two concurrent processes
+ could interfere with each other and the directory could end up in an
+ inconsistent state.
+
+
Let's trace open("a", O_RDWR), focusing on namei:
+
+
5263: we will look at creating a file in a bit.
+
5277: call namei with NAMEI_LOOKUP.
+
4629: if the pathname starts with "/", look up the root inode (1).
+
4632: otherwise, use the inode for the current working directory.
+
4638: consume a run of "/", for example in "/////a////b".
+
4641: if we are done with NAMEI_LOOKUP, return the inode (e.g.,
+ namei("/")).
+
4652: if the inode in which we are searching for the name isn't of
+ type directory, give up.
+
4657-4661: determine the length of the current component of the
+ pathname we are resolving.
+
4663-4681: scan the directory for the component.
+
4682-4696: the entry wasn't found. If we are at the end of the
+ pathname and NAMEI_CREATE is set, lock the parent directory and
+ return a pointer to the start of the component. In all other cases,
+ unlock the inode of the directory and return 0.
+
4701: if NAMEI_DELETE is set, return the locked parent inode and the
+ offset of the to-be-deleted component in the directory.
+
4707: look up the inode of the component, and go to the top of the
+ loop.
+
+
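As a toy, the component-scanning part of this loop (cf. 4638 and
+4657-4661) can be modeled in isolation; this is illustrative C, not
+the V6 code:
+
+    #include <stdio.h>
+    #include <string.h>
+
+    int main(void) {
+        const char *cp = "/////a////b";
+        while (*cp) {
+            while (*cp == '/') cp++;          /* 4638: consume a run of "/" */
+            size_t len = strcspn(cp, "/");    /* 4657-4661: component length */
+            if (len == 0) break;
+            printf("component: %.*s\n", (int)len, cp);  /* then: scan dir */
+            cp += len;
+        }
+        return 0;
+    }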
+
Now let's look at creating a file in a directory:
+
+
5264: if the last component doesn't exist, but the first part of the
+ pathname resolved to a directory, then dp will be 0, last will point
+ to the beginning of the last component, and ip will be the locked
+ parent directory.
+
5266: create an entry for last in the directory.
+
4772: mknod1 allocates a new named inode and adds it to an
+ existing directory.
+
4776: ialloc. Scan the inode block, find an unused entry, and write
+ it. (If lucky, 1 read and 1 write.)
+
4784: fill out the inode entry, and write it. (Another write.)
+
4786: write the entry into the directory. (If lucky, 1 write.)
+
+
+
+Why must the parent directory be locked? If two processes try to
+create the same name in the same directory, only one should succeed,
+and the other should receive an error (file exists).
+
+
Link, unlink, chdir, mount, and umount could have taken file
+descriptors instead of their path arguments. In fact, this would get
+rid of some possible race conditions (some of which have security
+implications, e.g., TOCTTOU). However, this would require that the
+current working directory be remembered by the process, and UNIX
+didn't have good ways of maintaining static state shared among all
+processes belonging to a given user. The easiest way to create shared
+state is to place it in the kernel.
+
+
We have one piece of code in xv6 that we haven't studied: exec.
+ With all the groundwork we have done, this code can be easily
+ understood (see sheet 54).
+
+
diff --git a/web/l-okws.txt b/web/l-okws.txt
new file mode 100644
index 0000000..fa940d0
--- /dev/null
+++ b/web/l-okws.txt
@@ -0,0 +1,249 @@
+
+Security
+-------------------
+I. 2 Intro Examples
+II. Security Overview
+III. Server Security: Offense + Defense
+IV. Unix Security + POLP
+V. Example: OKWS
+VI. How to Build a Website
+
+I. Intro Examples
+--------------------
+1. Apache + OpenSSL 0.9.6a (CAN 2002-0656)
+ - SSL = More security!
+
+     unsigned int j;
+     p = (unsigned char *)s->init_buf->data;   /* attacker-controlled bytes */
+     j = *(p++);                               /* length byte from the client */
+     s->session->session_id_length = j;
+     memcpy(s->session->session_id, p, j);     /* overflows the fixed-size
+                                                  session_id buffer when j is
+                                                  larger than that buffer */
+
+ - the result: an Apache worm
+
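+ A sketch of the missing bounds check (the field sizes are assumed,
+ and this is not the actual OpenSSL patch): validate the
+ attacker-chosen length byte before copying.
+
+     j = *(p++);
+     if (j > sizeof(s->session->session_id))
+         return -1;   /* reject: j would overflow session_id */
+     s->session->session_id_length = j;
+     memcpy(s->session->session_id, p, j);
+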
+2. SparkNotes.com 2000:
+ - New profile feature that displays "public" information about users,
+   but a bug made e-mail addresses "public" by default.
+ - New program for getting that data:
+
+ http://www.sparknotes.com/getprofile.cgi?id=1343
+
+II. Security Overview
+----------------------
+
+What Is Security?
+ - Protecting your system from attack.
+
+ What's an attack?
+ - Stealing data
+ - Corrupting data
+ - Controlling resources
+ - DOS
+
+ Why attack?
+ - Money
+ - Blackmail / extortion
+ - Vendetta
+ - intellectual curiosity
+ - fame
+
+Security is a Big topic
+
+ - Server security -- today's focus. There's some machine sitting on the
+ Internet somewhere, with a certain interface exposed, and attackers
+ want to circumvent it.
+ - Why should you trust your software?
+
+ - Client security
+ - Clients are usually servers, so they have many of the same issues.
+ - Slight simplification: people across the network cannot typically
+ initiate connections.
+ - Has a "fallible operator":
+ - Spyware
+ - Drive-by-Downloads
+
+ - Client security turns out to be much harder -- GUI considerations,
+ look inside the browser and the applications.
+ - Systems community can more easily handle server security.
+ - We think mainly of servers.
+
+III. Server Security: Offense and Defense
+-----------------------------------------
+ - Show picture of a Web site.
+
+ Attacks | Defense
+----------------------------------------------------------------------------
+ 1. Break into DB from net | 1. FW it off
+ 2. Break into WS on telnet | 2. FW it off
+ 3. Buffer overrun in Apache | 3. Patch apache / use better lang?
+ 4. Buffer overrun in our code | 4. Use better lang / isolate it
+ 5. SQL injection | 5. Better escaping / don't interpret code.
+ 6. Data scraping. | 6. Use a sparse UID space.
+ 7. PW sniffing | 7. ???
+ 8. Fetch /etc/passwd and crack | 8. Don't expose /etc/passwd
+ PW |
+ 9. Root escalation from apache | 9. No setuid programs available to Apache
+10. XSS |10. Filter JS and input HTML code.
+11. Keystroke recorded on sys- |11. Client security
+ admin's desktop (planetlab) |
+12. DDOS |12. ???
+
+Summary:
+ - That we want private data to be available to the right people makes
+   this problem hard in the first place. Internet servers are there
+ for a reason.
+ - Security != "just encrypt your data;" this in fact can sometimes
+ make the problem worse.
+ - Best to prevent break-ins from happening in the first place.
+ - If they do happen, want to limit their damage (POLP).
+ - Security policies are difficult to express / package up neatly.
+
+IV. Design According to POLP (in Unix)
+---------------------------------------
+ - Assume any piece of a system can be compromised, by either bad
+ programming or malicious attack.
+ - Try to limit the damage done by such a compromise (along the lines
+ of the 4 attack goals).
+
+
+
+What's the goal on Unix?
+ - Keep processes from communicating that don't have to:
+ - limit FS, IPC, signals, ptrace
+ - Strip away unneeded privilege
+ - with respect to network, FS.
+ - Strip away FS access.
+
+How on Unix?
+ - setuid/setgid
+ - system call interposition
+ - chroot (away from setuid executables, /etc/passwd, /etc/ssh/..)
+
+
+
+How do you write chroot'ed programs?
+ - What about shared libraries?
+ - /etc/resolv.conf?
+ - Can chroot'ed programs access the FS at all? What if they need
+ to write to the FS or read from the FS?
+ - Fd's are *capabilities*; a trusted parent can pass them to
+   chroot'ed services, thereby opening new files on their behalf
+   (see the sketch below).
+ - Unforgeable - can only get them from the kernel via open/socket, etc.
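+ A minimal sketch of fd passing over a Unix-domain socket with
+ SCM_RIGHTS (standard POSIX API; error handling omitted):
+
+     #include <string.h>
+     #include <sys/socket.h>
+     #include <sys/uio.h>
+
+     /* hand fd to a (possibly chroot'ed) peer over socket sock */
+     int send_fd(int sock, int fd) {
+         char dummy = 0;
+         struct iovec iov = { &dummy, 1 };        /* must carry >= 1 byte */
+         char ctrl[CMSG_SPACE(sizeof fd)];
+         struct msghdr msg = {0};
+         msg.msg_iov = &iov;
+         msg.msg_iovlen = 1;
+         msg.msg_control = ctrl;
+         msg.msg_controllen = sizeof ctrl;
+         struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
+         c->cmsg_level = SOL_SOCKET;
+         c->cmsg_type = SCM_RIGHTS;               /* kernel dups the fd */
+         c->cmsg_len = CMSG_LEN(sizeof fd);
+         memcpy(CMSG_DATA(c), &fd, sizeof fd);
+         return sendmsg(sock, &msg, 0);
+     }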
+
+Unix Shortcomings (round 1)
+ - It's bad to run as root!
+ - Yet, need root for:
+ - chroot
+ - setuid/setgid to a lower-privileged user
+ - create a new user ID
+ - Still no guarantee that we've cut off all channels
+ - 200 syscalls!
+ - Default is to give most/all privileges.
+ - Can "break out" of chroot jails?
+ - Can still exploit race conditions in the kernel to escalate privileges.
+
+Sidebar
+ - setuid / setuid misunderstanding
+ - root / root misunderstanding
+ - effective vs. real vs. saved set-user-ID
+
+V. OKWS
+-------
+- Taking these principles as far as possible.
+- C.f. Figure 1 From the paper..
+- Discussion of which privileges are in which processes
+
+
+
+- Technical details: how to launch a new service
+- Within the launcher (running as root):
+
+
+
+ // receive FDs from logger, pubd, demux
+ fork ();                   // child inherits the passed FDs
+ chroot ("/var/okws/run");  // confine the service's view of the FS
+ chdir ("/coredumps/51001");
+ setgid (51001);            // drop group before user: setgid would
+ setuid (51001);            //   fail after privileges are gone
+ exec ("login", fds ... );  // start the service with least privilege
+
+- Note no chroot -- why not?
+- Once launched, how does a service get new connections?
+- Note the goal - minimum tampering with each other in the
+ case of a compromise.
+
+Shortcoming of Unix (2)
+- A lot of plumbing involved with this system. FDs flying everywhere.
+- Isolation still not fine enough. If a service gets taken over,
+ can compromise all users of that service.
+
+VI. Reflections on Building Websites
+---------------------------------
+- OKWS interesting "experiment"
+- Need for speed; also, good gzip support.
+- If you need compiled code, it's a good way to go.
+- RPC-like system a must for backend communication
+- Connection-pooling for free
+
+Biggest difficulties:
+- Finding good C++ programmers.
+- Compile times.
+- The DB is still always the problem.
+
+Hard to Find good Alternatives
+- Python / Perl - you might spend a lot of time writing C code /
+ integrating with lower level languages.
+- Have to worry about DB pooling.
+- Java -- most viable, and is getting better. Scary that you can't
+ peer inside.
+- .Net / C#-based system might be the way to go.
+
+
+=======================================================================
+
+Extra Material:
+
+Capabilities (From the Eros Paper in SOSP 1999)
+
+ - "Unforgeable pair made up of an object ID and a set of authorized
+ operations (an interface) on that object."
+ - c.f. Dennis and van Horn. "Programming semantics for multiprogrammed
+ computations," Communications of the ACM 9(3):143-154, Mar 1966.
+ - Thus:
+