| <title>L4</title>
 | |
| <html>
 | |
| <head>
 | |
| </head>
 | |
| <body>
 | |
| 
 | |
| <h1>Address translation and sharing using segments</h1>
 | |
| 
 | |
| <p>This lecture is about virtual memory, focusing on address
 | |
| spaces. It is the first in a series of lectures that use
 | |
| xv6 as a case study.
 | |
| 
 | |
| <h2>Address spaces</h2>
 | |
| 
 | |
| <ul>
 | |
| 
 | |
| <li>OS: kernel program and user-level programs. For fault isolation
 | |
| each program runs in a separate address space. The kernel address
 | |
| space is like a user address space, except that it runs in kernel mode.
 | |
| A program in kernel mode can execute privileged instructions (e.g.,
 | |
| writing the kernel's code segment registers).
 | |
| 
 | |
| <li>One job of the kernel is to manage address spaces (creating, growing,
 | |
| deleting, and switching between them)
 | |
| 
 | |
| <ul>
 | |
| 
 | |
| <li>Each address space (including kernel) consists of the binary
 | |
|     representation for the text of the program, the data part
 | |
|     of the program, and the stack area.
 | |
| 
 | |
| <li>The kernel address space runs the kernel program. In a monolithic
 | |
|     organization the kernel manages all hardware and provides an API
 | |
|     to user programs.
 | |
| 
 | |
| <li>Each user address space contains a program.  A user program may ask
 | |
|   to shrink or grow its address space.
 | |
| 
 | |
| </ul>
 | |
| 
 | |
| <li>The main operations:
 | |
| <ul>
 | |
| <li>Creation.  Allocate physical memory to store the program. Load
 | |
| the program into physical memory. Fill the address space with references to
 | |
| physical memory. 
 | |
| <li>Growing. Allocate physical memory and add it to address space.
 | |
| <li>Shrinking. Free some of the memory in an address space.
 | |
| <li>Deletion. Free all memory in an address space.
 | |
| <li>Switching. Switch the processor to use another address space.
 | |
| <li>Sharing.  Share a part of an address space with another program.
 | |
| </ul>
 | |
| </ul>
 | |
| 
 | |
| <p>Two main approaches to implementing address spaces: using segments
 | |
|   and using page tables. Often when one uses segments, one also uses
 | |
|   page tables.  But not the other way around; i.e., paging without
 | |
|   segmentation is common.
 | |
| 
 | |
| <h2>Example support for address spaces: x86</h2>
 | |
| 
 | |
| <p>Providing address spaces and address translation typically
 | |
| requires support from hardware.  Translation and permission
 | |
| checks must happen on every address a program uses, and doing
 | |
| that in software would be too slow (if it is even possible).  The
 | |
| division of labor is that the operating system manages
 | |
| address spaces, and the hardware translates addresses and checks
 | |
| permissions.
 | |
| 
 | |
| <p>PC block diagram without virtual memory support:
 | |
| <ul>
 | |
| <li>physical address
 | |
| <li>base, IO hole, extended memory
 | |
| <li>Physical address == what is on CPU's address pins
 | |
| </ul>
 | |
| 
 | |
| <p>The x86 starts out in real mode and translation is as follows:
 | |
| 	<ul>
 | |
| 	<li>segment*16+offset ==> physical address
 | |
|         <li>no protection: program can load anything into seg reg
 | |
| 	</ul>
 | |
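<p>The real-mode rule above is plain arithmetic. A minimal sketch in Python
(the function name <tt>real_mode_physical</tt> is made up for illustration)
shows that two different segment:offset pairs can name the same physical byte,
and that nothing is checked beyond the 16-bit width of each value:

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Return the physical address for a real-mode segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # segment*16 + offset: the segment is shifted left by four bits.
    return segment * 16 + offset


# Two different pairs that both reach the boot sector load address.
print(hex(real_mode_physical(0x0000, 0x7C00)))  # 0x7c00
print(hex(real_mode_physical(0x07C0, 0x0000)))  # 0x7c00
```

<p>The largest value this can produce is 0xFFFF*16 + 0xFFFF = 0x10FFEF, just
above 1 MByte.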
| 
 | |
| <p>The operating system can switch the x86 to protected mode, which
 | |
| allows the operating system to create address spaces. Translation in
 | |
| protected mode is as follows:
 | |
| 	<ul>
 | |
| 	<li>selector:offset (logical addr) <br>
 | |
| 	     ==SEGMENTATION==> 
 | |
| 	<li>linear address <br>
 | |
| 	     ==PAGING ==>
 | |
| 	<li>physical address
 | |
| 	</ul>
 | |
| 
 | |
| <p>Next lecture covers paging; now we focus on segmentation. 
 | |
| 
 | |
| <p>Protected-mode segmentation works as follows:
 | |
| <ul>
 | |
| <li>protected-mode segments add 32-bit addresses and protection
 | |
| <ul>
 | |
| <li>wait: what's the point? the point of segments in real mode was
 | |
|   bigger addresses, but 32-bit mode fixes that!
 | |
| </ul>
 | |
| <li>segment register holds segment selector
 | |
| <li>selector indexes into global descriptor table (GDT)
 | |
| <li>segment descriptor holds 32-bit base, limit, type, protection
 | |
| <li>la = va + base ; assert(va < limit);
 | |
| <li>seg register usually implicit in instruction
 | |
| 	<ul>
 | |
| 	<li>DS:REG
 | |
| 		<ul>
 | |
| 		<li><tt>movl $0x1, _flag</tt>
 | |
| 		</ul>
 | |
| 	<li>SS:ESP, SS:EBP
 | |
| 		<ul>
 | |
| 		<li><tt>pushl %ecx, pushl $_i</tt>
 | |
| 		<li><tt>popl %ecx</tt>
 | |
| 		<li><tt>movl 4(%ebp),%eax</tt>
 | |
| 		</ul>
 | |
| 	<li>CS:EIP
 | |
| 		<ul>
 | |
| 		<li>instruction fetch
 | |
| 		</ul>
 | |
| 	<li>String instructions: read from DS:ESI, write to ES:EDI
 | |
| 		<ul>
 | |
| 		<li><tt>rep movsb</tt>
 | |
| 		</ul>
 | |
| 	<li>Exception: far addresses
 | |
| 		<ul>
 | |
| 		<li><tt>ljmp $selector, $offset</tt>
 | |
| 		</ul>
 | |
| 	</ul>
 | |
| <li>LGDT instruction loads CPU's GDT register
 | |
| <li>you turn on protected mode by setting PE bit in CR0 register
 | |
| <li>what happens with the next instruction? CS now has different
 | |
|   meaning...
 | |
| 
 | |
| <li>How to transfer from one segment to another, perhaps with different
 | |
| privileges?
 | |
| <ul>
 | |
| <li>Current privilege level (CPL) is in the low 2 bits of CS
 | |
| <li>CPL=0 is privileged O/S, CPL=3 is user
 | |
| <li>Within the same privilege level: ljmp.
 | |
| <li>Transfer to a segment with more privilege: call gates.
 | |
| <ul>
 | |
| <li>a way for app to jump into a segment and acquire privs
 | |
| <li>CPL must be <= descriptor's DPL in order to read or write segment
 | |
| <li>call gates can change privilege <b>and</b> switch CS and SS
 | |
|   segment
 | |
| <li>call gates are implemented using a special type of segment descriptor
 | |
|   in the GDT.
 | |
| <li>interrupts are conceptually the same as call gates, but their
 | |
|   descriptor is stored in the IDT.  We will use interrupts to transfer
 | |
|   control between user and kernel mode, both in JOS and xv6.  We will
 | |
|   return to this in the lecture about interrupts and exceptions.
 | |
| </ul>
 | |
| </ul>
 | |
| 
 | |
| <li>What about protection?
 | |
| <ul>
 | |
|   <li>can o/s limit what memory an application can read or write?
 | |
|   <li>app can load any selector into a seg reg...
 | |
|   <li>but can only mention indices into GDT
 | |
|   <li>app can't change GDT register (requires privilege)
 | |
|   <li>why can't app write the descriptors in the GDT?
 | |
|   <li>what about system calls? how do they transfer to the kernel?
 | |
|   <li>app cannot <b>just</b> lower the CPL
 | |
| </ul>
 | |
| </ul>
 | |
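<p>The protected-mode rules above (selector indexes the GDT, la = va + base,
limit check, privilege check) can be modelled in a few lines. This is a
simplified sketch, not a description of the hardware: the <tt>Descriptor</tt>
class, the <tt>translate</tt> function, and the example table are made up for
illustration, the limit is treated as a byte count as in the note above, and
only the data-segment privilege rule is modelled (the larger of CPL and the
selector's RPL must not exceed the DPL).

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int   # linear address where the segment starts
    limit: int  # size of the segment in bytes (simplified)
    dpl: int    # descriptor privilege level, 0 (kernel) to 3 (user)


def decode_selector(selector: int):
    """Split a 16-bit selector into (table index, table indicator, RPL)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


def translate(gdt, selector: int, va: int, cpl: int) -> int:
    """Model of la = va + base with the limit and privilege checks."""
    index, ti, rpl = decode_selector(selector)
    if ti != 0:
        raise NotImplementedError("this sketch models the GDT only, not an LDT")
    if index == 0 or index >= len(gdt):
        raise MemoryError("null or out-of-range selector")
    desc = gdt[index]
    if max(cpl, rpl) > desc.dpl:
        raise PermissionError("CPL and RPL must not exceed DPL")
    if va >= desc.limit:
        raise MemoryError("offset beyond segment limit")
    return desc.base + va


# Hypothetical table: entry 0 unused, entry 1 kernel-only, entry 2 a user segment.
gdt = [None, Descriptor(0, 2**32, 0), Descriptor(0x200000, 0x3000, 3)]
print(hex(translate(gdt, (2 << 3) | 3, 0x10, cpl=3)))  # 0x200010
```

<p>An application can load any selector it likes, but every selector is still
an index into a table it cannot rewrite, so a user-mode access through entry 1
fails the privilege check in this model.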
| 
 | |
| <h2>Case study (xv6)</h2>
 | |
| 
 | |
| <p>xv6 is a reimplementation of <a href="../v6.html">Unix 6th edition</a>.
 | |
| <ul>
 | |
| <li>v6 is a version of the original Unix operating system for <a href="http://www.pdp11.org/">DEC PDP11</a>
 | |
| <ul>
 | |
|        <li>PDP-11 (1972): 
 | |
| 	 <li>16-bit processor, 18-bit physical (40)
 | |
| 	 <li>UNIBUS
 | |
| 	 <li>memory-mapped I/O
 | |
| 	 <li>performance: less than 1MIPS
 | |
| 	 <li>register-to-register transfer: 0.9 usec
 | |
| 	 <li>56k-228k (40)
 | |
| 	 <li>no paging, but some segmentation support
 | |
| 	 <li>interrupts, traps
 | |
| 	 <li>about $10K
 | |
|          <li>rk disk with 2MByte of storage
 | |
| 	 <li>with cabinet 11/40 is 400lbs
 | |
| </ul>
 | |
|        <li>Unix v6
 | |
| <ul>
 | |
|          <li><a href="../reference.html">Unix papers</a>.
 | |
|          <li>1976; first widely available Unix outside Bell Labs
 | |
|          <li>Thompson and Ritchie
 | |
|          <li>Influenced by Multics but simpler.
 | |
| 	 <li>complete (used for real work)
 | |
| 	 <li>Multi-user, time-sharing
 | |
| 	 <li>small (43 system calls)
 | |
| 	 <li>modular (composition through pipes; one had to split programs!!)
 | |
| 	 <li>compactly written (2 programmers, 9,000 lines of code)
 | |
| 	 <li>advanced UI (shell)
 | |
| 	 <li>introduced C (derived from B)
 | |
| 	 <li>distributed with source
 | |
| 	 <li>V7 was sold by Microsoft for a couple years under the name Xenix
 | |
| </ul>
 | |
|        <li>Lions' commentary
 | |
| <ul>
 | |
|          <li>suppressed because of copyright issue
 | |
| 	 <li>resurfaced in 1996
 | |
| </ul>
 | |
| 
 | |
| <li>xv6 written for 6.828:
 | |
| <ul>
 | |
|          <li>v6 reimplementation for x86
 | |
| 	 <li>doesn't include all features of v6 (e.g., xv6 has 20 of 43
 | |
| 	 system calls).
 | |
| 	 <li>runs on symmetric multiprocessing PCs (SMPs).
 | |
| </ul>
 | |
| </ul>
 | |
| 
 | |
| <p>Newer Unixes have inherited many of the conceptual ideas even though
 | |
| they added paging, networking, and graphics, improved performance, etc.
 | |
| 
 | |
| <p>You will need to read most of the source code multiple times. Your
 | |
| goal is to explain every line to yourself.
 | |
| 
 | |
| <h3>Overview of address spaces in xv6</h3>
 | |
| 
 | |
| <p>In today's lecture we see how xv6 creates the kernel address
 | |
|  space and the first user address space, and switches to it. To understand
 | |
|  how this happens, we need to understand in detail the state on the
 | |
|  stack too. This may be surprising, but a thread of control and
 | |
|  an address space are tightly bundled in xv6, in a concept
 | |
|  called a <i>process</i>.  The kernel address space is the only address
 | |
|  space with multiple threads of control. We will study context
 | |
|  switching and process management in detail in the coming weeks; creation of
 | |
|  the first user process (init) will give you a first taste.
 | |
| 
 | |
| <p>xv6 uses only the segmentation hardware on the x86, and only in a limited
 | |
|   way. (In JOS you will use page-table hardware too, which we cover in
 | |
|   the next lecture.)  The address space layouts are as follows:
 | |
| <ul>
 | |
| <li>The kernel address space is set up as follows:
 | |
|   <pre>
 | |
|   the code segment runs from 0 to 2^32 and is mapped X and R
 | |
|   the data segment runs from 0 to 2^32 but is mapped W (read and write).
 | |
|   </pre>
 | |
| <li>For each process, the layout is as follows: 
 | |
| <pre>
 | |
|   text
 | |
|   original data and bss
 | |
|   fixed-size stack
 | |
|   expandable heap
 | |
| </pre>
 | |
| The text of a process is stored in its own segment and the rest in a
 | |
| data segment.  
 | |
| </ul>
 | |
| 
 | |
| <p>xv6 makes minimal use of the segmentation hardware available on the
 | |
| x86. What other plans could you envision?
 | |
| 
 | |
| <p>In xv6, each program has a user and a kernel stack; when the
 | |
| user program switches to the kernel, it switches to its kernel stack.
 | |
| Its kernel stack is stored in the process's proc structure. (This is
 | |
| arranged through the descriptors in the IDT, which is covered later.)
 | |
| 
 | |
| <p>xv6 assumes that there is a lot of physical memory. It assumes that
 | |
|   segments can be stored contiguously in physical memory and therefore
 | |
|   has no need for page tables.
 | |
| 
 | |
| <h3>xv6 kernel address space</h3>
 | |
| 
 | |
| <p>Let's see how xv6 creates the kernel address space by tracing xv6
 | |
|   from when it boots, focussing on address space management:
 | |
| <ul>
 | |
| <li>Where does xv6 start after the PC is powered on: start (which is
 | |
|   loaded at physical address 0x7c00; see lab 1).
 | |
| <li>1025-1033: are we in real mode?
 | |
| <ul>
 | |
| <li>how big are logical addresses?
 | |
| <li>how big are physical addresses?
 | |
| <li>how are physical addresses calculated?
 | |
| <li>what segment is being used in subsequent code?
 | |
| <li>what values are in that segment?
 | |
| </ul>
 | |
| <li>1068: what values are loaded in the GDT?
 | |
| <ul>
 | |
| <li>1097: gdtr points to gdt
 | |
| <li>1094: entry 0 unused
 | |
| <li>1095: entry 1 (X + R, base = 0, limit = 0xffffffff, DPL = 0)
 | |
| <li>1096: entry 2 (W, base = 0, limit = 0xffffffff, DPL = 0)
 | |
| <li>are we using segments in a sophisticated way? (i.e., controlled sharing)
 | |
| <li>are P and S set?
 | |
| <li>are addresses translated as in protected mode when lgdt completes?
 | |
| </ul>
 | |
| <li>1071: no, and not even here.
 | |
| <li>1075: far jump, load 8 in CS. from now on we use segment-based translation.
 | |
| <li>1081-1086: set up other segment registers
 | |
| <li>1087: where is the stack which is used for procedure calls?
 | |
| <li>1087: cmain in the bootloader (see lab 1), which calls main0
 | |
| <li>1222: main0.  
 | |
| <ul>
 | |
| <li>job of main0 is to set everything up so that all xv6 conventions work
 | |
| <li>where is the stack?  (sp = 0x7bec)
 | |
| <li>what is on it?
 | |
| <pre>
 | |
|    00007bec [00007bec]  7cda  // return address in cmain
 | |
|    00007bf0 [00007bf0]  0080  // callee-saved ebx
 | |
|    00007bf4 [00007bf4]  7369  // callee-saved esi
 | |
|    00007bf8 [00007bf8]  0000  // callee-saved ebp
 | |
|    00007bfc [00007bfc]  7c49  // return address for cmain: spin
 | |
|    00007c00 [00007c00]  c031fcfa  // the instructions from 7c00 (start)
 | |
| </pre>
 | |
| </ul>
 | |
| <li>1239-1240: switch to cpu stack (important for scheduler)
 | |
| <ul>
 | |
| <li>why -32?
 | |
| <li>what values are in ebp and esp?
 | |
| <pre>
 | |
| esp: 0x108d30   1084720
 | |
| ebp: 0x108d5c   1084764
 | |
| </pre>
 | |
| <li>what is on the stack?
 | |
| <pre>
 | |
|    00108d30 [00108d30]  0000
 | |
|    00108d34 [00108d34]  0000
 | |
|    00108d38 [00108d38]  0000
 | |
|    00108d3c [00108d3c]  0000
 | |
|    00108d40 [00108d40]  0000
 | |
|    00108d44 [00108d44]  0000
 | |
|    00108d48 [00108d48]  0000
 | |
|    00108d4c [00108d4c]  0000
 | |
|    00108d50 [00108d50]  0000
 | |
|    00108d54 [00108d54]  0000
 | |
|    00108d58 [00108d58]  0000
 | |
|    00108d5c [00108d5c]  0000
 | |
|    00108d60 [00108d60]  0001
 | |
|    00108d64 [00108d64]  0001
 | |
|    00108d68 [00108d68]  0000
 | |
|    00108d6c [00108d6c]  0000
 | |
| </pre>
 | |
| 
 | |
| <li>what is 1 in 0x108d60?  is it on the stack?
 | |
| 
 | |
| </ul>
 | |
| 
 | |
| <li>1242: is it safe to reference bcpu?  where is it allocated?
 | |
| 
 | |
| <li>1260-1270: set up proc[0]
 | |
| 
 | |
| <ul>
 | |
| <li>each process has its own stack (see struct proc).
 | |
| 
 | |
| <li>where is its stack?  (see the section on physical memory
 | |
|   management below).
 | |
| 
 | |
| <li>what is the jmpbuf?  (will discuss in detail later)
 | |
| 
 | |
| <li>1267: why -4?
 | |
| 
 | |
| </ul>
 | |
| 
 | |
| <li>1270: necessary to be able to take interrupts (will discuss in
 | |
|   detail later)
 | |
| 
 | |
| <li>1292: what process do you think scheduler() will run?  we will
 | |
|   study later how that happens, but let's assume it runs process0 on
 | |
|   process0's stack.
 | |
| </ul>
 | |
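<p>The boot GDT entries discussed above (entry 0 unused, entry 1 code, entry 2
data, both with base 0 and limit 0xffffffff) can be written out byte by byte.
The sketch below packs descriptors in the standard x86 layout, assuming 4-KByte
granularity and 32-bit segments; the helper <tt>encode_descriptor</tt> and the
<tt>STA_*</tt> constant names are illustrative and are not claimed to match the
xv6 source line for line.

```python
import struct

STA_X = 0x8  # executable segment
STA_W = 0x2  # writable (for non-executable segments)
STA_R = 0x2  # readable (for executable segments)


def encode_descriptor(base: int, limit: int, seg_type: int, dpl: int = 0) -> bytes:
    """Pack an 8-byte segment descriptor (4 KiB granularity, 32-bit)."""
    pages = limit >> 12                         # limit is stored in 4 KiB units
    access = 0x90 | (dpl << 5) | seg_type       # P and S bits set, plus DPL and type
    flags_limit = 0xC0 | ((pages >> 16) & 0xF)  # G=1, D=1, limit bits 19..16
    return struct.pack(
        "<HHBBBB",
        pages & 0xFFFF,       # limit bits 15..0
        base & 0xFFFF,        # base bits 15..0
        (base >> 16) & 0xFF,  # base bits 23..16
        access,
        flags_limit,
        (base >> 24) & 0xFF,  # base bits 31..24
    )


boot_gdt = [
    bytes(8),                                         # entry 0: unused
    encode_descriptor(0, 0xFFFFFFFF, STA_X | STA_R),  # entry 1: code
    encode_descriptor(0, 0xFFFFFFFF, STA_W),          # entry 2: data
]
for entry in boot_gdt:
    print(entry.hex())
```

<p>Each entry is 8 bytes, so the selector 8 loaded into CS by the far jump
names entry 1.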
| 
 | |
| <h3>xv6 user address spaces</h3>
 | |
| 
 | |
| <ul>
 | |
| <li>1327: process0  
 | |
| <ul>
 | |
| <li>process 0 sets up everything to make process conventions work out
 | |
| 
 | |
| <li>which stack is process0 running on?  see 1260.
 | |
| 
 | |
| <li>1334: is the convention to release the proc_table_lock after being
 | |
|   scheduled? (we will discuss locks later; assume there are no other
 | |
|   processors for now.)
 | |
| 
 | |
| <li>1336: cwd is current working directory.
 | |
| 
 | |
| <li>1348: first step in initializing a template trap frame: set
 | |
|   everything to zero. we are setting up process 0 as if it just
 | |
|   entered the kernel from user space and wants to go back to user
 | |
|   space.  (see x86.h to see which fields have the value 0.)
 | |
| 
 | |
| <li>1349: why "|3"?  instead of 0?
 | |
| 
 | |
| <li>1351: why set interrupt flag in template trapframe?
 | |
| 
 | |
| <li>1352: where will the user stack be in proc[0]'s address space?
 | |
| 
 | |
| <li>1353: makes a copy of proc0.  fork() calls copyproc() to implement
 | |
|   forking a process.  This statement in essence is calling fork inside
 | |
|   proc0, making proc[1] a duplicate of proc[0].  proc[0], however,
 | |
|   does not have much in its one-page address space (see 1341).
 | |
| <ul>
 | |
| <li>2221: grab a lock on the proc table so that we are the only one
 | |
|   updating it.
 | |
| <li>2116: allocate next pid.
 | |
| <li>2228: we got our entry; release the  lock. from now we are only
 | |
|   modifying our entry.
 | |
| <li>2120-2127: copy proc[0]'s memory.  proc[1]'s memory will be identical
 | |
|   to proc[0]'s.
 | |
| <li>2130-2136: allocate a kernel stack. this stack is different from
 | |
|   the stack that proc[1] uses when running in user mode.
 | |
| <li>2139-2140: copy the template trapframe that xv6 had set up in
 | |
|   proc[0].
 | |
| <li>2147: where will proc[1] start running when the scheduler selects
 | |
|   it?
 | |
| <li>2151-2155: Unix semantics: child inherits open file descriptors
 | |
|   from parent.
 | |
| <li>2158: same for cwd.
 | |
| </ul>
 | |
| 
 | |
| <li>1356: load a program in proc[1]'s address space.  the program
 | |
|   loaded is the binary version of init.c (sheet 16).
 | |
| 
 | |
| <li>1374: where will proc[1] start?
 | |
| 
 | |
| <li>1377-1388: copy the binary into proc[1]'s address space.  (you
 | |
|   will learn about the ELF format in the labs.)  
 | |
| <ul>
 | |
| <li>can the binary for init be any size for proc[1] to work correctly?
 | |
| 
 | |
| <li>what is the layout of proc[1]'s address space? is it consistent
 | |
|   with the layout described on lines 1950-1954?
 | |
| 
 | |
| </ul>
 | |
| 
 | |
| <li>1357: make proc[1] runnable so that the scheduler will select it
 | |
|   to run.  everything is set up now for proc[1] to run, "return" to
 | |
|   user space, and execute init.
 | |
| 
 | |
| <li>1359: proc[0] gives up the processor, which calls sleep, which
 | |
|   calls sched, which setjmps back to scheduler. let's peek a bit in
 | |
|   scheduler to see what happens next.  (we will return to the
 | |
|   scheduler in more detail later.)
 | |
| </ul>
 | |
| <li>2219: this test will fail for proc[1]
 | |
| <li>2226: setupsegs(p) sets up the segments for proc[1].  this call is
 | |
|   more interesting than the previous, so let's see what happens:
 | |
| <ul>
 | |
| <li>2032-37: this is for traps and interrupts, which we will cover later.
 | |
| <li>2039-49: set up new gdt.
 | |
| <li>2040: why 0x100000 + 64*1024?
 | |
| <li>2045: why 3?  why is base p->mem? is p->mem physical or logical?
 | |
| <li>2045-2046: how must the program for proc[1] be compiled if proc[1]
 | |
|   is to run successfully in user space?
 | |
| <li>2052: we are still running in the kernel, but we are loading gdt.
 | |
|   is this ok?
 | |
| <li>why have so few user-level segments?  why not separate out code,
 | |
|   data, stack, bss, etc.?
 | |
| </ul>
 | |
| <li>2227: record that proc[1] is running on the cpu
 | |
| <li>2228: record it is running instead of just runnable
 | |
| <li>2229: setjmp to fork_ret.
 | |
| <li>2282: which stack is proc[1] running on?
 | |
| <li>2284: when scheduled, first release the proc_table_lock.
 | |
| <li>2287: back into assembly.
 | |
| <li>2782: where is the stack pointer pointing to?
 | |
| <pre>
 | |
|    0020dfbc [0020dfbc]  0000
 | |
|    0020dfc0 [0020dfc0]  0000
 | |
|    0020dfc4 [0020dfc4]  0000
 | |
|    0020dfc8 [0020dfc8]  0000
 | |
|    0020dfcc [0020dfcc]  0000
 | |
|    0020dfd0 [0020dfd0]  0000
 | |
|    0020dfd4 [0020dfd4]  0000
 | |
|    0020dfd8 [0020dfd8]  0000
 | |
|    0020dfdc [0020dfdc]  0023
 | |
|    0020dfe0 [0020dfe0]  0023
 | |
|    0020dfe4 [0020dfe4]  0000
 | |
|    0020dfe8 [0020dfe8]  0000
 | |
|    0020dfec [0020dfec]  0000
 | |
|    0020dff0 [0020dff0]  001b
 | |
|    0020dff4 [0020dff4]  0200
 | |
|    0020dff8 [0020dff8]  1000
 | |
| </pre>
 | |
| <li>2783: why jmp instead of call?
 | |
| <li>what will iret put in eip?
 | |
| <li>what is 0x1b?  what will iret put in cs?
 | |
| <li>after iret, what will the processor be executing?
 | |
| </ul>
 | |
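<p>The values 0x1b, 0x23 and 0x200 in the stack dump above can be taken apart
with a little bit arithmetic. The sketch below assumes that the user code and
user data descriptors sit at GDT indices 3 and 4 (an assumption consistent
with the dump, since 0x1b = 3*8 + 3 and 0x23 = 4*8 + 3); the constant and
function names are illustrative.

```python
SEG_UCODE = 3  # assumed GDT index of the user code segment
SEG_UDATA = 4  # assumed GDT index of the user data segment
DPL_USER = 3   # least privileged level
FL_IF = 0x200  # interrupt-enable flag, bit 9 of EFLAGS


def make_selector(index: int, rpl: int) -> int:
    """Build a GDT selector: index in bits 15..3, RPL in bits 1..0."""
    return (index << 3) | rpl


def describe_trapframe(cs: int, ds: int, eflags: int) -> dict:
    """Decode the fields that iret and the segment loads will act on."""
    return {
        "cs_index": cs >> 3,
        "cpl_after_iret": cs & 3,
        "ds_index": ds >> 3,
        "interrupts_enabled": bool(eflags & FL_IF),
    }


print(hex(make_selector(SEG_UCODE, DPL_USER)))  # 0x1b
print(hex(make_selector(SEG_UDATA, DPL_USER)))  # 0x23
print(describe_trapframe(cs=0x1B, ds=0x23, eflags=0x200))
```

<p>Under these assumptions, the low two bits of the saved CS are what make the
processor run at CPL=3 after iret.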
| 
 | |
| <h3>Managing physical memory</h3>
 | |
| 
 | |
| <p>To create an address space we must allocate physical memory, which
 | |
|   will be freed when an address space is deleted (e.g., when a user
 | |
|   program terminates).  xv6 implements a first-fit memory allocator
 | |
|   (see kalloc.c).  
 | |
| 
 | |
| <p>It maintains a list of ranges of free memory.  The allocator finds
 | |
|   the first range that is at least as large as the amount of requested memory.
 | |
|   It splits that range in two: one range of the size requested and one
 | |
|   of the remainder.  It returns the first range.  When memory is
 | |
|   freed, kfree will merge ranges that are adjacent in memory.
 | |
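<p>The first-fit policy described above can be sketched as a small model. This
is not the code in kalloc.c; it only illustrates the policy, using a Python
list of [start, length] ranges in place of a linked list, and the class name
<tt>FirstFit</tt> is made up.

```python
class FirstFit:
    """Free memory kept as a sorted list of [start, length] ranges."""

    def __init__(self, start: int, length: int):
        self.free = [[start, length]]

    def kalloc(self, n: int):
        """Return the start of an n-byte block, or None if no range fits."""
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]  # exact fit: remove the range
                else:
                    # Split: hand out the front, keep the remainder free.
                    self.free[i] = [start + n, length - n]
                return start
        return None

    def kfree(self, start: int, n: int):
        """Return a block to the free list and merge adjacent ranges."""
        self.free.append([start, n])
        self.free.sort()
        merged = [self.free[0]]
        for s, length in self.free[1:]:
            last = merged[-1]
            if last[0] + last[1] == s:
                last[1] += length  # touches the previous range: merge
            else:
                merged.append([s, length])
        self.free = merged


heap = FirstFit(0x100000, 0x10000)
a = heap.kalloc(0x1000)
b = heap.kalloc(0x2000)
heap.kfree(a, 0x1000)
heap.kfree(b, 0x2000)
print(heap.free)  # [[1048576, 65536]]
```

<p>After both blocks are freed, merging restores the single original range.
Freeing only every other block would instead leave many small ranges, none of
which can satisfy a large request.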
| 
 | |
| <p>Under what scenarios is a first-fit memory allocator undesirable?
 | |
| 
 | |
| <h3>Growing an address space</h3>
 | |
| 
 | |
| <p>How can a user process grow its address space?  growproc.
 | |
| <ul>
 | |
| <li>2064: allocate a new segment of old size plus n
 | |
| <li>2067: copy the old segment into the new (ouch!)
 | |
| <li>2068: and zero the rest.
 | |
| <li>2071: free the old physical memory
 | |
| </ul>
 | |
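<p>The four steps above amount to "allocate, copy, zero, free". A minimal
model, with a Python <tt>bytearray</tt> standing in for a contiguous segment
of physical memory (an illustration only, not the xv6 code):

```python
def growproc(mem: bytearray, n: int) -> bytearray:
    """Return a new segment of len(mem) + n bytes holding a copy of mem."""
    new_mem = bytearray(len(mem) + n)  # allocate old size plus n, zero-filled
    new_mem[:len(mem)] = mem           # copy the old segment; cost grows with its size
    # The caller drops its reference to the old segment, which frees it.
    return new_mem


segment = bytearray(b"text+data+stack")
segment = growproc(segment, 4096)
print(len(segment))  # 4111
```

<p>The copy touches every byte of the old segment, even when the process asks
for only a few more bytes.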
| <p>We could do a lot better if segments didn't have to be contiguous in
 | |
|   physical memory. How could we arrange that? Using page tables, which
 | |
|   is our next topic.  This is one place where page tables would be
 | |
|   useful, but there are others too (e.g., in fork).
 | |
| </body>
 | |
| 
 | |
| 
 | 
