<title>Lecture 5</title>
<html>
<head>
</head>
<body>

<h2>Address translation and sharing using page tables</h2>

<p> Reading: <a href="../readings/i386/toc.htm">80386</a> chapters 5 and 6<br>

<p> Handout: <b> x86 address translation diagram</b> - 
<a href="x86_translation.ps">PS</a> -
<a href="x86_translation.eps">EPS</a> -
<a href="x86_translation.fig">xfig</a>
<br>

<p>Why do we care about x86 address translation?
<ul>
<li>It can simplify software structure by placing data at fixed known addresses.
<li>It can implement tricks like demand paging and copy-on-write.
<li>It can isolate programs to contain bugs.
<li>It can isolate programs to increase security.
<li>JOS uses paging a lot, and segments more than you might think.
</ul>

<p>Why aren't protected-mode segments enough?
<ul>
<li>Why did the 386 add translation using page tables as well?
<li>Isn't it enough to give each process its own segments?
</ul>

<p>Translation using page tables on x86:
<ul>
<li>paging hardware maps linear address (la) to physical address (pa)
<li>(we will often interchange "linear" and "virtual")
<li>page size is 4096 bytes, so there are 1,048,576 (2^20) pages in the 2^32-byte address space
<li>why not just have a big array with each page #'s translation?
<ul>
<li>table[20-bit linear page #] => 20-bit phys page #
</ul>
<li>386 uses 2-level mapping structure
<li>one page directory page, with 1024 page directory entries (PDEs)
<li>up to 1024 page table pages, each with 1024 page table entries (PTEs)
<li>so la has 10 bits of directory index, 10 bits table index, 12 bits offset
<li>What's in a PDE or PTE?
<ul>
<li>20-bit phys page number, present, read/write, user/supervisor
</ul>
<li>cr3 register holds physical address of current page directory
<li>puzzle: what do PDE read/write and user/supervisor flags mean?
<li>puzzle: can supervisor read/write user pages?

<li>Here's how the MMU translates an la to a pa:

   <pre>
   uint
   translate (uint la, bool user, bool write)
   {
     uint pde, pte;

     // first level: index the page directory with the top 10 bits of la
     pde = read_mem (%CR3 + 4*(la >> 22));
     access (pde, user, write);
     // second level: index the page table with the middle 10 bits of la
     pte = read_mem ( (pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
     access (pte, user, write);
     // physical page base plus the 12-bit offset
     return (pte & 0xfffff000) + (la & 0xfff);
   }

   // check protection. pxe is a pte or pde.
   // user is true if CPL==3
   void
   access (uint pxe, bool user, bool write)
   {
     if (!(pxe & PG_P))
        => page fault -- page not present
     if (!(pxe & PG_U) && user)
        => page fault -- no access for user

     // a supervisor write to a read-only page faults only if CR0_WP is set
     if (write && !(pxe & PG_W))
       if (user)
          => page fault -- not writable
       else if (%CR0 & CR0_WP)
         => page fault -- not writable
   }
   </pre>

<li>CPU's TLB caches vpn => ppn mappings
<li>if you change a PDE or PTE, you must flush the TLB!
<ul>
  <li>by re-loading cr3
</ul>
<li>turn on paging by setting CR0_PG bit of %cr0
</ul>
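<p>As a rough illustration of the 10/10/12 split described above, and of why a single flat array is costly, here is a minimal C sketch. The helper names (pdx, ptx, pgoff, mkaddr, flat_table_bytes, min_two_level_bytes) are made up for this example and are not taken from JOS. It assumes 4-byte entries, as the 4* scaling in the translate pseudocode suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers only; the names are assumptions, not JOS definitions. */

/* Page directory index: top 10 bits of the linear address. */
static inline uint32_t pdx(uint32_t la)   { return (la >> 22) & 0x3ffu; }

/* Page table index: middle 10 bits of the linear address. */
static inline uint32_t ptx(uint32_t la)   { return (la >> 12) & 0x3ffu; }

/* Offset within the 4096-byte page: low 12 bits. */
static inline uint32_t pgoff(uint32_t la) { return la & 0xfffu; }

/* Rebuild a linear address from its three fields. */
static inline uint32_t mkaddr(uint32_t d, uint32_t t, uint32_t off)
{
    return (d << 22) | (t << 12) | off;
}

/* Flat design: one 4-byte entry for each of the 2^20 pages. */
static inline uint32_t flat_table_bytes(void)
{
    return (1u << 20) * 4u;
}

/* Two-level design, smallest useful case: one page directory page
 * plus one page table page. */
static inline uint32_t min_two_level_bytes(void)
{
    return 2u * 4096u;
}
```

<p>With these helpers, the address 0xf0100abc splits into directory index 0x3c0, table index 0x100, and offset 0xabc. The flat array would cost 4 MB for every address space, while a two-level structure that maps only one 4 MB region needs just 8 KB.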

<p>Can we use paging to limit what memory an app can read/write?
<ul>
<li>user can't modify cr3 (requires privilege)
<li>is that enough?
<li>could user modify page tables? after all, they are in memory.
</ul>

<p>How we will use paging (and segments) in JOS:
<ul>
<li>use segments only to switch privilege level into/out of kernel
<li>use paging to structure process address space
<li>use paging to limit process memory access to its own address space
<li>below is the JOS virtual memory map
<li>why map both kernel and current process? why not 4GB for each?
<li>why is the kernel at the top?
<li>why map all of phys mem at the top? i.e. why multiple mappings?
<li>why map page table a second time at VPT?
<li>why map page table a third time at UVPT?
<li>how do we switch mappings for a different process?
</ul>

<pre>
    4 Gig -------->  +------------------------------+
                     |                              | RW/--
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     :              .               :
                     :              .               :
                     :              .               :
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
                     |                              | RW/--
                     |   Remapped Physical Memory   | RW/--
                     |                              | RW/--
    KERNBASE ----->  +------------------------------+ 0xf0000000
                     |  Cur. Page Table (Kern. RW)  | RW/--  PTSIZE
    VPT,KSTACKTOP--> +------------------------------+ 0xefc00000      --+
                     |         Kernel Stack         | RW/--  KSTKSIZE   |
                     | - - - - - - - - - - - - - - -|                 PTSIZE
                     |      Invalid Memory          | --/--             |
    ULIM     ------> +------------------------------+ 0xef800000      --+
                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
    UVPT      ---->  +------------------------------+ 0xef400000
                     |          RO PAGES            | R-/R-  PTSIZE
    UPAGES    ---->  +------------------------------+ 0xef000000
                     |           RO ENVS            | R-/R-  PTSIZE
 UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
                     +------------------------------+ 0xeebff000
                     |       Empty Memory           | --/--  PGSIZE
    USTACKTOP  --->  +------------------------------+ 0xeebfe000
                     |      Normal User Stack       | RW/RW  PGSIZE
                     +------------------------------+ 0xeebfd000
                     |                              |
                     |                              |
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     .                              .
                     .                              .
                     .                              .
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
                     |     Program Data & Heap      |
    UTEXT -------->  +------------------------------+ 0x00800000
    PFTEMP ------->  |       Empty Memory           |        PTSIZE
                     |                              |
    UTEMP -------->  +------------------------------+ 0x00400000
                     |       Empty Memory           |        PTSIZE
    0 ------------>  +------------------------------+
</pre>
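<p>The boundaries in the map above are all whole multiples of PTSIZE or PGSIZE apart, which a few lines of C can make explicit. This is only a sketch: the constant names come from the map, but deriving each one from KERNBASE is an assumption made for illustration, and the two helper functions (below_ulim, below_utop) are hypothetical names. The real JOS headers may define these differently.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes: one page, and the 4 MB span covered by one page table. */
#define PGSIZE     0x1000u
#define PTSIZE     (PGSIZE * 1024u)

/* Boundaries, derived top-down from KERNBASE (assumed derivation;
 * the map itself only lists the resulting addresses). */
#define KERNBASE   0xf0000000u
#define VPT        (KERNBASE - PTSIZE)    /* kernel view of page tables */
#define KSTACKTOP  VPT
#define ULIM       (KSTACKTOP - PTSIZE)   /* user has no access at or above this */
#define UVPT       (ULIM - PTSIZE)        /* user read-only view of page tables */
#define UPAGES     (UVPT - PTSIZE)
#define UENVS      (UPAGES - PTSIZE)
#define UTOP       UENVS                  /* user has no write access at or above this */
#define UXSTACKTOP UTOP
#define USTACKTOP  (UTOP - 2u * PGSIZE)   /* skips exception stack and guard page */
#define UTEXT      (2u * PTSIZE)
#define UTEMP      PTSIZE

/* Hypothetical helper: is va in the range the map marks as at most
 * user-readable? The actual permission still depends on the PDE and PTE. */
static inline int below_ulim(uint32_t va) { return va < ULIM; }

/* Hypothetical helper: is va in the range where user-writable mappings
 * may appear? */
static inline int below_utop(uint32_t va) { return va < UTOP; }
```

<p>Evaluating these definitions reproduces the addresses in the map, for example VPT = 0xefc00000, UVPT = 0xef400000, and USTACKTOP = 0xeebfe000.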

<h3>The VPT </h3>

<p>Remember how the x86 translates virtual addresses into physical ones:

<p><img src=pagetables.png>

<p>CR3 points at the page directory.  The PDX part of the address
indexes into the page directory to give you a page table.  The
PTX part indexes into the page table to give you a page, and then
you add the low bits in.

<p>But the processor has no concept of page directories, page tables,
and pages being anything other than plain memory.  So there's nothing
that says a particular page in memory can't serve as two or three of
these at once.  The processor just follows pointers:

<pre>
pd = rcr3();
pt = *(pd+4*PDX);
page = *(pt+4*PTX);
</pre>

<p>Diagrammatically, it starts at CR3, follows three arrows, and then stops.

<p>If we put a pointer into the page directory that points back to itself at
index V, as in

<p><img src=vpt.png>

<p>then when we try to translate a virtual address with PDX and PTX
equal to V, following three arrows leaves us at the page directory.
So that virtual page translates to the page holding the page directory.
In JOS, the user-readable mapping at UVPT uses V = 0x3BD, so the virtual
address of the page directory there is (0x3BD<<22)|(0x3BD<<12); the kernel
mapping at VPT uses V = 0x3BF in the same way.


<p>Now, if we try to translate a virtual address with PDX = V but an
arbitrary PTX != V, then following three arrows from CR3 ends
one level up from usual (instead of two as in the last case),
which is to say in the page tables.  So the set of virtual pages
with PDX=V forms a 4MB region whose page contents, as far
as the processor is concerned, are the page tables themselves.
In JOS, with V = 0x3BD this region starts at (0x3BD<<22) = 0xef400000 (UVPT);
with V = 0x3BF it starts at (0x3BF<<22) = 0xefc00000 (VPT).

<p>So because of the "no-op" arrow we've cleverly inserted into
the page directory, we've mapped the pages being used as
the page directory and page table (which are normally virtually
invisible) into the virtual address space.
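
<p>The self-mapping trick can be tried out on a toy model. The sketch below simulates four pages of "physical" memory in an array and walks them the same way the MMU does, without the protection checks. Everything in it is an assumption for illustration: the names (phys, walk, setup, pte_vaddr, FAULT), the tiny memory size, and the choice of physical pages 1, 2, and 3 for the directory, one page table, and one data page. Only V = 0x3BD is taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES 4u                 /* toy machine: 16 KB of physical memory */
#define PG_P   0x001u             /* present bit */
#define V      0x3BDu             /* directory slot that points back at the directory */
#define FAULT  0xffffffffu        /* stand-in for a page fault */

static uint32_t phys[NPAGES * 1024u];   /* physical memory, one element per 4-byte word */

static uint32_t read_phys(uint32_t pa)                { return phys[pa / 4u]; }
static void     write_phys(uint32_t pa, uint32_t val) { phys[pa / 4u] = val; }

/* Two-level walk: "follow three arrows" starting from cr3. */
static uint32_t walk(uint32_t cr3, uint32_t la)
{
    uint32_t pde = read_phys(cr3 + 4u * (la >> 22));
    if (!(pde & PG_P))
        return FAULT;
    uint32_t pte = read_phys((pde & 0xfffff000u) + 4u * ((la >> 12) & 0x3ffu));
    if (!(pte & PG_P))
        return FAULT;
    return (pte & 0xfffff000u) + (la & 0xfffu);
}

/* Build a directory at 0x1000, one page table at 0x2000, and map
 * virtual page 5 to the data page at 0x3000. Returns the cr3 value. */
static uint32_t setup(void)
{
    const uint32_t pd = 0x1000u, pt = 0x2000u, data = 0x3000u;
    for (uint32_t i = 0; i < NPAGES * 1024u; i++)
        phys[i] = 0;
    write_phys(pd + 4u * 0u, pt | PG_P);    /* PDE 0  -> page table */
    write_phys(pd + 4u * V,  pd | PG_P);    /* PDE V  -> the directory itself */
    write_phys(pt + 4u * 5u, data | PG_P);  /* PTE 5  -> data page */
    return pd;
}

/* Virtual address at which the PTE for va is visible through the self-map. */
static uint32_t pte_vaddr(uint32_t va)
{
    return (V << 22) + 4u * (va >> 12);
}
```

<p>After setup(), translating (V<<22)|(V<<12) yields 0x1000, the page directory, and translating (V<<22) yields 0x2000, the page table for the first 4 MB. Reading through pte_vaddr(5<<12) returns the same PTE that the walk for virtual page 5 uses, which is exactly the point of the self-map: page table entries become ordinary memory at predictable virtual addresses.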

</body>
</html>