xv6-riscv-kernel/labs/fs.html
Frans Kaashoek fdea265489 two notes
2019-08-02 16:42:55 -04:00

360 lines
11 KiB
HTML

<html>
<head>
<title>Lab: file system</title>
<link rel="stylesheet" href="homework.css" type="text/css" />
</head>
<body>
<h1>Lab: file system</h1>
<p>In this lab you will add large files and <tt>mmap</tt> to the xv6 file system.
<h2>Large files</h2>
<p>In this assignment you'll increase the maximum size of an xv6
file. Currently xv6 files are limited to 268 blocks, or 268*BSIZE
bytes (BSIZE is 1024 in xv6). This limit comes from the fact that an
xv6 inode contains 12 "direct" block numbers and one "singly-indirect"
block number, which refers to a block that holds up to 256 more block
numbers, for a total of 12+256=268. You'll change the xv6 file system
code to support a "doubly-indirect" block in each inode, containing
256 addresses of singly-indirect blocks, each of which can contain up
to 256 addresses of data blocks. The result will be that a file will
be able to consist of up to 256*256+256+11 blocks (11 instead of 12,
because we will sacrifice one of the direct block numbers for the
double-indirect block).
<h3>Preliminaries</h3>
<p>Modify your Makefile's <tt>CPUS</tt> definition so that it reads:
<pre>
CPUS := 1
</pre>
<b>XXX doesn't seem to speedup things</b>
<p>Add
<pre>
QEMUEXTRA = -snapshot
</pre>
right before
<tt>QEMUOPTS</tt>
<p>
The above two steps speed up qemu tremendously when xv6
creates large files.
<p><tt>mkfs</tt> initializes the file system to have fewer
than 1000 free data blocks, too few to show off the changes
you'll make. Modify <tt>param.h</tt> to
set <tt>FSSIZE</tt> to:
<pre>
#define FSSIZE 20000 // size of file system in blocks
</pre>
<p>Download <a href="big.c">big.c</a> into your xv6 directory,
add it to the UPROGS list, start up xv6, and run <tt>big</tt>.
It creates as big a file as xv6 will let
it, and reports the resulting size. It should say 140 sectors.
<h3>What to Look At</h3>
The format of an on-disk inode is defined by <tt>struct dinode</tt>
in <tt>fs.h</tt>. You're particularly interested in <tt>NDIRECT</tt>,
<tt>NINDIRECT</tt>, <tt>MAXFILE</tt>, and the <tt>addrs[]</tt> element
of <tt>struct dinode</tt>. Look Figure 7.3 in the xv6 text for a
diagram of the standard xv6 inode.
<p>
The code that finds a file's data on disk is in <tt>bmap()</tt>
in <tt>fs.c</tt>. Have a look at it and make sure you understand
what it's doing. <tt>bmap()</tt> is called both when reading and
writing a file. When writing, <tt>bmap()</tt> allocates new
blocks as needed to hold file content, as well as allocating
an indirect block if needed to hold block addresses.
<p>
<tt>bmap()</tt> deals with two kinds of block numbers. The <tt>bn</tt>
argument is a "logical block" -- a block number relative to the start
of the file. The block numbers in <tt>ip->addrs[]</tt>, and the
argument to <tt>bread()</tt>, are disk block numbers.
You can view <tt>bmap()</tt> as mapping a file's logical
block numbers into disk block numbers.
<h3>Your Job</h3>
Modify <tt>bmap()</tt> so that it implements a doubly-indirect
block, in addition to direct blocks and a singly-indirect block.
You'll have to have only 11 direct blocks, rather than 12,
to make room for your new doubly-indirect block; you're
not allowed to change the size of an on-disk inode.
The first 11 elements of <tt>ip->addrs[]</tt> should be
direct blocks; the 12th should be a singly-indirect block
(just like the current one); the 13th should be your new
doubly-indirect block.
<p>
You don't have to modify xv6 to handle deletion of files with
doubly-indirect blocks.
<p>
If all goes well, <tt>big</tt> will now report that it
can write sectors. It will take <tt>big</tt> minutes
to finish.
<b>XXX this runs for a while!</b>
<h3>Hints</h3>
<p>
Make sure you understand <tt>bmap()</tt>. Write out a diagram of the
relationships between <tt>ip->addrs[]</tt>, the indirect block, the
doubly-indirect block and the singly-indirect blocks it points to, and
data blocks. Make sure you understand why adding a doubly-indirect
block increases the maximum file size by 256*256 blocks (really -1),
since you have to decrease the number of direct blocks by one).
<p>
Think about how you'll index the doubly-indirect block, and
the indirect blocks it points to, with the logical block
number.
<p>If you change the definition of <tt>NDIRECT</tt>, you'll
probably have to change the size of <tt>addrs[]</tt>
in <tt>struct inode</tt> in <tt>file.h</tt>. Make sure that
<tt>struct inode</tt> and <tt>struct dinode</tt> have the
same number of elements in their <tt>addrs[]</tt> arrays.
<p>If you change the definition of <tt>NDIRECT</tt>, make sure to create a
new <tt>fs.img</tt>, since <tt>mkfs</tt> uses <tt>NDIRECT</tt> too to build the
initial file systems. If you delete <tt>fs.img</tt>, <tt>make</tt> on Unix (not
xv6) will build a new one for you.
<p>If your file system gets into a bad state, perhaps by crashing,
delete <tt>fs.img</tt> (do this from Unix, not xv6). <tt>make</tt> will build a
new clean file system image for you.
<p>Don't forget to <tt>brelse()</tt> each block that you
<tt>bread()</tt>.
<p>You should allocate indirect blocks and doubly-indirect
blocks only as needed, like the original <tt>bmap()</tt>.
<p>Optional challenge: support triple-indirect blocks.
<h2>Writing with a Log</h2>
Insert a print statement in bwrite (in bio.c) so that you get a
print every time a block is written to disk:
<pre>
printf("bwrite block %d\n", b->blockno);
</pre>
Build and boot a new kernel and run this:
<pre>
$ rm README
</pre>
<p>You should see a sequence of bwrite prints after the <tt>rm</tt>.</p>
<div class="question">
<ol>
<li>Annotate the bwrite lines with the kind of information that is
being written to the disk (e.g., "README's inode", "allocation
bitmap"). If the log is being written, note both that the log is being
written and also what kind of information is being written to the log.
<li>Mark with an arrow the first point at which, if a
crash occured, README would be missing after a reboot
(after the call to <tt>recover_from_log()</tt>).
</ol>
</p>
</div>
<h2>Crash safety</h2>
<p>This assignment explores the xv6 log in two parts.
First, you'll artificially create a crash which illustrates
why logging is needed. Second, you'll remove one
inefficiency in the xv6 logging system.
<p>
Submit your solution before the beginning of the next lecture
to <a href="https://6828.scripts.mit.edu/2018/handin.py/">the submission
web site</a>.
<h3>Creating a Problem</h3>
<p>
The point of the xv6 log is to cause all the disk updates of a
filesystem operation to be atomic with respect to crashes.
For example, file creation involves both adding a new entry
to a directory and marking the new file's inode as in-use.
A crash that happened after one but before the other would
leave the file system in an incorrect state after a reboot,
if there were no log.
<p>
The following steps will break the logging code in a way that
leaves a file partially created.
<p>
First, replace <tt>commit()</tt> in <tt>log.c</tt> with
this code:
<pre>
#include "kernel/proc.h"
void
commit(void)
{
int pid = myproc()->pid;
if (log.lh.n > 0) {
write_log();
write_head();
if(pid > 1) // AAA
log.lh.block[0] = 0; // BBB
install_trans();
if(pid > 1) // AAA
panic("commit mimicking crash"); // CCC
log.lh.n = 0;
write_head();
}
}
</pre>
<p>
The BBB line causes the first block in the log to be written to
block zero, rather than wherever it should be written. During file
creation, the first block in the log is the new file's inode updated
to have non-zero <tt>type</tt>.
Line BBB causes the block
with the updated inode to be written to block 0 (whence
it will never be read), leaving the on-disk inode still marked
unallocated. The CCC line forces a crash.
The AAA lines suppress this buggy behavior for <tt>init</tt>,
which creates files before the shell starts.
<p>
Second, replace <tt>recover_from_log()</tt> in <tt>log.c</tt>
with this code:
<pre>
static void
recover_from_log(void)
{
read_head();
printf("recovery: n=%d but ignoring\n", log.lh.n);
// install_trans();
log.lh.n = 0;
// write_head();
}
</pre>
<p>
This modification suppresses log recovery (which would repair
the damage caused by your change to <tt>commit()</tt>).
<p>
Finally, remove the <tt>-snapshot</tt> option from the definition
of <tt>QEMUEXTRA</tt> in your Makefile so that the disk image will see the
changes.
<p>
Now remove <tt>fs.img</tt> and run xv6:
<pre>
% rm fs.img ; make qemu
</pre>
<p>
Tell the xv6 shell to create a file:
<pre>
$ echo hi > a
</pre>
<p>
You should see the panic from <tt>commit()</tt>. So far
it is as if a crash occurred in a non-logging system in the middle
of creating a file.
<p>
Now re-start xv6, keeping the same <tt>fs.img</tt>:
<pre>
% make qemu
</pre>
<p>
And look at file <tt>a</tt>:
<pre>
$ cat a
</pre>
<p>
You should see <tt>panic: ilock: no type</tt>. Make sure you understand what happened.
Which of the file creation's modifications were written to the disk
before the crash, and which were not?
<h3>Solving the Problem</h3>
Now fix <tt>recover_from_log()</tt>:
<pre>
static void
recover_from_log(void)
{
read_head();
cprintf("recovery: n=%d\n", log.lh.n);
install_trans();
log.lh.n = 0;
write_head();
}
</pre>
<p>
Run xv6 (keeping the same <tt>fs.img</tt>) and read <tt>a</tt> again:
<pre>
$ cat a
</pre>
<p>
This time there should be no crash. Make sure you understand why
the file system now works.
<p>
Why was the file empty, even though you created
it with <tt>echo&nbsp;hi&nbsp;>&nbsp;a</tt>?
<p>
Now remove your modifications to <tt>commit()</tt>
(the if's and the AAA and BBB lines), so that logging works again,
and remove <tt>fs.img</tt>.
<h3>Streamlining Commit</h3>
<p>
Suppose the file system code wants to update an inode in block 33.
The file system code will call <tt>bp=bread(block 33)</tt> and update the
buffer data. <tt>write_log()</tt> in <tt>commit()</tt>
will copy the data to a block in the log on disk, for example block 3.
A bit later in <tt>commit</tt>, <tt>install_trans()</tt> reads
block 3 from the log (containing block 33), copies its contents into the in-memory
buffer for block 33, and then writes that buffer to block 33 on the disk.
<p>
However, in <tt>install_trans()</tt>, it turns out that the modified
block 33 is guaranteed to be still in the buffer cache, where the
file system code left it. Make sure you understand why it would be a
mistake for the buffer cache to evict block 33 from the buffer cache
before the commit.
<p>
Since the modified block 33 is guaranteed to already be in the buffer
cache, there's no need for <tt>install_trans()</tt> to read block
33 from the log. Your job: modify <tt>log.c</tt> so that, when
<tt>install_trans()</tt> is called from <tt>commit()</tt>,
<tt>install_trans()</tt> does not perform the needless read from the log.
<p>To test your changes, create a file in xv6, restart, and make sure
the file is still there.
<b>XXX Does this speedup bigfile?</b>
<b>XXX Maybe support lseek and modify shell to append to a file?</b>
</body>
</html>