CS 162, Spring 2007

Section notes for 20 February 2007

Sections 102/103

GSI: Thomas Kho <cs162-ta@inst.eecs.berkeley.edu> / <tkho@eecs>

Based on notes by Hakim Weatherspoon

AGENDA

DEADLINES

Extra OH?

Nachos Proj1
------------
• What did you like, dislike
• How hard?
• How's grouping?
• How's subversion?
• How's eclipse?
• Due dates too early?
• Group evals


Nachos 2
========

• Syscalls

   - System call is invoked when MIPS simulator does 'syscall'
     instruction and traps to the OS
     - Causes instruction processing loop to throw 'MipsException'
     - When caught, calls MipsException.handle
       - handle() writes 'cause' into machine register
       - Causes 'Machine.processor().exceptionHandler' to be invoked
       - This is set by UserKernel constructor to be
     UserKernel.exceptionHandler
       - This gets current process (casts current KThread to UThread
         and extracts process field)
       - Calls process.handleException(cause)

   - Illegal operations do NOT include anything "illegal" done while
     calling a system call -> just return -1 from system call instead
   - write/read should return -1 if cannot read/write complete memory
     buffer provided by user, but number of bytes transferred if memory
     is OK but filesystem is not

• Should NEVER crash the operating system!!
• On error either (choice not arbitrary):
• Return -1 from syscall
• Crash the PROGRAM (not the OS)

   - Use Lib.random() to get an int for LotteryScheduler: Will be
     deterministic
   - Don't introduce overflow bugs in your code, but OK to assume that
     an 'int' is good enough to do the lottery with


• File tables
• C example:
in_fd = open("input.txt");
if (in_fd == -1) { printf("oops"); exit(1); }
out_fd = creat("output.txt");
if (out_fd == -1) { printf("oops"); exit(1); }
char buf[100];
read(in_fd, buf, 100);
for (int j = 0; j<100; j++) { buf[j] ++; }
write(out_fd, buf, 100);

• Implementation:
• YOU have to implement what file descriptor to return, and how to
recognize it on read/write calls
• Table of OpenFile objects

• Never crash OS
• Sometimes return -1 - error
• Sometimes crash program (abnormal exit)

• Virtual memory
• Recap of paging: virtual address of variable 'foo' is always 0x5123.
How do you generalize so that it doesn't matter how many / which other
programs are running and using how much memory?
• Create a mapping
• Do a table-lookup each time => page table
• YOU implement page table creation & maintenance!
• We will do more next project

• Processes
• Exec: copy arguments
• Join: wait for other user process to finish, and return arguments

• C example:

• shell.c:
main() {
pid = exec ("ls", "/");
ret = join(pid);
if (ret != 0) { printf("error\n"); }
}

• ls.c:
main(int argc, char **argv) {
// ...
if (!file_exists(argv[1]))
return 1;
}

• When can you remove child processes' state?
• After join!



• Lottery scheduler
• Why "max" when you can "+" ?
• Probabilistic
• Explain lotteries

• Minimal changes to priority scheduler



• Lottery scheduler
• Do "+" instead of "max"

• In priority scheduling, always run process with highest effective
priority

• Now, do it probabilistically:

• Rename effective priority => lottery tickets

• Draw a pie chart with lottery tickets
• Throw a dart randomly

• Whichever section you hit is the process that runs for this time
slice


   - LotteryScheduler should require minimal changes to existing code


Memory
------

• It's just a huge sequence of bits!

• How to organize?

• Protect operating system from user programs

• Protect user programs from each other

• Don't want to hard-code where in memory program goes

• Base & bounds:

• Check bounds
• Add to base

• Can't access what you don't have a name for!

• Registers

• Segmentation:
• Many base & bounds segments
• -> table

• How addressing works

• Paging:
• Shrink the offset-into-segment part of address and increase the
number-of-segments => paging

• Internal vs external fragmentation


PROTECTION

What does the address space abstraction do?

     prevents multiple programs from interfering with each other.

How implemented?
    Address translation hardware
     - Each address that a program references is mapped onto a physical
       memory location
     - Maintaining different virtual->physical mappings for different
       processes prevents them from interfering

What is Dual-mode operation?
     - Must give kernel control over manipulating page tables
     - How? Processor runs in either "user" or "kernel" mode
     - Certain instructions are privileged -- e.g., access to page tables
     - Direct access to physical memory also allowed in kernel mode

PROTECTION continued
  - System calls
     - User invokes special instruction which "traps" into OS
     - Trap only allows user to invoke certain functions in the kernel
     - e.g., user places index of system call table in a register,
       invokes trap
     - Trap looks at contents of register, jumps to appropriate function
       in kernel
     - Must be executing on different stack than user! (Why? To avoid
       user manipulating return addresses on stack.)

  - Hardware interrupt
     - Cause transition from user process to kernel (that is, if
       user code is currently running)

  - Software faults
     - Division by zero
     - Segmentation violation: Accesing virtual memory address which
       is not mapped (that is, has no page table entry)
     - Page fault: Used to provide illusion of "infinite" virtual memory
       (covered later in course)

ADDRESS TRANSLATION
Assume a runtime environment with 16-bit address that accesses a larger
memory.  The upper four bits name a page, the lower twelve bits provide
an offset in a page.  Given the page table below translate the following
addresses.

|----------------------|
| VPN  | PFN   | Valid |
| Page | Value | Valid |
|----------------------|
| 0    | 0x14  | Y     |
|----------------------|
| 1    | 0x50  | Y     |
|----------------------|
| 2    | 0x52  | Y     |
|----------------------|
| 3    | 0xFE  | N     |
|----------------------|
| 4    | 0xA4  | Y     |
|----------------------|
| 5    | 0x26  | Y     |
|----------------------|
| 6    | 0x88  | Y     |
|----------------------|
| 7    | 0x68  | Y     |
|----------------------|
| 8    | 0xD4  | N     |
|----------------------|
| 9    | 0x18  | Y     |
|----------------------|
| ...  | ...   | ...   |
|----------------------|

* 0x156D = 0x5056D
* 0x6400 = 0x88400
* 0xB916 = Not shown in table.
* 0x8888 = Page fault.
* 0x7888 = 0x68888
* 0x23F4 = 0x523F4
* 0x3A6B = Page fault.

REVIEW - PROTECTION
 - The motivation for protection is multiprogramming. The goal is
   to keep user programs from corrupting the OS or each other.
 
 - Can be implemented in SW or HW
   - SW support via strong typing or software fault isolation.
   - HW support requires both
    - address translation ( base and limit registers, ... )
    - dual mode operation: kernel vs user mode
        - mode bit
            - priviledged instructions (ie system services)


ADDRESS TRANSLATION
-----------------------------------------------------------------------------
    TYPES OF ADDRESS TRANSLATION
-----------------------------------------------------------------------------
  - Base and Bounds
  - Segmentation
  - Paging

-----------------------------------------------------------------------------
    SEGMENTATION
-----------------------------------------------------------------------------
 - Segments are variable length contiguous regions of memory.
   Usally associated with the logical partition of virtual address
   space (ie code, data, stack)
 - Segment table has all segment table entries for one process.
 - Each segment table entry has physical segment number, bounds,
    and protection bits
 - Virtual address: <virtual segment number, byte within segment>
    - the virtual segment number indexes into the segment table
 - HW provides STBR, segment table, checking protection bits

-----------------------------------------------------------------------------
    PAGING
-----------------------------------------------------------------------------

 - Pages are fixed length small regions of memory.  Usually about 4K in size.
 - Page table has all page table entries for one process
 - Each page table entry has physical page number and protection bits
 - Virtual address: <virtual page number, byte within page>
    - the virtual page number indexes into the page table
 - HW provides PTBR, PT size, page table, checking protection bits

-  Paging vs. Segmentation
  - Fixed page size is the key to paging over segmentation
  - Solves segmentation problem: If we need to allocate space for a new
    page, any empty page frame will do
  - A large object does not need to be contiguous
  - Cons:
    - If page size too large, get internal fragmentation (object that
      needs one byte more than size of page takes 2 pages)
    - If page size too small, too many page table entries

- How big are single-level page tables?
  - Say we have 32-bit address with 4Kb pages. How many page table
    entries are there?
    - Answer: 20 bits of page address, so 2^20 or 1024*1024 entries
  - How big is a page table entry?
    - Minimum size is enough to span physical memory. So with 512 MBytes
      in your machine, how many bits is the physical page frame number
      in the page table entry?
      - Answer: 29 bits of physical address space (512 MBytes), divided
    into 4 KB pages -> 12 bits of offset, 17 bits of page address
    So, need 17 bits to store physical page frame number
    - Problem is that we don't want to hard-wire the amount of physical
      memory the machine might have!
    - In general, on a 32-bit machine can have 32-bit physical
      addresses. So, with 4KB pages need 12 bits for offset and 20 bits
      for page address, just like virtual addresses
  - So how big is the page table then?
     1024 * 1024 * 20 bits = 2.5 MBytes, all in physical memory!!
  - AND THAT'S FOR EACH PROCESS!
  - Note also that each page table entry also contains some accounting
    and status bits, for example, indicating whether a given page is
    read-only, whether it has been modified, etc. So on a 32-bit machine
    a page table entry tends to be 32 bits wide, which means
    1024 * 1024 * 32 = 4 MBytes.

-----------------------------------------------------------------------------
    MULTI-LEVEL PAGE TABLES
-----------------------------------------------------------------------------
- Multilevel paging
  - Solves the problem by using a hierarchy of page tables
    (Show picture of tree of page tables)
  - Since at any given time a process is only using a small portion of
    its address space, only need page table entries corresponding to
    what's actually in use
  - Note that later on we will talk about DEMAND PAGING: Moving memory to and
    from disk
    - This is going to allow us to allocate more virtual memory than we
      actually have physical RAM
    - With multilevel page tables, the PAGE TABLES can be PAGED -
      meaning that if the process has a portion of its address space
      which hasn't been used in a long time, those page tables (and the
      associated pages) go out to disk
    - This is project phase 3, so pay attention :-)

- Multilevel paging on the x86
  - Uses two-level page tables
  - Virtual address is divided into 3 components:
      10 bits (31 .. 22): Page directory index
      10 bits (21 .. 12): Page table index
       10 bits (11 .. 0): Offset

  - Example: Virtual address
      0x13a49F01 = 0001001110 | 1001001001 | 111100000001 in binary
                     0x04e    |   0x249    |    0xF01    
                    pgdirind. |  pgtblind. |    offset
 
  - First, use page dir index as index into "page directory"
    Physical base address of the page directory is stored in a special
    register (called "CR3") on the x86. It is always page-aligned, so
    the base address might be 0x0001c000.

    How many entries in the page directory? Well 10 bits of index, so
    2^10 entries * 4 bytes each entry = 4Kb (One page!)

    At offset 0x04e in the page directory we find the physical address of
    page table, e.g.,: 0x000a2000. (Note that the page table is also
    page-aligned and holds just one page.)
   
  - We use the "page table index" as an index into this second-level
    table. At index 0x249 into the page table at 0xa2000 we find the
    physical page number of the address we are looking for, e.g.,
    0x0341b000. We add to this the offset (0xf01) to get the final
    physical address: 0x0341bf01.

  - New twist: What if the address of the page table is a VIRTUAL
    ADDRESS, which means we have to go through another step of
    translation to look up its physical address!
    - This means that to simply look up one virtual address, we may have
      to do a SECOND virtual address translation
    - Can this process go on forever? What if the SECOND translation
      above turns into a THIRD translation, and so forth? Pretty
      complex!

  - Generally the page directory entries contain PHYSICAL addresses for
    page tables. However, the least significant bit of the entry
    indicates whether the page table is "present" or not. If not
    present, then the entry would contain an OS-specific pointer (such
    as a disk block address) used to record where the page table is
    stored on disk. In this way, page directory entries always use
    PHYSICAL addresses, but the page tables can still be swapped out.

  - Note that the page directory is always resident in physical memory,
    meaning that each process has 1 page always "pinned"

PAGE REPLACEMENT POLICY

3 pages of physical mem, 5 pages of virtual mem
(from different processes, say): ABCDE
Access pattern is ABCDDEECBADECBA

With FIFO policy:

Phys addr   A B C D D E E C B A D E C B A
1           A     D           A     C
2             B       E         D     B
3               C           B     E     A

How many page faults?  12! What is hit rate? 3/15 !
Only 3 out of 15 of our accesses did not necessitate a page fault.

How many compulsory misses? 5, since we had to bring in each page
at least one time during the string of accesses.

How many capacity and conflict misses? The problem is that capacity
and conflict misses assume that we had an optimal replacement strategy
to begin with, and that we COULD NOT AVOID those misses regardless.
Since FIFO is hardly optimal, then probably best to characterize all
of these misses as "policy" misses. (Note that on a quiz or final exam,
it still might ask you for capacity/conflict misses for a non-optimal
strategy. Technically the answer depends on whether the misses were
caused by lack of capacity (no place to put the new page) or conflict
(the place you want to put the new page is the same as another page
due to limited associativity).

How many policy misses? In this case, we have 7 -- all of the misses
except for the first 5.

With MIN policy:

Phys addr   A B C D D E E C B A D E C B A
1           A     D   E               B <----- Could go anywhere
2             B               A D       A <--- Could go anywhere
3               C         

How many page faults? 9
How many compulsory misses? 5
How many capacity misses? Since this is an "optimal" policy we
  can talk about that. If we had an infinite physical memory then we would
  have no capacity misses. But since we have only 3 pages, we have to miss
  for A, D, B, and A -- in other words, 4 misses.
How many conflict misses? None, really, since the reason we miss the
  cache has to do with capacity rather than limited associativity.
How many policy misses? None - see above.

With LRU policy:

Phys addr   A B C D D E E C B A D E C B A
1           A     D         B     E     A
2             B       E       A     C 
3               C               D     B 

This is similar to FIFO! Not surprising, since LRU replaces the
page that was referenced the longest time in the past. So, LRU will
behave badly when the next reference is to the least recently used
page.  Note that this is not going to NECESSARILY be the same as
FIFO - it depends on the access pattern.

Extra Problem with Clock policy:

Phys addr  A C B D B A E F B F A G E F A
1          A           E
2         C           F
3           B                 G   
4                D             A




CACHING and TLBs
  - TLB, Translation Lookaside Buffer. Caching applied to
    address translation.
  - Direct Mapped - restrict each virtual page to use specific TLB entry
  - Set Associative - parition TLB into N separate banks. Select entry with low
    order virtual page number bits. Compare all N entries in parallel.
  - Fully Associative - Compare entire TLB in parallel. Only one entry per bank.

EFFECTIVE ACCESS TIME IN TLB
    = P(hit) * cost of hit + P(miss) * cost of miss

EX: 20 ns ==> TLB access
    100 ns ==> Memory access
    P(hit) = .80

    EAT = .80(120) + .20(220) = 140 ns

DEMAND PAGING
  - Each page-table entry contains a 'valid' bit which
    indicates that the associated page is BOTH legal
    AND in memory. On x86 is low-order bit; if 0, valid,
    if 1, not valid.
  - If not valid and legal, page table entry holds some OS-specific
    information to maintain location of page on disk. Generally will be
    a disk block number
  - When page table entry is not valid, check if the address
    is both legal and memory resident.
  - If illegal address, terminate process.
  - If legal address, do pagefault
    - allocate new page, read from disk, restart faulting instruction

  - NOTE: Page tables themselves can be paged!
    - Recall two-level page table structure
    - Top-level table always present
    - Second-level tables may be swapped out...

PAGE REPLACEMENT

  - Random - solution for TLBs. Implemented in hardware.
  - FIFO - Replace oldest page. Bad because removes hot pages. 
  - MIN - Replace the page that will not be used for the longest
      period of time. Optimal algorithm. Requires future knowledge.
  - LRU - Replace the page that has not been used for the longest
      period of time. Approximates MIN.

LRU APPROXIMATION ALGORITHMS
  - CLOCK - Relace an old page. Hardware  keeps 'use' bit per physical frame.
  - N Chance - Clock algorithm with 'use' bit and counter.