lab7design.txt 4.75 KB
CS122 Assignment 7 - Write-Ahead Logging - Design Document
==========================================================

A:  Write-Ahead Logging
-----------------------

A1.  One of your tasks this week was to implement the TransactionManager's
     forceWAL(LogSequenceNumber) method.  This method must perform its
     operation atomically and durably, so that if a crash occurs during this
     method, the WAL will always be a reliable record of database operations.
     How did you ensure that your implementation satisfies these constraints?
     Justify your approach.  (You can assume that the underlying OS provides
     an atomic file-sync operation, and that writing a single sector will
     also be atomic with the obvious caveat that the written sector may still
     be buffered until a sync occurs.)

A2:  Another task was to implement the beforeWriteDirtyPages() method on the
     TransactionManager.  Your implementation must ensure that the write-ahead
     logging rule is always followed.  What steps do you take to ensure this
     will always happen?  Describe your method's approach.

A3:  Actually, if you think about it, NanoDB writes to data pages before it
     emits WAL records describing the changes to those pages.  At a higher level
     than for the previous question, explain how your implementation ensures
     that NanoDB follows the WAL rule e.g. when a new tuple is being inserted or
     updated on a heap file's data page.  Your answer should touch on all major
     subsystems that are involved in this operation.

A4:  In your current implementation, some pages may not have corresponding
     LSNs associated with them, because they are not logged in the write-ahead
     log.  Enumerate all file types that will have pages not logged in the
     WAL.

B:  The txnstate.dat File
-------------------------

B1.  The txnstate.dat file records the next transaction ID that the database
     should use when it is restarted.  Why is it important for this to be
     stored and used by the database when it is restarted?

B2:  The txnstate.dat file records a "firstLSN" value, which is where recovery
     processing starts from.  What guarantees are made about this firstLSN
     value?  Given these guarantees, will redo processing need any records
     before the firstLSN value?  Will undo processing need any records before
     the firstLSN value?  Justify your answers.

B3:  Currently, the "firstLSN" value is only moved forward when recovery
     processing is completed.  Can you describe a strategy for moving forward
     firstLSN during normal operation?  What constraints must be enforced to
     ensure the database continues working properly?  Explain your answers.

B4:  The txnstate.dat file's "firstLSN" value is somewhat similar to a
     checkpoint / fuzzy-checkpoint, but it is not quite the same.  Describe
     the differences between what NanoDB provides, and how checkpoints
     generally work, focusing on what constraints must be enforced during the
     checkpointing operation, vs. the constraints that NanoDB must enforce
     with firstLSN.

C:  Testing
-----------

C1:  Did you run into any fun, surprising or crazy bugs while you were
     debugging your transaction-processing code?  (It's OK if your answer
     is "no," although Donnie will be dubious...)

D:  Extra Credit [OPTIONAL]
---------------------------

If you implemented any extra-credit tasks for this assignment, describe
them here.  The description should be like this, with stuff in "<>" replaced.
(The value i starts at 1 and increments...)

D<i>:  <one-line description>

     <brief summary of what you did, including the specific classes that
     we should look at for your implementation>

     <brief summary of test-cases that demonstrate/exercise your extra work>

E:  Feedback [OPTIONAL]
-----------------------

WE NEED YOUR FEEDBACK!  Thoughtful and constructive input will help us to
improve future versions of the course.  These questions are OPTIONAL, and
your answers will not affect your grade in any way (including if you hate
everything about the assignment and databases in general, or Donnie and/or
the TAs in particular).  Feel free to answer as many or as few of them as
you wish.

E1.  What parts of the assignment were most time-consuming?  Why?

E2.  Did you find any parts of the assignment particularly instructive?
     Correspondingly, did any parts feel like unnecessary busy-work?

E3.  Did you particularly enjoy any parts of the assignment?  Were there
     any parts that you particularly disliked?

E4.  Were there any critical details that you wish had been provided with the
     assignment, that we should consider including in subsequent versions of
     the assignment?

E5.  Do you have any other suggestions for how future versions of the
     assignment can be improved?