CS122 Assignment 7 - Write-Ahead Logging - Design Document
==========================================================
A: Write-Ahead Logging
-----------------------
A1: One of your tasks this week was to implement the TransactionManager's
forceWAL(LogSequenceNumber) method. This method must perform its
operation atomically and durably, so that if a crash occurs during this
method, the WAL will always be a reliable record of database operations.
How did you ensure that your implementation satisfies these constraints?
Justify your approach. (You can assume that the underlying OS provides
an atomic file-sync operation, and that writing a single sector will
also be atomic, with the obvious caveat that the written sector may still
be buffered until a sync occurs.)
A2: Another task was to implement the beforeWriteDirtyPages() method on the
TransactionManager. Your implementation must ensure that the write-ahead
logging rule is always followed. What steps do you take to ensure this
will always happen? Describe your method's approach.
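For reference, the write-ahead logging rule itself can be illustrated with
a toy model. The sketch below is not NanoDB code: ToyWAL, the dictionary
page representation, and the helper names are all hypothetical, and an LSN
is simplified to a single integer offset just past the record. It only
shows the invariant that a logged page may be written to disk once the log
is durable up to that page's LSN, and that unlogged pages (no LSN) are
not constrained by the rule.

```python
# Toy model of the write-ahead logging rule (NOT NanoDB code).
# Every class, function and field name here is hypothetical.

class ToyWAL:
    def __init__(self):
        self.next_lsn = 0      # LSN that marks the end of the buffered log
        self.durable_lsn = 0   # every record below this LSN is "on disk"

    def append(self, nbytes):
        """Buffer a record of nbytes; return the LSN just past it."""
        self.next_lsn += nbytes
        return self.next_lsn

    def force(self, lsn):
        """Make the log durable up to lsn (stands in for write + sync)."""
        if lsn > self.durable_lsn:
            self.durable_lsn = lsn


def before_write_dirty_pages(wal, pages):
    """Force the log far enough to cover every page about to be written."""
    lsns = [p["page_lsn"] for p in pages if p["page_lsn"] is not None]
    if lsns:
        wal.force(max(lsns))


def write_pages(wal, pages, disk):
    before_write_dirty_pages(wal, pages)
    for p in pages:
        # WAL rule: a logged page reaches disk only after its log records.
        assert p["page_lsn"] is None or p["page_lsn"] <= wal.durable_lsn
        disk[p["page_no"]] = p["data"]


wal = ToyWAL()
disk = {}
page_a = {"page_no": 1, "data": "a", "page_lsn": wal.append(40)}
page_b = {"page_no": 2, "data": "b", "page_lsn": wal.append(25)}
unlogged = {"page_no": 3, "data": "c", "page_lsn": None}
write_pages(wal, [page_a, page_b, unlogged], disk)
```

Forcing to the maximum page LSN in the batch is one simple choice; your
answer should describe what your own implementation actually does.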
A3: Actually, if you think about it, NanoDB writes to data pages before it
emits WAL records describing the changes to those pages. At a higher level
than for the previous question, explain how your implementation ensures
that NanoDB follows the WAL rule, e.g., when a tuple is being inserted or
updated on a heap file's data page. Your answer should touch on all major
subsystems that are involved in this operation.
A4: In your current implementation, some pages may not have corresponding
LSNs associated with them, because they are not logged in the write-ahead
log. Enumerate all file types that will have pages not logged in the
WAL.
B: The txnstate.dat File
-------------------------
B1: The txnstate.dat file records the next transaction ID that the database
should use when it is restarted. Why is it important for this to be
stored and used by the database when it is restarted?
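As a point of comparison, the general idea of persisting a "next
transaction ID" counter across restarts can be sketched with a toy
program. Everything here is an assumption made for illustration: the JSON
file, the toy_txnstate.json name, and the choice to rewrite the file on
every transaction start are hypothetical and do not describe the real
txnstate.dat format or when NanoDB updates it.

```python
# Toy model of a persistent transaction-ID counter (NOT NanoDB code).
# File name, JSON layout and all function names are hypothetical.
import json
import os
import tempfile


def load_next_txn_id(path):
    """Return the stored counter, or 1 if no state file exists yet."""
    if not os.path.exists(path):
        return 1
    with open(path) as f:
        return json.load(f)["next_txn_id"]


def store_next_txn_id(path, next_id):
    """Write the counter to a temp file, sync it, then rename into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_txn_id": next_id}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


class ToyTxnManager:
    def __init__(self, path):
        self.path = path
        self.next_txn_id = load_next_txn_id(path)

    def start_transaction(self):
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        store_next_txn_id(self.path, self.next_txn_id)
        return txn_id


state_file = os.path.join(tempfile.mkdtemp(), "toy_txnstate.json")
first_run = ToyTxnManager(state_file)
ids_before = [first_run.start_transaction() for _ in range(3)]

# Simulate a restart: a fresh manager reads the stored counter.
restarted = ToyTxnManager(state_file)
id_after = restarted.start_transaction()
```

In this toy, the restarted manager continues from the stored counter
instead of handing out an ID that already appears in earlier records.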
B2: The txnstate.dat file records a "firstLSN" value, which is where recovery
processing starts from. What guarantees are made about this firstLSN
value? Given these guarantees, will redo processing need any records
before the firstLSN value? Will undo processing need any records before
the firstLSN value? Justify your answers.
B3: Currently, the "firstLSN" value is only moved forward when recovery
processing is completed. Can you describe a strategy for moving forward
firstLSN during normal operation? What constraints must be enforced to
ensure the database continues working properly? Explain your answers.
B4: The txnstate.dat file's "firstLSN" value is somewhat similar to a
checkpoint / fuzzy-checkpoint, but it is not quite the same. Describe
the differences between what NanoDB provides, and how checkpoints
generally work, focusing on what constraints must be enforced during the
checkpointing operation, vs. the constraints that NanoDB must enforce
with firstLSN.
C: Testing
-----------
C1: Did you run into any fun, surprising or crazy bugs while you were
debugging your transaction-processing code? (It's OK if your answer
is "no," although Donnie will be dubious...)
D: Extra Credit [OPTIONAL]
---------------------------
If you implemented any extra-credit tasks for this assignment, describe
them here. The description should be like this, with stuff in "<>" replaced.
(The value i starts at 1 and increments...)
D<i>: <one-line description>
<brief summary of what you did, including the specific classes that
we should look at for your implementation>
<brief summary of test-cases that demonstrate/exercise your extra work>
E: Feedback [OPTIONAL]
-----------------------
WE NEED YOUR FEEDBACK! Thoughtful and constructive input will help us to
improve future versions of the course. These questions are OPTIONAL, and
your answers will not affect your grade in any way (including if you hate
everything about the assignment and databases in general, or Donnie and/or
the TAs in particular). Feel free to answer as many or as few of them as
you wish.
E1. What parts of the assignment were most time-consuming? Why?
E2. Did you find any parts of the assignment particularly instructive?
Correspondingly, did any parts feel like unnecessary busy-work?
E3. Did you particularly enjoy any parts of the assignment? Were there
any parts that you particularly disliked?
E4. Were there any critical details that you wish had been provided with the
assignment, that we should consider including in subsequent versions of
the assignment?
E5. Do you have any other suggestions for how future versions of the
assignment can be improved?