Commit ecfb12b9 authored by Caleb C. Sander

Add MiniVC spec and async-await notes

parent 3b66e32d
Showing with 606 additions and 4 deletions
......@@ -35,7 +35,7 @@ You can contact me via:
| 5-6 | [Streams](notes/streams/streams.md) | [`grep`](specs/grep/grep.md) | ~~2020-05-08~~ 2020-05-15 |
| 7 | [HTTP](notes/http/http.md) | [Wiki Game](specs/wiki-game/wiki-game.md) | ~~2020-05-15~~ 2020-05-22 |
| ~~7~~ | ~~[WebSockets](notes/websockets/websockets.md)~~ | ~~[Chat client](specs/chat/client.md) **OR** [Chat server](specs/chat/server.md)~~ | ~~2020-05-22~~ |
| 8-9 | `async`-`await` | MiniVC | 2020-06-05 |
| 8-9 | [`async`-`await`](notes/async-await/async-await.md) | [MiniVC](specs/mini-vc/mini-vc.md) | 2020-06-05 |
## JavaScript reference
......
# `async`-`await`
## Recap of `Promise`s
JavaScript's `Promise` abstraction was discussed in the [notes](../promises/promises.md) near the start of the course and you had a chance to use it on the Make project.
I'll give a quick refresher since `Promise`s are essential to the `async`-`await` syntax.
A `Promise` represents an asynchronous task or computation.
When the task or computation finishes, the `Promise` "resolves" to a value.
If an error occurs, the `Promise` "rejects" with the error.
The usefulness of `Promise`s comes from the ways they can be composed into more complex `Promise`s.
There are two main ways to combine `Promise`s:
- In sequence.
`.then()` chains two `Promise`s into a `Promise` that performs the asynchronous tasks or computations one after another.
You can build long chains by calling `.then()` more times:
```js
// Build a Promise that performs 4 asynchronous computations in sequence
promiseA
.then(a => { /* build and return promiseB... */ })
.then(b => { /* build and return promiseC... */ })
.then(c => { /* build and return promiseD... */ })
```
- In parallel.
`Promise.all()` combines an array of `Promise`s into a `Promise` that performs all the tasks or computations in parallel and resolves to an array containing all their results:
```js
// Build a promise that performs 3 asynchronous computations in parallel
// and resolves with their results once they have all finished
Promise.all([promiseA, promiseB, promiseC])
```
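As a runnable illustration of both kinds of composition, here is a minimal sketch. The `delay()` helper is hypothetical (it is not part of these notes or of Node.js) and simply stands in for any asynchronous task:

```js
// Hypothetical helper: a Promise that resolves to `value` after `ms` milliseconds
const delay = (ms, value) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms))

// In sequence: each step starts only after the previous one has resolved
const inSequence = () =>
  delay(10, 1)
    .then(a => delay(10, a + 1))
    .then(b => delay(10, b + 1))

// In parallel: all three timers run at the same time
const inParallel = () =>
  Promise.all([delay(10, 'a'), delay(10, 'b'), delay(10, 'c')])

inSequence().then(result => console.log(result)) // prints 3
inParallel().then(results => console.log(results)) // prints [ 'a', 'b', 'c' ]
```

The sequential chain takes roughly three times as long as the parallel version, since each `delay()` only starts once the previous one has resolved.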
## `async`-`await` syntax
`Promise`s can greatly simplify asynchronous programs.
Frequently, several asynchronous tasks need to happen in sequence.
In this case, `.then()` provides an interface that looks closer to how the program would be written in a synchronous language.
The ECMAScript 2017 standard added new syntax to JavaScript that makes asynchronous code look even more like synchronous code.
It relies on two keywords:
- `async`: we can declare an "async function" by putting `async` in front of it:
```js
async function name(...arguments) {
/* ... */
}
// or
async (...arguments) => { /* ... */ }
```
Unlike normal functions, `async` functions return `Promise`s.
If the function `return`s a value, the `Promise` resolves with that value.
If the function `throw`s an error, the `Promise` rejects with that error.
- `await`: inside an `async` function, the keyword `await` can be used to "wait" for a `Promise` to resolve.
If the `Promise` rejects with an error, the `await` expression throws that error, so it can be caught by a `try`-`catch` block.
It is easiest to explain `async`-`await` with some examples.
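As a small warm-up before the larger examples, the following sketch shows both keywords together. The functions `double()` and `describe()` are made up for illustration:

```js
// An async function always returns a Promise, even when it returns a plain value
async function double(x) {
  if (typeof x !== 'number') throw new TypeError('expected a number')
  return x * 2
}

// await unwraps a resolved value and turns a rejection into a thrown error
async function describe(x) {
  try {
    const doubled = await double(x)
    return `doubled: ${doubled}`
  }
  catch (e) {
    return `failed: ${e.message}`
  }
}

describe(21).then(message => console.log(message)) // prints doubled: 42
describe('oops').then(message => console.log(message)) // prints failed: expected a number
```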
## Example: recursively exploring a directory
We can rewrite the code from the `Promise`s notes for [computing the modification time of a directory](../promises/promises.md#promise.all).
Recall that we want to find the latest modification time of any file inside the directory (or any subdirectories).
By making `modifiedTime()` an `async` function, we can use `await` instead of explicit `.then()` calls on `Promise`s.
You can compare this [`async`-`await` version](await-mtime.js) with the [`Promise`s version](../promises/recursive-mtime.js).
```js
// Returns a Promise<number> that computes the most recent
// modification time for any file inside a directory
async function modifiedTime(path) {
// First, we wait for the fs.stat() Promise
const stats = await fs.stat(path)
// Use the modification time of the file or directory itself
let latestTime = stats.mtimeMs
// If this is a directory, check all files inside for later modified times
if (stats.isDirectory()) {
// Wait to get a list of all files/subdirectories in the directory
const files = await fs.readdir(path)
// Then get the modified time of each file/subdirectory
for (const file of files) {
const fileModifiedTime = await modifiedTime(path + '/' + file)
// update latestTime if the modification time is later
latestTime = Math.max(latestTime, fileModifiedTime)
}
}
// Return the latest modification time
return latestTime
}
```
However, this code no longer explores subdirectories in parallel, which was the main advantage of asynchronicity.
Can you see why?
The issue is that we have replaced `Promise.all(files.map(file => /* ... */))` with a `for (const file of files)` loop.
Since `await modifiedTime(path + '/' + file)` *waits* for `modifiedTime()` to finish, the loop waits for `file` to be completely explored before continuing.
This means the subdirectories are explored sequentially.
If we want to perform asynchronous tasks in parallel inside an `async` function, **we still need to use `Promise.all()`**.
[Here](await-mtime-parallel.js) is a parallel version using `Promise.all()`:
```js
async function modifiedTime(path) {
const stats = await fs.stat(path)
const {mtimeMs} = stats
// If this path is a file, return its modification time
if (!stats.isDirectory()) return mtimeMs
// If this is a directory, check all files inside for later modified times
const files = await fs.readdir(path)
// Get the modified time of each file/subdirectory in parallel
const modifiedTimes = await Promise.all(files.map(file =>
modifiedTime(path + '/' + file)
))
// Return the latest modification time
return Math.max(mtimeMs, ...modifiedTimes)
}
```
## Example: emoji search
The [Unicode Consortium](https://home.unicode.org) defines the mapping from numbers to characters (including emoji) that almost everyone uses.
For example, Unicode defines 65 to mean `A`, 233 to mean `é`, and 128027 to mean `🐛`.
We use this mapping to make an [emoji search](await-emoji.js) that finds a Unicode character by name (e.g. `pile of poo`) and prints out the character.
We first fetch the current Unicode version (updated each year) from Unicode's [`ReadMe.txt`](https://unicode.org/Public/zipped/latest/ReadMe.txt) file.
We then fetch the current [`UnicodeData.txt`](https://www.unicode.org/Public/13.0.0/ucd/UnicodeData.txt) file which lists the tens of thousands of Unicode characters.
Note how `await` allows us to execute 4 `Promise`s sequentially in `getUnicodeNumber()` without needing to combine them with `.then()`.
```js
async function getUnicodeNumber(searchName) {
// Fetch the readme to determine the latest Unicode version (e.g. 13.0.0)
const readmeResponse = await fetch(README_URL)
const readme = await readmeResponse.text() // read readme as a string
// The last line of the readme contains the current Unicode URL,
// e.g. https://www.unicode.org/Public/13.0.0/ucd/
const [latestURL] = readme.trim().split('\n').slice(-1)
// Then fetch the list of Unicode characters
const charactersResponse = await fetch(latestURL + UNICODE_FILE)
const charactersData = await charactersResponse.text()
// Each line of the data file corresponds to one Unicode character
// (trim() removes the trailing newline so there is no empty last line)
for (const characterData of charactersData.trim().split('\n')) {
// Each line has several fields separated by `;`.
// The first is the character's number and the second is its name.
const [characterNumber, characterName] = characterData.split(';')
if (characterName.toLowerCase() === searchName) {
// If this is the requested character, read its number in base-16
return parseInt(characterNumber, 16)
}
}
return undefined
}
async function printUnicode(searchName) {
// getUnicodeNumber() returns a Promise, so we need to await it
const number = await getUnicodeNumber(searchName)
// Convert the Unicode number to a character
if (number !== undefined) console.log(String.fromCodePoint(number))
}
```
(`fetch()` turns an HTTP/HTTPS request into a `Promise`, as explained in the [HTTP notes](../http/http.md#aside-fetch).)
## Example: create files without overwriting existing files
When an `await`ed `Promise` rejects with an error, you can catch it using a `try`-`catch` block, just like a normal JavaScript error.
For [example](await-no-overwrite.js), suppose we want to create some files but not overwrite any existing files with the same name.
We can use the `wx` [flag](https://nodejs.org/api/fs.html#fs_file_system_flags) to make [`fs.writeFile()`](https://nodejs.org/api/fs.html#fs_fspromises_writefile_file_data_options) reject if a file already exists.
We want either all or none of the files to be created, so if one already exists, we stop and remove all the files that were already created.
```js
// Creates an array of files. If any file already exists,
// no files are created and an error is thrown.
async function writeFiles(files) {
for (let i = 0; i < files.length; i++) {
const file = files[i]
try {
// Write to the file, but throw an error if the file exists
await fs.writeFile(file.name, file.contents, {flag: 'wx'})
}
catch (e) {
// File already exists, so remove all the previously created files
await Promise.all(files.slice(0, i).map(file =>
fs.unlink(file.name)
))
// Re-throw the error
throw e
}
}
}
// Try to create 3 files
writeFiles([
{name: 'a.txt', contents: 'abc'},
{name: 'b.txt', contents: '123'},
{name: 'c.txt', contents: 'xyz'}
])
.catch(_ => console.log('Some files already existed'))
```
You may notice that the files are not written in parallel.
It is possible to fix this by `Promise.all()`ing the `fs.writeFile()`s, but this is tricky since the `Promise` returned by `Promise.all()` rejects as soon as *any* of the `Promise`s reject.
If the `Promise.all()` rejected, we would need to wait for each file to finish being written before trying to remove it.
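One possible way around this, sketched below under the assumption that `Promise.allSettled()` is available (Node.js 12.9 or later), is to wait for every write to settle before deciding whether to roll back. The name `writeFilesParallel()` is hypothetical and this is not the approach used in the example file above:

```js
const fs = require('fs').promises

// Sketch: write all files in parallel, rolling back if any write fails.
// Promise.allSettled() waits for every write to finish, whether it
// succeeded or failed, instead of rejecting at the first failure.
async function writeFilesParallel(files) {
  const results = await Promise.allSettled(files.map(file =>
    fs.writeFile(file.name, file.contents, {flag: 'wx'})
  ))
  const failure = results.find(result => result.status === 'rejected')
  if (failure === undefined) return
  // Remove only the files whose writes succeeded
  await Promise.all(files
    .filter((_, i) => results[i].status === 'fulfilled')
    .map(file => fs.unlink(file.name))
  )
  // Re-throw the first error
  throw failure.reason
}
```

Because every write has settled before any `fs.unlink()` call is made, no file is removed while it is still being written.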
// The node-fetch package provides a fetch() function for Node.js.
// Like in a browser, this makes a Promise for an HTTP request.
// You can install it with `npm install node-fetch`
const fetch = require('node-fetch')
const README_URL = 'https://unicode.org/Public/zipped/latest/ReadMe.txt'
const UNICODE_FILE = 'UnicodeData.txt'
// Gets the Unicode number ("code point") for the character with a given name
async function getUnicodeNumber(searchName) {
// Fetch the readme to determine the latest Unicode version (e.g. 13.0.0)
const readmeResponse = await fetch(README_URL)
const readme = await readmeResponse.text() // read readme as a string
// The last line of the readme contains the current Unicode URL,
// e.g. https://www.unicode.org/Public/13.0.0/ucd/
const [latestURL] = readme.trim().split('\n').slice(-1)
// Then fetch the list of Unicode characters
const charactersResponse = await fetch(latestURL + UNICODE_FILE)
const charactersData = await charactersResponse.text()
// Each line of the data file corresponds to one Unicode character
for (const characterData of charactersData.trim().split('\n')) {
// Each line has several fields separated by `;`.
// The first is the character's number and the second is its name.
const [characterNumber, characterName] = characterData.split(';')
if (characterName.toLowerCase() === searchName) {
// If this is the requested character, read its number in base-16
return parseInt(characterNumber, 16)
}
}
return undefined
}
// Prints the Unicode character with the given name (e.g. `pile of poo`)
async function printUnicode(searchName) {
// getUnicodeNumber() returns a Promise, so we need to await it
const number = await getUnicodeNumber(searchName)
// Convert the Unicode number to a character
if (number !== undefined) console.log(String.fromCodePoint(number))
}
// Prints 💩
printUnicode('pile of poo')
const fs = require('fs').promises
// Returns a Promise<number> that computes the most recent
// modification time for any file inside a directory
async function modifiedTime(path) {
// First, we wait for the fs.stat() Promise
const stats = await fs.stat(path)
const {mtimeMs} = stats
// If this path is a file, return its modification time
if (!stats.isDirectory()) return mtimeMs
// If this is a directory, check all files inside for later modified times
// Get a list of all files/subdirectories in the directory
const files = await fs.readdir(path)
// Then get the modified time of each file/subdirectory in parallel
const modifiedTimes = await Promise.all(files.map(file =>
modifiedTime(path + '/' + file)
))
// Return the latest modification time
return Math.max(mtimeMs, ...modifiedTimes)
}
// Even though we return a number from modifiedTime, the number
// is wrapped in a Promise because it is an async function
modifiedTime('.').then(mTime => {
console.log(`Most recent modification: ${new Date(mTime)}`)
})
const fs = require('fs').promises
// Returns a Promise<number> that computes the most recent
// modification time for any file inside a directory
async function modifiedTime(path) {
// First, we wait for the fs.stat() Promise
const stats = await fs.stat(path)
// Use the modification time of the file or directory itself
let latestTime = stats.mtimeMs
// If this is a directory, check all files inside for later modified times
if (stats.isDirectory()) {
// Wait to get a list of all files/subdirectories in the directory
const files = await fs.readdir(path)
// Then get the modified time of each file/subdirectory
for (const file of files) {
const fileModifiedTime = await modifiedTime(path + '/' + file)
// update latestTime if the modification time is later
latestTime = Math.max(latestTime, fileModifiedTime)
}
}
// Return the latest modification time
return latestTime
}
// Even though we return a number from modifiedTime, the number
// is wrapped in a Promise because it is an async function
modifiedTime('.').then(mTime => {
console.log(`Most recent modification: ${new Date(mTime)}`)
})
const fs = require('fs').promises
// Creates an array of files. If any file already exists,
// no files are created and an error is thrown.
async function writeFiles(files) {
for (let i = 0; i < files.length; i++) {
const file = files[i]
try {
// Write to the file, but throw an error if the file exists
await fs.writeFile(file.name, file.contents, {flag: 'wx'})
}
catch (e) {
// File already exists, so remove all the previously created files
await Promise.all(files.slice(0, i).map(file =>
fs.unlink(file.name)
))
// Re-throw the error
throw e
}
}
}
// Try to create 3 files
writeFiles([
{name: 'a.txt', contents: 'abc'},
{name: 'b.txt', contents: '123'},
{name: 'c.txt', contents: 'xyz'}
])
.catch(_ => console.log('Some files already existed'))
......@@ -244,7 +244,9 @@ const modifiedTime = path =>
)
// modifiedTimes will be an array containing the latest
// modification time of each file in the directory
.then(modifiedTimes => Math.max(mtimeMs, ...modifiedTimes))
.then(modifiedTimes =>
Promise.resolve(Math.max(mtimeMs, ...modifiedTimes))
)
}
// If this is a file, just return its modification time
else return Promise.resolve(mtimeMs)
......@@ -258,5 +260,5 @@ If you've ever used Haskell, you might have noticed that `Promise`s look a lot l
And `Promise.resolve()` is the equivalent of `return`, which wraps a normal value in an instance of `IO`.
Just like monads in Haskell allow you to write code that looks imperative in a functional language, `Promise`s let you write asynchronous programs that look like blocking programs, while still using callbacks under the hood.
Towards the end of the course, we'll also cover JavaScript's `async`-`await` notation, which hides even the calls to `.then()`.
Towards the end of the course, we'll also cover JavaScript's `async`-`await` notation, which hides even the calls to `.then()` and `Promise.resolve()`.
This is very similar to `do` notation in Haskell.
......@@ -18,7 +18,9 @@ const modifiedTime = path =>
)
// modifiedTimes will be an array containing the latest
// modification time of each file in the directory
.then(modifiedTimes => Math.max(mtimeMs, ...modifiedTimes))
.then(modifiedTimes =>
Promise.resolve(Math.max(mtimeMs, ...modifiedTimes))
)
}
// If this is a file, just return its modification time
else return Promise.resolve(mtimeMs)
......
# Recommended JavaScript reading
There is a lot of provided code this week in `diff.js`.
You should read the notes on [Node.js modules](../../notes/js/js.md#modules) for information on how to import these functions.
I highly encourage using `async`-`await` this week.
Like most real asynchronous projects, this one mostly consists of tasks that need to happen sequentially.
`async`-`await` can greatly simplify this sort of asynchronous code.
In `async` functions, `try`-`catch` statements are used to handle errors in `await`ed `Promise`s, so you may want to read about [error handling in JavaScript](../../notes/js/js.md#error-handling).
# Mini Version Control
## Goals
- Learn the core ideas underlying version control software
- Write an HTTP server that implements a JSON API
- Use `async`-`await` to reduce the amount of boilerplate code needed to write asynchronous programs
## Why version control?
You've been using Git for all the projects in this class, and possibly for other classes and personal projects too.
Git is one of many tools called "version control software", which track the history of changes to a set of files over time.
VCS tools differ in their implementations, but they share several core concepts.
In this project, you will implement a version control application with the core features of Git or CVS.
(Git actually stores its history quite differently than MiniVC since it's optimized for different use cases.
If you take CS 24 this fall, you will learn how Git really works by implementing parts of it!)
The application has two parts which communicate over HTTP:
- A command line program allows the user to manage the repositories that have been downloaded locally and to communicate with the server
- A server stores the definitive copy of all repositories and mediates the changes uploaded by different clients
You are provided an implementation of the client, so you only need to write the server.
## Version control concepts
There are many ways to represent the history of files; for example, we could store a copy of each file whenever it is updated.
However, this would waste a lot of space since files are usually changed only slightly.
Instead, MiniVC tracks the *differences* between each version of a file and the previous version.
This is also convenient because it makes it easy to identify what changes were made to files between two versions.
### Diffs
A "diff" between two versions of a file is the set of changes needed to turn the first version into the second.
Again, there are different granularities of changes that you can track; each has tradeoffs.
Git considers a change to be inserting or deleting a line of text.
This works pretty well, so we'll use the same approach.
For example, consider the following two versions of a file (each line contains a single character for simplicity):
```
old | new
----+----
a | a
b | b
c | c
d | h
e | f
f | g
g |
```
"Diffing" these two versions (e.g. the command `diff -u old new`) produces the following diff:
```diff
--- old
+++ new
@@ -1,7 +1,6 @@
a
b
c
-d
-e
+h
f
g
```
You can see that the lines `d` and `e` were deleted and the line `h` was inserted.
We will represent a diff as an array of "same" and "change" elements.
For example, the diff above would become:
```js
[
// The first 3 lines stayed the same
{type: 'same', count: 3},
// Then 2 lines were deleted and replaced with the line `h`
{type: 'change', deletions: 2, insertions: ['h']},
// The last 2 lines stayed the same
{type: 'same', count: 2}
]
```
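To make the representation concrete, here is a minimal sketch of how such a diff could be applied to the old version of a file. `applyDiff()` is a hypothetical helper written for illustration; it is not one of the functions provided in `diff.js`:

```js
// Hypothetical helper (not part of the provided diff.js):
// applies a diff to an array of lines and returns the new lines
function applyDiff(oldLines, diff) {
  const newLines = []
  let index = 0 // current position in oldLines
  for (const element of diff) {
    if (element.type === 'same') {
      // Copy the unchanged lines
      newLines.push(...oldLines.slice(index, index + element.count))
      index += element.count
    }
    else {
      // Skip the deleted lines and add the inserted ones
      index += element.deletions
      newLines.push(...element.insertions)
    }
  }
  return newLines
}

const diff = [
  {type: 'same', count: 3},
  {type: 'change', deletions: 2, insertions: ['h']},
  {type: 'same', count: 2}
]
console.log(applyDiff(['a', 'b', 'c', 'd', 'e', 'f', 'g'], diff).join(''))
// prints abchfg
```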
### Commits
A commit is the smallest unit of changes that MiniVC tracks.
They are the "versions" in "version control".
Each commit specifies the previous commit and the diffs for all files that changed since that commit.
Like Git, each commit also has a "commit message" that describes what was changed and a "commit ID" so that the next commit can refer to it.
For example, the following commit adds a comment to `program.c`:
```js
{
// The previous commit had ID 85c37d4521d4d51b6ee186856b1c14e6
parentID: '85c37d4521d4d51b6ee186856b1c14e6',
// Commit message
message: 'Added comment',
// Diffs for files that were changed
diffs: {
'program.c': [
{type: 'same', count: 2},
{type: 'change', deletions: 0, insertions: ['// TODO: fix this']},
{type: 'same', count: 3}
]
}
}
```
The first commit is special since it doesn't have a parent.
We will represent this by omitting the `parentID` field for the initial commit (or setting it to `undefined`).
### Merges
Version control histories are mostly linear, i.e. there is one chain of commits from the initial commit to the current commit.
In this case, the commits' `parentID`s form an implicit linked list of commits.
However, a good version control system allows multiple users to work on the codebase at the same time.
It is possible that two users are working off the same commit and both of them try to push new commits before they have seen the other's changes.
In this case, whichever commit is pushed first will be added to the history, but the second commit will need to be "merged" with the first commit.
In general, whenever a new commit is attempted, it must be merged with all of the commits that were missed.
For example, suppose I'm working off commit A but before I push commit B, other users push commits C, D, and E:
```
A - C - D - E
\
B ------^
```
Then the diff in commit B needs to be merged with the diff between A and E.
Once we figure out what the combined diff from A to B/E should be, we "subtract" the part that was already committed between A and E, and then add the rest of the diff as commit B.
Changing B to have E as its parent instead of A is called "rebasing" in Git.
### Repositories
A repository (what GitLab calls a "project") is a set of files under version control.
The history of the repository is represented by storing all its commits by ID, plus the commit ID of the current commit.
This pointer to the current commit is called `HEAD` in Git.
Note that `HEAD` won't point to a commit until the first commit is pushed to the repository.
You can choose how to store this information.
I recommend the way the client stores it: a `commits` directory with a file for each commit (where the filename is the commit ID), and a `head` file that stores the ID of the current commit.
Your server should save all files inside the current working directory (`.`): for example, if the server is started in the directory `server-repos`, all files and directories created by the server should be inside `server-repos`.
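The recommended layout could be sketched roughly as follows. All of the function names are hypothetical, and storing each commit as a JSON file is an assumption; the spec leaves the storage format up to you:

```js
const fs = require('fs').promises
const path = require('path')

// Sketch of the layout <repo>/commits/<commit ID> and <repo>/head.
// All function names here are hypothetical.

// Creates an empty repository directory
async function createRepository(name) {
  // mkdir() rejects if the repository directory already exists
  await fs.mkdir(name)
  await fs.mkdir(path.join(name, 'commits'))
}

// Stores a commit as JSON and makes it the current commit
async function saveCommit(name, id, commit) {
  await fs.writeFile(path.join(name, 'commits', id), JSON.stringify(commit))
  // Move HEAD only after the commit itself has been stored
  await fs.writeFile(path.join(name, 'head'), id)
}

// Reads the ID of the current commit, or undefined if there are no commits
async function readHead(name) {
  try {
    return await fs.readFile(path.join(name, 'head'), 'utf8')
  }
  catch (e) {
    // A missing head file means no commit has been pushed yet
    if (e.code === 'ENOENT') return undefined
    throw e
  }
}

// Reads a stored commit by ID
async function readCommit(name, id) {
  const json = await fs.readFile(path.join(name, 'commits', id), 'utf8')
  return JSON.parse(json)
}
```

Since the repository name is used as a relative path, the directories are created inside whatever directory the server was started in.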
## Server API
The command-line client communicates with the server via an HTTP API.
Requests and responses are sent as JSON.
You can parse the request JSON by concatenating all the chunks in the request stream into a string, and then calling `JSON.parse()` on it.
To respond with JSON, you should set the `Content-Type` header to `application/json`.
(JSON is not the most space-efficient way to store commit diffs, but it is very convenient to read and write from JavaScript.)
All responses have the form `{success: true, ...data}` if successful and `{success: false, message: string}` if an error occurs.
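The request and response handling described above might look something like the sketch below. The helper names `readJSON()`, `sendJSON()`, and `handleRequest()` are hypothetical, and `req` and `res` are assumed to be the objects that Node.js's `http.createServer()` passes to its request handler:

```js
// Hypothetical helper: collects the chunks of the request stream
// into a string and parses it as JSON
function readJSON(req) {
  return new Promise((resolve, reject) => {
    let body = ''
    req.setEncoding('utf8')
    req.on('data', chunk => { body += chunk })
    req.on('end', () => {
      // JSON.parse() throws on invalid JSON, so turn that into a rejection
      try { resolve(JSON.parse(body)) }
      catch (e) { reject(e) }
    })
    req.on('error', reject)
  })
}

// Hypothetical helper: sends `data` to the client as JSON
function sendJSON(res, data) {
  res.setHeader('Content-Type', 'application/json')
  res.end(JSON.stringify(data))
}

// Example request handler that uses the response format described above
async function handleRequest(req, res) {
  try {
    const request = await readJSON(req)
    // ... choose an endpoint based on req.url and process `request` here ...
    sendJSON(res, {success: true})
  }
  catch (e) {
    sendJSON(res, {success: false, message: e.message})
  }
}
// To serve it: require('http').createServer(handleRequest).listen(8000)
```

Wrapping the whole handler in one `try`-`catch` means that any error thrown while processing a request is reported to the client in the `{success: false, message}` form.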
The server should listen on port 8000 and have the following `POST` endpoints:
### `/new_repository`
Makes a new empty repository with the given name.
It is an error if a repository with this name already exists.
The request JSON has the following form:
```ts
interface NewRepositoryRequest {
name: string
}
```
If successful, the response JSON has the following form:
```ts
interface NewRepositoryResponse {
success: true
}
```
### `/fetch`
Retrieves all new commits to the given repository.
The client indicates what its last known commit is (if any).
It is an error if the repository doesn't exist or the commit does not exist in that repository.
The request JSON has the following form:
```ts
interface FetchRequest {
// Repository name
name: string
// Last known commit ID.
// Omitted if the client doesn't have any commits.
parentID: string | undefined
}
```
If successful, the response JSON has the following form:
```ts
interface FetchResponse {
success: true
// Commits after parentID up to HEAD.
// They must be ordered from oldest to newest.
commits: Commit[]
}
interface Commit {
id: string
message: string
diffs: FileDiffs
}
interface FileDiffs {
// Map each filename to a diff for that file
[file: string]: Diff
}
// A diff is an array of "same" and "change" elements
type Diff = DiffElement[]
type DiffElement
= {type: 'same', count: number}
| {type: 'change', deletions: number, insertions: string[]}
```
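One way to collect the commits for a `/fetch` response is to walk the `parentID` chain backwards from `HEAD`. The sketch below uses a hypothetical in-memory `Map` instead of files to stay short; `commitsSince()` and the shape of the stored commits are assumptions, not part of the provided code:

```js
// Sketch with a hypothetical in-memory store: `commits` is a Map from each
// commit ID to {parentID, message, diffs}, and `head` is the current commit ID
function commitsSince(commits, head, parentID) {
  if (parentID !== undefined && !commits.has(parentID)) {
    throw new Error(`Unknown commit ${parentID}`)
  }
  const newCommits = []
  // Walk backwards from HEAD until we reach the client's last known commit
  let id = head
  while (id !== undefined && id !== parentID) {
    const {parentID: previousID, message, diffs} = commits.get(id)
    newCommits.push({id, message, diffs})
    id = previousID
  }
  // We collected the commits newest to oldest, so reverse them
  return newCommits.reverse()
}

const commits = new Map([
  ['a', {parentID: undefined, message: 'first', diffs: {}}],
  ['b', {parentID: 'a', message: 'second', diffs: {}}],
  ['c', {parentID: 'b', message: 'third', diffs: {}}]
])
console.log(commitsSince(commits, 'c', 'a').map(commit => commit.id))
// prints [ 'b', 'c' ]
```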
### `/commit`
Pushes the given commit to the given repository.
It is an error if the repository does not exist or the parent commit does not exist in that repository.
You can use whatever ID for the new commit you want, but it should not be likely to conflict with an existing ID.
(I used [`require('crypto').randomBytes(16).toString('hex')`](https://nodejs.org/api/crypto.html#crypto_crypto_randombytes_size_callback).)
The commit needs to be merged/rebased off the current `HEAD` if `HEAD` is different from `parentID`.
If there is a merge conflict, `mergeFileDiffs()` will throw an error, which you should report to the client.
The request JSON has the following form:
```ts
interface CommitRequest {
// Repository name
name: string
// Parent commit.
// Omitted if this is the initial commit.
parentID: string | undefined
// Commit message
message: string
// Diffs for each changed file
diffs: FileDiffs
}
```
The response JSON has the same format as for `/fetch`.
The commits sent back should include the newly added commit.
## Diff functions
`diff.js` exports some utility functions for working with diffs.
The ones you will likely find useful when implementing the server are:
- `addFileDiffs()`: this function takes an array of diffs to apply in sequence and concatenates them into a single diff.
The result of each diff should be the source of the next.
For example, if we have a diff from `A` to `B`, `B` to `C`, and `C` to `D`, adding them gives the diff from `A` to `D`.
- `mergeFileDiffs()`: this function takes two diffs to apply in parallel.
Both diffs should have the same source.
For example, if we have a diff from `A` to `B` and from `A` to `C`, merging them gives a diff from `A` that combines both sets of changes.
- `subtractFileDiffs()`: this function takes two diffs and returns the diff that must be added to the second to get the first.
Both diffs should have the same source.
For example, if we have a diff from `A` to `C` and from `A` to `B`, subtracting them gives the diff from `B` to `C`.
Note that these functions expect diffs in the form of `FileDiffs` objects, which map filenames to diffs for individual files.
You don't need to understand the code behind these functions, but I think the diffing algorithm is really cool, so please ask me if you want to know more!
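Putting the three functions together, the rebase step for `/commit` might be sketched as below. This is only an outline under stated assumptions: the calling conventions `addFileDiffs(arrayOfDiffs)`, `mergeFileDiffs(diff1, diff2)`, and `subtractFileDiffs(diff1, diff2)` are inferred from the descriptions above, so check `diff.js` for the real signatures. The function `rebaseDiffs()` is hypothetical and takes the diff functions as a parameter so that it does not depend on how `diff.js` is imported:

```js
// Sketch of rebasing a new commit's diffs onto the current HEAD.
// ASSUMPTION: the three diff.js functions are passed in and are called as
// addFileDiffs(arrayOfDiffs), mergeFileDiffs(diff1, diff2), and
// subtractFileDiffs(diff1, diff2); check diff.js for the real signatures.
function rebaseDiffs(missedDiffs, newDiffs, diffFunctions) {
  const {addFileDiffs, mergeFileDiffs, subtractFileDiffs} = diffFunctions
  // Diff from the new commit's parent to the current HEAD
  const parentToHead = addFileDiffs(missedDiffs)
  // Diff from the parent that combines both sets of changes
  // (mergeFileDiffs() throws if there is a merge conflict)
  const parentToMerged = mergeFileDiffs(parentToHead, newDiffs)
  // Diff that must be added to HEAD to get the merged result
  return subtractFileDiffs(parentToMerged, parentToHead)
}
```

In the example from the Merges section, `missedDiffs` would hold the diffs of commits C, D, and E in order, `newDiffs` would be the diffs of commit B, and the result would be stored as the diffs of the rebased commit B with E as its parent.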
## Going further
If you are interested and want to add more to the VC project, try adding support for "branches".
Branches allow for there to be multiple "heads" with different names.
Ideally, it should be possible to:
- Pull the history of all branches from the server
- "Check out" a branch locally, replacing all files with their current versions on that branch
- Push a commit to the branch that is currently checked out
- Merge one branch into another (if there are no conflicts)
- Delete a branch (in the local repository and on the server)
Another issue we haven't addressed is handling concurrent accesses to repositories.
For example, the asynchronous server allows two commit requests to be processed at the same time, which could corrupt a repository.
A more robust server would ensure that any commit to a repository causes all other commits and fetches to that repository to wait until the commit finishes.
Implement this behavior by writing an asynchronous analog of mutexes or read-write locks using callback functions.
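As a starting point, a plain mutex (without the read-write distinction) could be sketched as follows. The `Mutex` class and its method names are hypothetical; the queue of waiting tasks is simply an array of callback functions:

```js
// Sketch of a hypothetical asynchronous mutex built from callbacks
class Mutex {
  constructor() {
    this.locked = false
    this.waiting = [] // callbacks for tasks waiting to acquire the lock
  }
  // Returns a Promise that resolves once the caller holds the lock
  lock() {
    return new Promise(resolve => {
      if (this.locked) this.waiting.push(resolve)
      else {
        this.locked = true
        resolve()
      }
    })
  }
  // Hands the lock to the next waiting task, or releases it if none is waiting
  unlock() {
    const next = this.waiting.shift()
    if (next) next()
    else this.locked = false
  }
  // Runs an async function while holding the lock
  async run(task) {
    await this.lock()
    try {
      return await task()
    }
    finally {
      // Release the lock even if the task throws
      this.unlock()
    }
  }
}
```

A server could keep one `Mutex` per repository and wrap each commit handler in `mutex.run(async () => { /* ... */ })`, so that two commits to the same repository never interleave. Letting multiple fetches proceed at once would require extending this into a read-write lock.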