# HTTP

HTTP ("Hypertext Transfer Protocol") is the protocol that web browsers and servers use to communicate.
Since web applications are ubiquitous, making HTTP requests and responding to them are common tasks in both browser-side JavaScript and Node.js.

We will focus on the interface HTTP provides rather than its implementation.
If you are interested in the protocol details, [MDN](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview#HTTP_flow) is a good starting point.

At its core, HTTP is a request-response protocol: a "client" makes a request to a "server", which sends a response back.
HTTP allows a "request" to be nearly anything.
Typically a request either asks the server to send or generate a file (e.g. "download this video"), or asks the server to perform some action (e.g. "like this post").
You will implement an HTTPS client for the Wiki Game project and an HTTP server for MiniVC.

## `GET` requests: how a web browser works

At its simplest, an HTTP request just contains a URL to download from a server.
This is what a web browser does: take the URL you've entered, turn it into an HTTP request, and render the content in the server's response.

In Node.js, the [`http`](https://nodejs.org/api/http.html) module provides functions for writing HTTP clients and servers.
`http.get()` takes a URL to `GET` and a callback to call with the response.
For [example](get-cs11.js), we can use a `GET` request to download the CS 11 course page:
```js
const http = require('http')

http.get('http://courses.cms.caltech.edu/cs11/', res => {
  // `res` is a readable stream, so we can print it
  // by piping it to the standard output
  res.pipe(process.stdout)
})
```
This prints the HTML document that the server sends:
```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>CS 11 Home Page</title>
</head>

...
```

Browser-side JavaScript unfortunately uses different functions for HTTP requests.
You won't have to make browser-side HTTP requests in the projects, but the [`fetch()`](#aside-fetch) section has more information if you're interested.

Besides the `http://` scheme, HTTP URLs have two main parts: in the example above, `courses.cms.caltech.edu` specifies which server to connect to, and `/cs11/` is the path requested from that server.
The code running on the server listens for requests and uses their URLs to figure out what to send back.
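As a quick illustration, the built-in `URL` class splits a URL into these parts.
This is only a sketch for inspecting URLs; the servers in this section read the path from `req.url` instead.
```js
// `URL` is a global in Node.js (and in browsers), so no `require` is needed
const courseUrl = new URL('http://courses.cms.caltech.edu/cs11/')

console.log(courseUrl.protocol) // 'http:'
console.log(courseUrl.hostname) // 'courses.cms.caltech.edu'
console.log(courseUrl.pathname) // '/cs11/'
```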

In Node.js, HTTP servers are created using `http.createServer()`.
It takes a callback function to invoke for each request; the callback receives the corresponding request and response objects.
[Here](static-file-server.js) is a basic file server that converts the requested URL into a filename in a `files` directory:
```js
const fs = require('fs')
const http = require('http')
const path = require('path')

const server = http.createServer((req, res) => {
  // `req` is the request object; `res` is the response

  // Use the requested URL to serve a file in the `files` directory
  const readStream = fs.createReadStream(path.join('files', req.url))
  readStream.on('error', _ => {
    // Tell the client if the file didn't exist
    res.write('File not found')
    res.end()
  })
  // `res` is a writable stream. Piping the file to it sends it to the client.
  readStream.pipe(res)
})
// Listen for requests on port 80 (the default for HTTP)
server.listen(80)
```
If you run this server, you can request a file (for example, `index.html`) by going to [`localhost/index.html`](http://localhost/index.html) in your browser.
(Some OSes require `sudo` permission to listen on port 80.
You can also listen on port `8000` instead and visit [`localhost:8000/index.html`](http://localhost:8000/index.html) in your browser.)
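One caveat with joining `req.url` directly onto a directory name: the URL may contain a querystring or `..` segments.
Below is a minimal sketch of a more careful mapping from URL to filename.
The helper `urlToFilePath()` is hypothetical (it is not part of the course's file server), and a production server would need further checks.
```js
const path = require('path')

// Hypothetical helper: map a request URL to a path inside `rootDir`,
// or return null if the URL is malformed or would escape `rootDir`
function urlToFilePath(rootDir, requestUrl) {
  const root = path.resolve(rootDir)
  let pathname
  try {
    // The base URL is a placeholder; only the path is used.
    // This drops any querystring and decodes %-escapes.
    pathname = decodeURIComponent(
      new URL(requestUrl, 'http://localhost').pathname
    )
  }
  catch (e) {
    return null // malformed URL or %-escape
  }
  const filePath = path.resolve(root, '.' + pathname)
  const insideRoot =
    filePath === root || filePath.startsWith(root + path.sep)
  return insideRoot ? filePath : null
}

// An absolute path ending in files/index.html
console.log(urlToFilePath('files', '/index.html?version=2'))
// An encoded `../` that tries to leave the directory
console.log(urlToFilePath('files', '/..%2F..%2Fetc/passwd')) // null
```
The server could then respond with "File not found" whenever the helper returns `null`.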

## `POST` requests

`GET` requests can be used to request data from an HTTP server, but what about sending data to the server?
The URL itself can store small amounts of data, usually represented by a "querystring", e.g. in `google.com/search?q=node.js&oq=node.js&aqs=chrome.0.69i59l3j0l2j69i61l3.746j0j9&sourceid=chrome&ie=UTF-8`.
However, since there are various [limits on the length of URLs](https://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers), this doesn't scale well.
HTTP provides several ["request methods"](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods) besides `GET`, the most common of which is `POST`.
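Before turning to `POST`, here is a minimal sketch of how a querystring carries data in a URL, using the built-in `URL` class.
The `/search` path and the parameter names are made up for illustration.
```js
// Build a URL with a querystring
const searchUrl = new URL('http://localhost/search')
searchUrl.searchParams.set('q', 'node.js http')
searchUrl.searchParams.set('page', '2')
console.log(searchUrl.href)
// http://localhost/search?q=node.js+http&page=2

// A server can parse the parameters back out of the URL
const parsedUrl = new URL(searchUrl.href)
console.log(parsedUrl.searchParams.get('q')) // 'node.js http'
console.log(parsedUrl.searchParams.get('page')) // '2'
```
Note that every value comes back as a string, and that special characters (like the space here) must be escaped.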

A `POST` request includes additional data to send to the server.
In Node.js, you can use the more general `http.request()` function to create a `POST` request.
On the server, you can tell what type of request was performed from `req.method`.
[Here](upload-server.js) is a version of the file server that supports uploads using `POST` requests:
```js
const fs = require('fs')
const http = require('http')
const path = require('path')

http.createServer((req, res) => {
  const filePath = path.join('files', req.url)
  if (req.method === 'POST') {
    // A POST request uploads a file.
    // `req` is a readable stream of the request body.
    req.pipe(fs.createWriteStream(filePath))
      // Send an empty response when the pipe finishes
      .on('finish', () => res.end())
  }
  else {
    // A GET request retrieves a file
    const readStream = fs.createReadStream(filePath)
    readStream.on('error', _ => {
      res.write('File not found')
      res.end()
    })
    readStream.pipe(res)
  }
}).listen(80)
```
A Node.js [client](upload-client.js) can then upload files by piping them into a `POST` request:
```js
const fs = require('fs')
const http = require('http')

const filename = process.argv[2]
const uploadName = process.argv[3]

// Make a POST request to the upload URL.
// `request` is a writable stream.
const request = http.request(
  `http://localhost/${uploadName}`,
  {method: 'POST'}
)
// Send the file to upload by piping it into the request stream
fs.createReadStream(filename).pipe(request)
```
Now, running `node upload-client.js path/to/index.html index.html` uploads `index.html` to the server.
Opening `localhost/index.html` in a browser will then show the uploaded file.

## APIs

While HTTP was originally designed as a protocol for humans to interact with servers, it is often used today for communication between *programs*.
For example, JavaScript running in a browser can request data from a server (see [`fetch()`](#aside-fetch)) and render it on the webpage.

This requires the client and server programs to agree on an interface ("API") for communicating over HTTP.
A common data format used in HTTP APIs is JSON ("JavaScript Object Notation"), which uses a subset of JS literal syntax for objects (`{}`), arrays (`[]`), numbers, strings, booleans, and `null` to encode data as text.
JS provides the builtin functions `JSON.stringify()` and `JSON.parse()` for converting to and from JSON, respectively:
```js
const value = {
  number: 1,
  string: 'abc',
  array: [2, false],
  object: {x: 1, y: 2}
}
const json = JSON.stringify(value)
// '{"number":1,"string":"abc","array":[2,false],"object":{"x":1,"y":2}}'

const value2 = JSON.parse(json)
/*
{
  number: 1,
  string: 'abc',
  array: [2, false],
  object: {x: 1, y: 2}
}
*/
```
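One thing to watch for is that JSON only supports the types listed above, so other JS values don't survive a round trip unchanged.
A small sketch (the property names are arbitrary):
```js
const original = {
  when: new Date(0), // Dates are converted to ISO strings
  missing: undefined, // properties set to `undefined` are dropped
  count: 3
}
const encoded = JSON.stringify(original)
console.log(encoded)
// {"when":"1970-01-01T00:00:00.000Z","count":3}

const roundTripped = JSON.parse(encoded)
console.log(typeof roundTripped.when) // 'string', not a Date
console.log('missing' in roundTripped) // false
```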

As an example, we can implement a pseudorandom number generator API.
It supports two URLs:
- `POST /seed`: this sets the initial state of the generator.
  The request should contain JSON of the form `{"seed": number}`.
  The response is empty.
- `GET /next`: this updates the state of the generator and returns the next random number.
  The response contains JSON of the form `{"value": number}`.
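The `/next` endpoint needs some rule for advancing the generator's state.
As a rough sketch of one such rule, a linear congruential generator multiplies the previous value by a constant and takes the remainder modulo another constant.
The constants below (the "MINSTD" parameters) are an assumption for illustration and may differ from the ones used in the course's server.
```js
// Assumed constants: multiplier and modulus of the generator
const A = 48271
const M = 2 ** 31 - 1 // 2147483647

// Compute the next state from the previous one
function nextValue(lastValue) {
  return (lastValue * A) % M
}

let state = 1 // the seed
for (let i = 0; i < 3; i++) {
  state = nextValue(state)
  console.log(state)
}
// prints 48271, 182605794, 1291394886
```
Each value depends only on the previous one, so the same seed always produces the same sequence.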

[Here](rand-server.js) is the server, which uses a [linear congruential generator](https://en.wikipedia.org/wiki/Linear_congruential_generator):
```js
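const http = require('http')

// Multiplier and modulus of the generator.
// These values (the "MINSTD" parameters) are an illustrative choice.
const A = 48271
const M = 2 ** 31 - 1

// The last value returned from the generator.
// The initial value is the "seed".
let lastValue = 1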

http.createServer((req, res) => {
  if (req.url === '/seed') {
    // POSTing to `/seed` sets the seed of the generator
    let body = ''
    req.setEncoding('utf8')
      .on('data', chunk => {
        // Concatenate request body into a single string
        body += chunk
      })
      .on('end', () => {
        // Parse the JSON request body and extract the seed
        // (reduced modulo `M` so later products stay exact integers)
        lastValue = JSON.parse(body).seed % M
        // Send an empty response
        res.end()
      })
  }
  else {
    // GETting `/next` updates the generator
    // and responds with the new value
    lastValue = (lastValue * A) % M
    res.write(JSON.stringify({value: lastValue}))
    res.end()
  }
}).listen(80)
```
And [here](rand-client.js) is the client, which takes an argument that specifies how many numbers to generate:
```js
const http = require('http')

const count = Number(process.argv[2])

// Seed the generator with the current time (in milliseconds)
const seedRequest =
  http.request('http://localhost/seed', {method: 'POST'}, res => {
    // Discard the (empty) response body so the response can complete
    res.resume()
    // Then obtain `count` random numbers
    getNumbers()
  })
seedRequest.write(JSON.stringify({seed: Date.now()}))
seedRequest.end()

function getNumbers() {
  for (let i = 0; i < count; i++) {
    http.get('http://localhost/next', res => {
      let body = ''
      res.setEncoding('utf8')
        .on('data', chunk => {
          // Concatenate response body into a single string
          body += chunk
        })
        .on('end', () => {
          // Parse the JSON response and print the number
          console.log(JSON.parse(body).value)
        })
    })
  }
}
```
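The server above trusts that the body of a `/seed` request is valid JSON with a numeric `seed`.
`JSON.parse()` throws a `SyntaxError` on invalid input, so a more defensive server might validate the body first.
Here is a minimal sketch; `parseSeed()` is a hypothetical helper, and requiring an integer seed is an added assumption.
```js
// Hypothetical helper: extract a valid seed from a request body,
// or return null if the body is not of the form {"seed": integer}
function parseSeed(body) {
  let parsed
  try {
    parsed = JSON.parse(body)
  }
  catch (e) {
    return null // the body was not valid JSON
  }
  if (parsed === null || typeof parsed !== 'object') return null
  return Number.isInteger(parsed.seed) ? parsed.seed : null
}

console.log(parseSeed('{"seed": 42}')) // 42
console.log(parseSeed('{"seed": "abc"}')) // null
console.log(parseSeed('not json')) // null
```
When the helper returns `null`, the server could respond with a `400 Bad Request` status instead of crashing (see [Status codes](#aside-status-codes)).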

## HTTPS

You have probably seen URLs starting with `https://`, or with a green lock shown in your browser (or a warning sign, in the case of [csman](https://csman.cms.caltech.edu)).
What is HTTPS and how does it differ from HTTP?

The short answer is that they are exactly the same protocol, except HTTPS is communicated over a secure connection.
The protocol that secures HTTPS is called TLS and it provides two major benefits:
- Both the request and the response are encrypted, so only the client and server know their contents.
  For example, if you send your password to your bank over HTTPS, an attacker that intercepts the request can't see your password.
- The server must prove that it is who it claims to be.
  For example, this gives you confidence that you are actually sending your password to your bank, not to an attacker pretending to be your bank.

The cryptography that underlies TLS is really neat, especially the certificate chain mechanism for verifying servers' identities.
However, it is far beyond the scope of this course!

For your purposes, HTTPS has exactly the same interface as HTTP, with added security guarantees.
In Node.js, HTTPS functions are provided by the [`https`](https://nodejs.org/api/https.html) module, which mirrors the `http` module: `https.get()` and `https.request()` are called the same way as their `http` counterparts.

For [example](http-https.js), if we try to access Apple's website over HTTP, it will tell us to use HTTPS instead (see [Status codes](#aside-status-codes)):
```js
const http = require('http')
const https = require('https')

// Request with HTTP -> redirects to HTTPS
http.get('http://www.apple.com', res => {
  // 301 (Moved Permanently)
  console.log(res.statusCode)
  // Location: https://www.apple.com/
  console.log(res.headers.location)
})

// Request with HTTPS -> successful
https.get('https://www.apple.com', res => {
  res.pipe(process.stdout)
})
```
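Because the two modules share an interface, a client that handles both kinds of URL can pick the module based on the URL's scheme.
A minimal sketch, where `clientFor()` and `get()` are hypothetical helper names:
```js
const http = require('http')
const https = require('https')

// Hypothetical helper: pick the module matching a URL's scheme
function clientFor(url) {
  const {protocol} = new URL(url)
  if (protocol === 'https:') return https
  if (protocol === 'http:') return http
  throw new Error(`Unsupported protocol: ${protocol}`)
}

// Hypothetical wrapper: `GET` either kind of URL
function get(url, callback) {
  return clientFor(url).get(url, callback)
}
```
For example, `get('https://www.apple.com', res => res.pipe(process.stdout))` would use the `https` module.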

You won't need to create an HTTPS server in this course.
Doing so requires obtaining a "TLS certificate", which proves that you control the domain name the server is hosted on.
If you want to try this at some point, I recommend [Let's Encrypt](https://letsencrypt.org) as an easy way to obtain a free TLS certificate.

## *Aside*: `fetch()`

**None of the following sections are required reading, but I encourage you to read them if you want to know more about HTTP.**

If you liked `Promise`s, you'll be happy to know that the modern interface for making HTTP/HTTPS requests in browser-side JavaScript is built on them.
Browsers provide a function [`fetch()`](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API) that takes a URL and other request parameters and returns a `Promise` representing the response.
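Note that the `Promise` rejects only if the request itself fails (e.g. the server can't be reached); a response with an error status code such as `404 Not Found` still resolves, with `res.ok` set to `false`.
The response body can be read as a string with `res.text()` or parsed from JSON with `res.json()`, both of which return `Promise`s.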
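Since `fetch()` resolves even when the server responds with an error status (e.g. a 404), code that wants errors to reject has to check the status itself.
A minimal sketch, where `checkStatus()` and `fetchJson()` are hypothetical helper names:
```js
// Hypothetical helper: turn HTTP error statuses into thrown errors
function checkStatus(res) {
  // `res.ok` is true only for status codes 200-299
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`)
  return res
}

// Fetch a URL and parse its body as JSON,
// rejecting if the request fails or the status is an error
function fetchJson(url) {
  return fetch(url).then(checkStatus).then(res => res.json())
}
```
With this, `fetchJson(url).catch(...)` handles both network failures and error statuses in one place.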

As an [example](rand-client.html), we can interact with the [pseudorandom generator API](rand-server.js) above:
```js
// Make a POST request to the given URL,
// using the given JSON string as the body
fetch('http://localhost/seed', {
  method: 'POST',
  body: JSON.stringify({seed: Date.now()})
})
  .then(res => {
    const promises = []
    // Get 100 random numbers
    for (let i = 0; i < 100; i++) {
      promises.push(
        // Make a GET request to the given URL
        fetch('http://localhost/next')
          // Parse the response from JSON
          .then(res => res.json())
      )
    }
    // Wait for all 100 responses
    return Promise.all(promises)
  })
  .then(numbers => {
    // Extract the value from each JSON response
    // and print it to the console
    for (const number of numbers) console.log(number.value)
  })
```
(If you want to try this code out, you may notice that browsers are very strict about which URLs a page can request, for security reasons; this is known as the "same-origin policy".
The simplest approach is to serve the HTML file from the same server that hosts the API.)

If you want to use `fetch()` on this week's project, Node.js 18 and later provide it as a global function; for older versions, the npm package [`node-fetch`](https://www.npmjs.com/package/node-fetch) provides a `fetch()` function for Node.js.

## *Aside*: Webservers in Node.js

It is tedious to implement all the intricacies of HTTP (e.g. parsing request URLs, setting the right `Content-Type` header, and caching and compressing responses).
Most HTTP servers written in Node.js instead use some library built on top of the `http`/`https` modules.
I recommend the npm package [`express`](https://www.npmjs.com/package/express) if you are building a more complicated HTTP server than the ones in this course.

## *Aside*: Status codes

If you've ever seen something like "404 Page Not Found", you're already familiar with HTTP status codes.
Status codes are three-digit numbers that signify the success or failure of an HTTP request.
There is a *long* list on [Wikipedia](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes), but these are the main ones to know about:
- `200 OK`: the normal status code, signifying a successful request.
- `3xx` (mainly `301 Moved Permanently`): redirects a browser to a new URL.
  The new URL is sent in the `Location` header of the response (see [Headers](#aside-headers)).
- `4xx` (e.g. `403 Forbidden` and `404 Not Found`): indicates a problem with the request (e.g. the user doesn't have access, or the requested page doesn't exist).
- `5xx` (mainly `500 Internal Server Error`): indicates a server failure (e.g. an error was thrown while handling the request).
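The first digit of a status code determines its category.
As a small sketch, Node's `http.STATUS_CODES` object maps standard codes to their names; `describeStatus()` is a hypothetical helper that adds the category from the list above.
```js
const http = require('http')

// Hypothetical helper: describe a status code using the categories above
function describeStatus(code) {
  // `http.STATUS_CODES` maps standard codes to their names
  const name = http.STATUS_CODES[code] || 'Unknown'
  let category
  if (code >= 200 && code < 300) category = 'success'
  else if (code >= 300 && code < 400) category = 'redirect'
  else if (code >= 400 && code < 500) category = 'client error'
  else if (code >= 500 && code < 600) category = 'server error'
  else category = 'other'
  return `${code} ${name} (${category})`
}

console.log(describeStatus(200)) // 200 OK (success)
console.log(describeStatus(301)) // 301 Moved Permanently (redirect)
console.log(describeStatus(404)) // 404 Not Found (client error)
console.log(describeStatus(500)) // 500 Internal Server Error (server error)
```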

By setting `res.statusCode`, our file server can send more helpful responses.
(A Node.js client would read the analogous `res.statusCode` property inside `http.get()`/`http.request()`.)
[Here](status-codes.js) we redirect `/` (i.e. `http://localhost`) to `/index.html` and return a 404 when the requested file doesn't exist:
```js
const fs = require('fs')
const http = require('http')
const path = require('path')

http.createServer((req, res) => {
  if (req.url === '/') {
    // Redirect `/` to `/index.html`
    res.statusCode = 301
    res.setHeader('Location', '/index.html')
    res.end() // end the response without sending a body
    return
  }

  // By default, `res.statusCode` is 200 OK
  const readStream = fs.createReadStream(path.join('files', req.url))
  readStream.on('error', _ => {
    // Indicate the file didn't exist with a 404
    res.statusCode = 404
    res.write('File not found')
    res.end()
  })
  readStream.pipe(res)
}).listen(80)
```
You can see the 404 response in your browser's Development Tools, e.g. in Chrome:
![404 Not Found in Chrome](chrome-404.png)

## *Aside*: Headers

HTTP allows any data to be sent in a response or `POST` request body.
So how does a browser know whether it has been sent an HTML file, a JS file, or a video?
HTTP requests and responses also include "headers", a list of field names and their corresponding values.
There are many [standard header names](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Response_fields) recognized by browsers; for example, here are some headers in the response from `google.com`:
```
Date: Mon, 30 Mar 2020 03:23:17 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: 1P_JAR=2020-03-30-03; expires=Wed, 29-Apr-2020 03:23:17 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=201=ap24VohKW7_KXQXVaGGgxgusYR1fqqFqi2YfovchUYmgz4O4aK-O_4zBIKE1xy98W_oIF_9RsVMEIFv7wIhr71w-VVwlTKNReWxmM9c-5XuPPiVYOxShgEwEFzy5O4_V2JzhB9ax3gPno38QYICdA6TBcaPjVMm77qdVeWpjkzk; expires=Tue, 29-Sep-2020 03:23:17 GMT; path=/; domain=.google.com; HttpOnly
```
`Content-Type` is the header that tells the browser what type of file is being sent (in this case, `text/html` indicates an HTML file).
You can also see that a header can appear more than once, each time with a different value (e.g. `Set-Cookie` in this response).
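To make the structure concrete, here is a rough sketch of turning header lines like the ones above into a map from names to lists of values.
Node's `http` module already does this parsing for you (exposing the result as `req.headers`/`res.headers`), so `parseHeaders()` is purely illustrative, and the cookie values below are made up.
```js
// Illustrative only: parse "Name: value" lines into a Map
// from lowercase header name to an array of values
function parseHeaders(text) {
  const headers = new Map()
  for (const line of text.split(/\r?\n/)) {
    const colon = line.indexOf(':')
    if (colon < 0) continue // skip blank or malformed lines
    // Header names are case-insensitive, so normalize to lowercase
    const name = line.slice(0, colon).trim().toLowerCase()
    const value = line.slice(colon + 1).trim()
    if (!headers.has(name)) headers.set(name, [])
    headers.get(name).push(value)
  }
  return headers
}

const rawHeaders = [
  'Content-Type: text/html; charset=ISO-8859-1',
  'Set-Cookie: a=1; path=/',
  'Set-Cookie: b=2; path=/'
].join('\r\n')

const parsedHeaders = parseHeaders(rawHeaders)
console.log(parsedHeaders.get('content-type'))
// [ 'text/html; charset=ISO-8859-1' ]
console.log(parsedHeaders.get('set-cookie'))
// [ 'a=1; path=/', 'b=2; path=/' ]
```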

Since even `GET` requests can include headers, they provide a convenient way to send auxiliary data to the server.
For [example](file-server-auth.js), we can use headers to add login logic to the file server:
```js
const fs = require('fs')
const http = require('http')
const path = require('path')

// The map of usernames to their passwords.
// Don't ever store raw passwords in a real application!
const passwords = new Map()
  .set('alice', '123456')
  .set('bob', 'supersecretpassword')

http.createServer((req, res) => {
  // Check the provided `user` and `password` headers
  const password = passwords.get(req.headers.user)
  if (password === undefined || password !== req.headers.password) {
    res.statusCode = 403 // Forbidden
    res.end()
    return
  }

  // If the login is valid, send the file
  const readStream = fs.createReadStream(path.join('files', req.url))
  readStream.on('error', _ => {
    res.write('File not found')
    res.end()
  })
  readStream.pipe(res)
}).listen(80)
```
Requesting a file now requires providing correct values for the `user` and `password` headers.
**Don't send login details over unencrypted HTTP; this should be done over [HTTPS](#https) instead.**
Here's an [example](file-client-auth.js):
```js
const http = require('http')

http.get(
  'http://localhost/index.html',
  {headers: {user: 'bob', password: 'supersecretpassword'}},
  res => {
    // If allowed to access the page, print it out
    if (res.statusCode === 200) res.pipe(process.stdout)
    else console.error('Invalid login')
  }
)
```
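The `user` and `password` headers above are names invented for this example.
HTTP also defines a standard `Authorization` header; in its "Basic" scheme, the value is the word `Basic` followed by the base64 encoding of `user:password`.
Here is a minimal sketch of encoding and decoding that value (the helper names are hypothetical).
Base64 is an encoding, not encryption, so this still needs to be sent over HTTPS.
```js
// Encode a username and password as a Basic `Authorization` header value
function encodeBasicAuth(user, password) {
  return 'Basic ' + Buffer.from(`${user}:${password}`).toString('base64')
}

// Decode a Basic `Authorization` header value,
// or return null if it is missing or malformed
function decodeBasicAuth(header) {
  if (typeof header !== 'string' || !header.startsWith('Basic ')) return null
  const decoded =
    Buffer.from(header.slice('Basic '.length), 'base64').toString()
  // The username ends at the first colon; the rest is the password
  const colon = decoded.indexOf(':')
  if (colon < 0) return null
  return {user: decoded.slice(0, colon), password: decoded.slice(colon + 1)}
}

const authHeader = encodeBasicAuth('bob', 'supersecretpassword')
console.log(decodeBasicAuth(authHeader))
// { user: 'bob', password: 'supersecretpassword' }
```
A server could call `decodeBasicAuth(req.headers.authorization)` and then look up the password as before.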