HTTP
HTTP ("Hyper-Text Transfer Protocol") is the protocol used universally on the Web. Since web applications are ubiquitous, making and responding to HTTP requests is a common use case for browser-side JavaScript and Node.js.
We will focus on the interface HTTP provides rather than its implementation. If you are interested in the protocol details, MDN is a good starting point.
At its core, HTTP is a request-response protocol: a "client" makes a request to a "server", which sends a response back. HTTP allows a "request" to be nearly anything. Typically a request either asks the server to send or generate a file (e.g. "download this video"), or asks the server to perform some action (e.g. "like this post"). You will be implementing an HTTPS client on the Wiki Game project, and an HTTP server for MiniVC.
GET requests: how a web browser works
At its simplest, an HTTP request just contains a URL to download from a server. This is what a web browser does: take the URL you've entered, turn it into an HTTP request, and render the content in the server's response.
In Node.js, the http module provides functions for writing HTTP clients and servers.
http.get() takes a URL to GET and a callback to call with the response.
For example, we can use a GET request to download the CS 11 course page:
const http = require('http')

http.get('http://courses.cms.caltech.edu/cs11/', res => {
  // `res` is a readable stream, so we can print it
  // by piping it to the standard output
  res.pipe(process.stdout)
})
This prints the HTML document that the server sends:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>CS 11 Home Page</title>
</head>
...
Browser-side JavaScript unfortunately uses different functions for HTTP requests.
You won't have to make browser-side HTTP requests in the projects, but the fetch() section has more information if you're interested.
HTTP URLs actually have two main parts: in the example above, courses.cms.caltech.edu specifies what server to connect to, and /cs11/ is the path that is requested from the server.
The code running on the server listens for requests and uses their URLs to figure out what to send back.
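As a small illustration of these two parts, Node.js and browsers both provide a built-in URL class that splits a URL string into its components. This is only a sketch; the variable name is illustrative.

```javascript
// Split the course page URL into the server name and the requested path
const courseUrl = new URL('http://courses.cms.caltech.edu/cs11/')

console.log(courseUrl.hostname) // 'courses.cms.caltech.edu'
console.log(courseUrl.pathname) // '/cs11/'
```

On the server side, Node.js provides only the path portion of the URL (here, /cs11/) as req.url.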
In Node.js, HTTP servers are created using http.createServer().
It takes a callback function to invoke for each request; the callback receives the corresponding request and response objects.
Here is a basic file server that converts the requested URL into a filename in a files directory:
const fs = require('fs')
const http = require('http')
const path = require('path')

const server = http.createServer((req, res) => {
  // `req` is the request object; `res` is the response
  // Use the requested URL to serve a file in the `files` directory
  const readStream = fs.createReadStream(path.join('files', req.url))
  readStream.on('error', _ => {
    // Tell the client if the file couldn't be read (e.g. it doesn't exist)
    res.write('File not found')
    res.end()
  })
  // `res` is a writable stream. Piping the file to it sends it to the client.
  readStream.pipe(res)
})
// Listen for requests on port 80 (the default for HTTP)
server.listen(80)
If you run this server, you can request a file (for example, index.html) by going to localhost/index.html in your browser.
(Some OSes require sudo permission to listen on port 80.
You can also listen on port 8000 instead and visit localhost:8000/index.html in your browser.)
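One caveat about this file server: because the requested URL is joined directly onto the files directory, a URL containing .. segments can refer to a file outside that directory. The following is a minimal sketch of one way to guard against that, assuming the URL contains only a path (no querystring). The helper name resolveInside is hypothetical and not part of Node.js.

```javascript
const path = require('path')

// Hypothetical helper (not part of Node.js): map a request URL onto a file
// inside `root`, or return null if the URL would escape that directory
function resolveInside(root, urlPath) {
  const rootDir = path.resolve(root)
  // Prefix with '.' so the leading '/' is treated as relative to `rootDir`
  const target = path.resolve(rootDir, '.' + urlPath)
  const inside = target === rootDir || target.startsWith(rootDir + path.sep)
  return inside ? target : null
}

console.log(resolveInside('files', '/index.html'))    // an absolute path ending in files/index.html
console.log(resolveInside('files', '/../secret.txt')) // null
```

A server could call this helper before creating the read stream and respond with an error when it returns null.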
POST requests
GET requests can be used to request data from an HTTP server, but what about sending data to the server?
The URL itself can store small amounts of data, usually represented by a "querystring", e.g. in google.com/search?q=node.js&oq=node.js&aqs=chrome.0.69i59l3j0l2j69i61l3.746j0j9&sourceid=chrome&ie=UTF-8.
However, since there are various limits on the length of URLs, this doesn't scale well.
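To make the querystring idea concrete, here is a short sketch using the built-in URL class and its searchParams property to build and read a querystring. It uses only two of the parameters from the Google URL above.

```javascript
// Build a search URL; `searchParams` handles the encoding of each value
const searchUrl = new URL('https://google.com/search')
searchUrl.searchParams.set('q', 'node.js')
searchUrl.searchParams.set('ie', 'UTF-8')
console.log(searchUrl.href) // 'https://google.com/search?q=node.js&ie=UTF-8'

// Read the parameters back, as a server might do with a request URL
const parsedUrl = new URL(searchUrl.href)
console.log(parsedUrl.searchParams.get('q')) // 'node.js'
```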
HTTP provides several "request methods" besides GET, the most common of which is POST.
A POST request includes additional data to send to the server.
In Node.js, you can use the more general http.request() function to create a POST request.
On the server, you can tell what type of request was performed from req.method.
Here is a version of the file server that supports uploads using POST requests:
const fs = require('fs')
const http = require('http')
const path = require('path')

http.createServer((req, res) => {
  const filePath = path.join('files', req.url)
  if (req.method === 'POST') {
    // A POST request uploads a file.
    // On a POST request, `req` is a readable stream.
    req.pipe(fs.createWriteStream(filePath))
      // Send an empty response when the pipe finishes
      .on('finish', () => res.end())
  }
  else {
    // A GET request retrieves a file
    const readStream = fs.createReadStream(filePath)
    readStream.on('error', _ => {
      res.write('File not found')
      res.end()
    })
    readStream.pipe(res)
  }
}).listen(80)
A Node.js client can then upload files by piping them into a POST request:
const fs = require('fs')
const http = require('http')

const filename = process.argv[2]
const uploadName = process.argv[3]
// Make a POST request to the upload URL.
// `request` is a writable stream.
const request = http.request(
  `http://localhost/${uploadName}`,
  {method: 'POST'}
)
// Send the file to upload by piping it into the request stream
fs.createReadStream(filename).pipe(request)
Now, running node upload-client.js path/to/index.html index.html uploads index.html to the server.
Opening localhost/index.html in a browser will then show the uploaded file.
APIs
While HTTP was originally designed as a protocol for humans to interact with servers, it is often used today for communication between programs.
For example, JavaScript running in a browser can request data from a server (see fetch()) and render it on the webpage.
This requires the client and server programs to agree on an interface ("API") for communicating over HTTP.
A common data format used in HTTP APIs is JSON ("JavaScript Object Notation"), which uses the JS literal syntax for objects ({}), arrays ([]), numbers, strings, and booleans to encode data as text.
JS provides the builtin functions JSON.stringify() and JSON.parse() for converting to and from JSON, respectively:
const value = {
  number: 1,
  string: 'abc',
  array: [2, false],
  object: {x: 1, y: 2}
}
const json = JSON.stringify(value)
// '{"number":1,"string":"abc","array":[2,false],"object":{"x":1,"y":2}}'
const value2 = JSON.parse(json)
/*
{
  number: 1,
  string: 'abc',
  array: [2, false],
  object: {x: 1, y: 2}
}
*/
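One detail worth knowing when handling JSON from a network: JSON.parse() throws a SyntaxError if its input is not valid JSON, so a server that parses request bodies may want to catch that error. A minimal sketch follows; the helper name tryParseJson and the shape of its return value are illustrative choices.

```javascript
// Hypothetical helper: parse JSON without letting malformed input throw
function tryParseJson(text) {
  try {
    return {ok: true, value: JSON.parse(text)}
  } catch (err) {
    // JSON.parse() throws a SyntaxError on malformed input
    return {ok: false, error: err.message}
  }
}

console.log(tryParseJson('{"seed": 42}').value.seed) // 42
// Unlike JS object literals, JSON requires quoted property names
console.log(tryParseJson('{seed: 42}').ok)           // false
```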
As an example, we can implement a pseudorandom number generator API. It supports two URLs:
- POST /seed: this sets the initial state of the generator. The request should contain JSON of the form {"seed": number}. The response is empty.
- GET /next: this updates the state of the generator and returns the next random number. The response contains JSON of the form {"value": number}.
Here is the server, which uses a linear congruential generator:
const http = require('http')

// Multiplier and modulus of the generator.
// These values are an assumption (the "MINSTD" parameters);
// other valid parameter pairs would work as well.
const A = 48271
const M = 2147483647 // 2^31 - 1

// The last value returned from the generator.
// The initial value is the "seed".
let lastValue = 1
http.createServer((req, res) => {
  if (req.url === '/seed') {
    // POSTing to `/seed` sets the seed of the generator
    let body = ''
    req.setEncoding('utf8')
      .on('data', chunk => {
        // Concatenate request body into a single string
        body += chunk
      })
      .on('end', () => {
        // Parse the JSON request body and extract the seed
        lastValue = JSON.parse(body).seed
        // Send an empty response
        res.end()
      })
  }
  else {
    // GETing `/next` updates the generator
    // and responds with the new value
    lastValue = (lastValue * A) % M
    res.write(JSON.stringify({value: lastValue}))
    res.end()
  }
}).listen(80)
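The generator update inside the server can be tried on its own, without any HTTP. The sketch below isolates one step of a linear congruential generator as a function. The constants are an assumption for illustration (the widely used "MINSTD" multiplier and modulus); these notes do not say which values the server is meant to use.

```javascript
// Assumed example parameters (the "MINSTD" generator)
const A = 48271
const M = 2147483647 // 2^31 - 1

// One step of a linear congruential generator: next = (last * A) mod M
function nextValue(lastValue) {
  return (lastValue * A) % M
}

// Print the first few values, starting from a seed of 1
let value = 1
for (let i = 0; i < 3; i++) {
  value = nextValue(value)
  console.log(value)
}
```

Because each value depends only on the previous one, seeding the generator with the same number always reproduces the same sequence.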
And here is the client, which takes an argument that specifies how many numbers to generate:
const http = require('http')

const count = Number(process.argv[2])
// Seed the generator with the current time (in milliseconds)
const seedRequest =
  http.request('http://localhost/seed', {method: 'POST'}, res => {
    // Discard the (empty) response body so the connection can be released
    res.resume()
    // Then obtain `count` random numbers
    getNumbers()
  })
seedRequest.write(JSON.stringify({seed: Date.now()}))
seedRequest.end()

function getNumbers() {
  for (let i = 0; i < count; i++) {
    http.get('http://localhost/next', res => {
      let body = ''
      res.setEncoding('utf8')
        .on('data', chunk => {
          // Concatenate response body into a single string
          body += chunk
        })
        .on('end', () => {
          // Parse the JSON response and print the number
          console.log(JSON.parse(body).value)
        })
    })
  }
}
HTTPS
You have probably seen URLs starting with https://, or with a green lock shown in your browser (or a warning sign, in the case of csman).
What is HTTPS and how does it differ from HTTP?
The short answer is that they are exactly the same protocol, except HTTPS is communicated over a secure connection. The protocol that secures HTTPS is called TLS and it provides two major benefits:
- Both the request and the response are encrypted, so only the client and server know their contents. For example, if you send your password to your bank over HTTPS, an attacker that intercepts the request can't see your password.
- The server must prove that it is who it claims to be. For example, this gives you confidence that you are actually sending your password to your bank, not to an attacker pretending to be your bank.
The cryptography that underlies TLS is really neat, especially the certificate chain mechanism for verifying servers' identities. However, it is far beyond the scope of this course!
For your purposes, HTTPS has exactly the same interface as HTTP, with added security guarantees.
In Node.js, HTTPS functions are provided by the https module and are identical to those from the http module.
For example, if we try to access Apple's website over HTTP, it will tell us to use HTTPS instead (see Status codes):
const http = require('http')
const https = require('https')

// Request with HTTP -> redirects to HTTPS
http.get('http://www.apple.com', res => {
  // 301 (Moved Permanently)
  console.log(res.statusCode)
  // Location: https://www.apple.com/
  console.log(res.headers.location)
})
// Request with HTTPS -> successful
https.get('https://www.apple.com', res => {
  res.pipe(process.stdout)
})
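Since the two modules share an interface, a client that accepts arbitrary URLs can choose between them based on the URL's scheme. Here is a minimal sketch; the helper name clientFor is hypothetical.

```javascript
const http = require('http')
const https = require('https')

// Hypothetical helper: pick the module that matches the URL's scheme
function clientFor(url) {
  return new URL(url).protocol === 'https:' ? https : http
}

// Usage (makes a real network request, so it is left commented out):
// const url = 'https://www.apple.com'
// clientFor(url).get(url, res => res.pipe(process.stdout))
```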
You won't need to create an HTTPS server in this course. Doing so requires obtaining a "TLS certificate" to verify that you own the website hosting the server. If you want to try this at some point, I recommend Let's Encrypt as an easy way to obtain a free TLS certificate.
fetch()
Aside: None of the following sections are required reading, but I encourage you to read them if you want to know more about HTTP.
If you liked Promises, you'll be happy to know that the modern interface for making HTTP/HTTPS requests in browser-side JavaScript is built on them.
Browsers provide a function fetch() that takes a URL and other request parameters and returns a Promise representing the response.
The Promise rejects only if the request itself fails (e.g. because of a network error). A response with an error status code such as 404 still resolves the Promise, so check res.ok or res.status to detect it.
The response body can then be read as a string with res.text(), or parsed from JSON with res.json().
As an example, we can interact with the pseudorandom generator API above:
// Make a POST request to the given URL,
// using the given JSON string as the body
fetch('http://localhost/seed', {
  method: 'POST',
  body: JSON.stringify({seed: Date.now()})
})
  .then(res => {
    const promises = []
    // Get 100 random numbers
    for (let i = 0; i < 100; i++) {
      promises.push(
        // Make a GET request to the given URL
        fetch('http://localhost/next')
          // Parse the response from JSON
          .then(res => res.json())
      )
    }
    // Wait for all 100 responses
    return Promise.all(promises)
  })
  .then(numbers => {
    // Extract the value from each JSON response
    // and print it to the console
    for (const number of numbers) console.log(number.value)
  })
(If you want to try this code out, you may notice that browsers are very strict about what URLs can be requested, for security reasons. You will have to serve the HTML file from the same server that hosts the API.)
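Because fetch() resolves its Promise even when the server responds with an error status, code that wants to treat such responses as failures has to check for them explicitly. The following is a minimal sketch of one way to do that; the helper name checkStatus is hypothetical.

```javascript
// Hypothetical helper: turn an unsuccessful response into a thrown error.
// `res.ok` is true when the status code is in the range 200-299.
function checkStatus(res) {
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`)
  return res
}

// Usage (requires the server from the API section, so it is commented out):
// fetch('http://localhost/next')
//   .then(checkStatus)
//   .then(res => res.json())
//   .then(json => console.log(json.value))
//   .catch(err => console.error(err.message))
```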
If you want to use fetch() on this week's project, there is an npm package node-fetch that you can install which provides the fetch() function for Node.js.
(Node.js versions 18 and later also provide fetch() as a built-in global.)
Aside: Webservers in Node.js
It is tedious to implement all the intricacies of HTTP (e.g. parsing request URLs, setting the right Content-Type header, and caching and compressing responses).
Most HTTP servers written in Node.js instead use some library built on top of the http/https modules.
I recommend the npm package express if you are building a more complicated HTTP server than the ones in this course.
Aside: Status codes
If you've ever seen something like "404 Page Not Found", you're already familiar with HTTP status codes. Status codes are magic numbers used to signify the success or failure of an HTTP request. There is a long list on Wikipedia, but these are the main ones to know about:
- 200 OK: this is the normal status code, signifying a successful request
- 30x (mainly 301 Moved Permanently): redirects a browser to a new URL. The new URL is sent in the Location header of the response (see Headers).
- 4xx (e.g. 403 Forbidden and 404 Not Found): indicates that the request is invalid (e.g. the user doesn't have access, or the requested page doesn't exist)
- 5xx (mainly 500 Internal Server Error): indicates a server failure (e.g. an error was thrown while handling the request)
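Since the first digit of a status code determines its general meaning, a client can handle whole ranges of codes at once. Here is a small sketch of that idea; the function name and category labels are illustrative.

```javascript
// Hypothetical helper: classify a status code by its first digit
function statusClass(code) {
  if (code >= 200 && code < 300) return 'success'
  if (code >= 300 && code < 400) return 'redirect'
  if (code >= 400 && code < 500) return 'client error'
  if (code >= 500 && code < 600) return 'server error'
  return 'other'
}

console.log(statusClass(200)) // 'success'
console.log(statusClass(301)) // 'redirect'
console.log(statusClass(404)) // 'client error'
console.log(statusClass(500)) // 'server error'
```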
By setting res.statusCode, our file server can send more helpful responses.
(A Node.js client would read the analogous res.statusCode property inside http.get()/http.request().)
Here we redirect / (i.e. http://localhost) to /index.html and return a 404 when the requested file doesn't exist:
const fs = require('fs')
const http = require('http')
const path = require('path')

http.createServer((req, res) => {
  if (req.url === '/') {
    // Redirect `/` to `/index.html`
    res.statusCode = 301
    res.setHeader('Location', '/index.html')
    res.end() // don't send a response body
    return
  }
  // By default, `res.statusCode` is 200 OK
  const readStream = fs.createReadStream(path.join('files', req.url))
  readStream.on('error', _ => {
    // Indicate the file didn't exist with a 404
    res.statusCode = 404
    res.write('File not found')
    res.end()
  })
  readStream.pipe(res)
}).listen(80)
You can see the 404 response in your browser's developer tools (e.g. in the Network tab of Chrome's DevTools).
Aside: Headers
HTTP allows any data to be sent in a response or POST request body.
So how does a browser know whether it has been sent an HTML file, a JS file, or a video?
HTTP requests and responses also include "headers", a list of field names and their corresponding values.
There are many standard header names recognized by browsers; for example, here are some headers in the response from google.com:
Date: Mon, 30 Mar 2020 03:23:17 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: 1P_JAR=2020-03-30-03; expires=Wed, 29-Apr-2020 03:23:17 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=201=ap24VohKW7_KXQXVaGGgxgusYR1fqqFqi2YfovchUYmgz4O4aK-O_4zBIKE1xy98W_oIF_9RsVMEIFv7wIhr71w-VVwlTKNReWxmM9c-5XuPPiVYOxShgEwEFzy5O4_V2JzhB9ax3gPno38QYICdA6TBcaPjVMm77qdVeWpjkzk; expires=Tue, 29-Sep-2020 03:23:17 GMT; path=/; domain=.google.com; HttpOnly
Content-Type is the header that tells the browser what type of file is being sent (in this case, text/html indicates an HTML file).
You can also see that headers can have multiple values (e.g. Set-Cookie in this response).
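The file servers in these notes don't set Content-Type. As a rough sketch of how one could, the helper below guesses a type from the file extension. The helper name and the table are hypothetical and deliberately incomplete; a real server would more likely rely on a library for this.

```javascript
const path = require('path')

// A small, deliberately incomplete table of file extensions and media types
const CONTENT_TYPES = new Map()
  .set('.html', 'text/html')
  .set('.js', 'text/javascript')
  .set('.css', 'text/css')
  .set('.png', 'image/png')
  .set('.mp4', 'video/mp4')

// Hypothetical helper: guess the Content-Type from the file extension,
// falling back to a generic "arbitrary bytes" type
function contentTypeFor(filePath) {
  const extension = path.extname(filePath).toLowerCase()
  return CONTENT_TYPES.get(extension) || 'application/octet-stream'
}

console.log(contentTypeFor('/index.html')) // 'text/html'
console.log(contentTypeFor('/archive'))    // 'application/octet-stream'
```

In a server, the result would be passed to res.setHeader('Content-Type', ...) before any of the file is sent.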
Since even GET requests can include headers, they provide a convenient way to send auxiliary data to the server.
For example, we can use headers to add login logic to the file server:
const fs = require('fs')
const http = require('http')
const path = require('path')

// The map of usernames to their passwords.
// Don't ever store raw passwords in a real application!
const passwords = new Map()
  .set('alice', '123456')
  .set('bob', 'supersecretpassword')

http.createServer((req, res) => {
  // Check the provided `user` and `password` headers
  const password = passwords.get(req.headers.user)
  if (password === undefined || password !== req.headers.password) {
    res.statusCode = 403 // Forbidden
    res.end()
    return
  }
  // If the login is valid, send the file
  const readStream = fs.createReadStream(path.join('files', req.url))
  readStream.on('error', _ => {
    res.write('File not found')
    res.end()
  })
  readStream.pipe(res)
}).listen(80)
Requesting a file now requires providing correct values for the user and password headers.
Here's an example client. (In a real application, don't send login details over unencrypted HTTP; this should be done over HTTPS instead.)
const http = require('http')

http.get(
  'http://localhost/index.html',
  {headers: {user: 'bob', password: 'supersecretpassword'}},
  res => {
    // If allowed to access the page, print it out
    if (res.statusCode === 200) res.pipe(process.stdout)
    else console.error('Invalid login')
  }
)
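The user and password headers above are made up for this example. HTTP also defines a standard Authorization header; in its "Basic" scheme, the value is the word Basic followed by the base64 encoding of user:password. Here is a minimal sketch of encoding and decoding such a value. The helper names are hypothetical, and since base64 is an encoding rather than encryption, HTTPS is still needed to keep the password secret.

```javascript
// Hypothetical helper: build the value of a Basic `Authorization` header
function encodeBasicAuth(user, password) {
  const credentials = Buffer.from(`${user}:${password}`, 'utf8')
  return 'Basic ' + credentials.toString('base64')
}

// Hypothetical helper: recover the user and password on the server,
// or return null if the header is missing or malformed
function decodeBasicAuth(headerValue) {
  if (typeof headerValue !== 'string' || !headerValue.startsWith('Basic ')) {
    return null
  }
  const encoded = headerValue.slice('Basic '.length)
  const decoded = Buffer.from(encoded, 'base64').toString('utf8')
  // The user name ends at the first colon; the password may contain colons
  const separator = decoded.indexOf(':')
  if (separator === -1) return null
  return {
    user: decoded.slice(0, separator),
    password: decoded.slice(separator + 1)
  }
}

const header = encodeBasicAuth('bob', 'supersecretpassword')
console.log(decodeBasicAuth(header).user) // 'bob'
```

A client would send the encoded value as {headers: {authorization: header}}, and the server would read it from req.headers.authorization.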