make.md 9.52 KB

Recommended JavaScript reading

You will be writing a Node.js program, so I recommend reading about the basics of installing and using Node.js

This project also uses the builtin JS type Map heavily, so make sure you understand how to work with it.

make

Goals

  • See how Promises can be used to compose simple asynchronous tasks into complex asynchronous tasks
  • Learn how to run other programs (asynchronously) from a Node.js program

The GNU make utility

If you've ever worked with large C or C++ projects, you have probably used make or a similar build program. make solves a couple of problems with compiling large projects:

  • Decomposing the build. Rather than provide a full sequence of commands to build the project, make lets us specify how each file can be built from other files. Running make some-file automatically runs the necessary commands to create some-file, including building any files that some-file depends on.
  • Performing incremental builds. In a large project, it can be very expensive to rebuild the entire project when a few files change. make will only rebuild the files affected by the change, and continue to use existing versions of the other files.

How (synchronous) make works

We will use a C project as an example, since this is what make is often used for. Here's a quick refresher of how C code is compiled:

  • Code is written in .c files
  • .c files can declare some of their functions in a corresponding .h ("header") file. Other .c files can use this .h file to call the declared functions.
  • Each .c file is compiled individually into a .o ("object") file
  • Finally, the .o files are linked together into a single executable file

Consider a sample C project with two .c files:

  • util.c implements various useful functions declared in util.h
  • main.c implements some program that uses the functions declared in util.h

Since each .c file is compiled into a corresponding .o file, util.o depends on util.c and main.o depends on main.c. Note that both util.c and main.c include util.h, so both util.o and main.o depend on util.h. program, the resulting executable file, depends on util.o and main.o.

We can represent these relationships with a "dependency graph", where each file points to the files it depends on:

        program
         /   \
        /     \
       /       \
      /         \
    main.o    util.o
   /     \   /     \
  /       \ /       \
main.c   util.h   util.c

You can also read this from the bottom up: if util.h changes, main.o and util.o need to be rebuilt, and therefore program needs to be rebuilt too. However, if main.c is modified, only main.o and program need to be rebuilt; util.o is unaffected.

The dependency relationships restrict the orders in which we can build the files: we can build main.o and util.o in either order, but both need to be built before building program because program depends on them. Pseudo-code for synchronous make looks like this:

# Builds the given target. Returns when done.
def build(target):
    # Skip the build if target is up to date (or was already rebuilt)
    if needs_rebuild(target):
        # Recursively build target's dependencies one-by-one
        for dependency in get_dependencies(target):
            build(dependency)
        # Now that dependencies are up to date, we can build the target
        run_command(get_build_command(target))

Since build() waits to return until target is built, only one build command will run at a time! In our example above, main.o and util.o don't depend on each other, so the build could run faster if we built them simultaneously.

Your task

Implement make asynchronously in Node.js.

Rules

Your program will be run as npm run make makefile.js target1 ... targetN. npm run make runs the make.js program with node. makefile.js is a Node.js file that exports an array of all available rules to build files. Each rule has the following structure:

{
  // The name of the file that this rule builds
  target: string
  // The file's dependencies.
  // These can be existing files or ones built according to other rules.
  dependencies: Array<string>
  // The command to run to built the file.
  // This can be omitted (`undefined`) if no command needs to be run.
  command?: Array<string>
}

For example, the rules for the sample project above would look like:

[
  {
    target: 'program',
    dependencies: ['main.o', 'util.o'],
    command: ['clang', 'main.o', 'util.o', '-o', 'program']
  },
  {
    target: 'main.o',
    dependencies: ['main.c', 'util.h'],
    command: ['clang', '-c', 'main.c']
  },
  {
    target: 'util.o',
    dependencies: ['util.c', 'util.h'],
    command: ['clang', '-c', 'util.c']
  }
]

make.js should build all of the requested targets, including the files they depend on (and their dependencies, etc.). If there is no rule to build a target, then it is interpreted as a source file, so it should already exist.

Additional requirements:

  • Files should be built at most once. For example, if A depends on B and C, and both B and C depend on D, building A should only build D once.
  • Files should only be rebuilt if necessary. Specifically, a file needs to be built if it doesn't exist, or if any of its dependencies were changed since the target was last built. (You can use the "last modified time" that all files have to tell whether any dependencies are newer than the target.) Note that if an indirect dependency is modified, one of the target's dependencies will be rebuilt, so the target also needs to be rebuilt. For example, if A depends on B and B depends on C, then if C is updated, B needs to be rebuilt and then A does too.
  • Files should be built as soon as possible (i.e. as soon as all their dependencies are up to date).
  • Like make, print out each command before executing it. Also print out anything that the subprocess writes to stdout or stderr. This is already handled by runCommmand() in the starter code.
  • If any build command fails, everything that depends on it should fail and make.js should exit with an error code.

You must use Promises to represent the dependencies being built! During the build, there are three types of relationships between files which Promises represent very cleanly:

  • Sequencing (A must be built before B): this is captured by .then(), e.g.
    buildA.then(result => { /* return a Promise that builds B */ })
  • Multiple dependencies (A, B, and C must all be built before D): this is captured by Promise.all(), e.g.
    Promise.all([buildA, buildB, buildC])
      .then(([resultA, resultB, resultC]) => {
        /* return a Promise that builds D */
      })
  • Shared dependencies (A must be built before B and C): this is captured by calling .then() multiple times on the shared dependency, e.g.
    buildA.then(result => { /* return a Promise that builds B */ })
    buildA.then(result => { /* return a Promise that builds C */ })

By combining these Promise operations, we can implicitly represent the dependency graph; we don't need any additional data structure to store the graph! Note that if a command fails, its Promise rejects, which makes all the Promises that depend on it immediately reject too; this is exactly the behavior we want.

Options if you want to go further

  • Dependency graphs are technically "directed acyclic graphs". It is impossible to perform a build if there is a "dependency cycle", e.g. A depends on B, B depends on C, and C depends on A. If the build encounters a dependency cycle, print out a useful error message so the user can fix the set of rules.
  • Our implementation builds each file as soon as possible. If we are building lots of files that don't depend on each other, this spawns many simultaneous processes. A system with n CPUs can run up to n processes at the same time, but if we run more, the system wastes a lot of time switching back and forth between them. Limit the number of commands that run simultaneously to the number of CPUs. You can use os.cpus() to find out how many CPUs the computer has.

Useful functions from the Node.js standard library

To get file metadata (e.g. last modified time), you can use fsPromises.stat(filename), where fsPromises = require('fs').promises. The modified time is stored in the mtimeMs field of the returned fs.Stats object.

Child processes can be created using child_process.execFile(). We use util.promisify() to make execFile() return a Promise instead of using a callback function. This, along with printing out the command and its output, is already done by runCommand() in the starter code.

You can use process.exit(statusCode) to exit the process with the given code. If the code is non-zero, it indicates that make.js failed.