# Recommended JavaScript reading This project makes heavy use of Node.js's standard library modules. They are well documented at [nodejs.org](https://nodejs.org/api/), with pages for each module and sections on each function provided by the module. There are also lots of examples of how to use the standard library functions, which will probably come in handy. You will also be implementing several `Transform` streams as part of this project. `Transform` streams are represented as subclasses of the `streams.Transform` class, so you may find it useful to read about [classes](../../notes/js/js.md#classes) in JS. # `grep` ## Goals - See how Node.js's stream abstraction simplifies working with different sources of data - Implement some `Transform` streams ## The Unix `grep` utility `grep` is a powerful command-line program for searching through large chunks of text. In its simplest form, `grep` takes a string to search for and a filename and prints out all the lines in the file that contain the string. For example, suppose `independence.txt` contains the following lines: ``` When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. ``` Then `grep ss independence.txt` prints all the lines containing `ss`: ``` it becomes necessary for one people to dissolve the political bands and to assume among the powers of the earth, ``` It is also possible to search for more complex [regular expressions](https://en.wikipedia.org/wiki/Regular_expression#Basic_concepts) in the text, e.g. `grep 'the[a-z]' independence.txt` prints: ``` which have connected them with another, the Laws of Nature and of Nature's God entitle them, that they should declare the causes which impel them to the separation. ``` If no files are specified, then `grep` will instead search for the string in its standard input. This is often used to filter the outputs of other commands. For example, `some-command | grep ERROR` will print out only the lines of output from `some-command` that contain `ERROR`. ## Your task Implement `grep` (including 4 of its command-line flags) in Node.js. Your `grep` will be invoked using `npm run grep -- [-i] [-r] [-v] [-z] pattern [files ...]`. (The `--` just tells `npm` to pass the remaining arguments to `grep.js`.) The flags `-i`, `-r`, `-v`, and `-z` are described later. A simple implementation of `grep` would read the entire input, loop over its lines, and print the lines that match the pattern. However, this has two big disadvantages on large inputs: - Storing the contents of an entire file can take up a lot of memory - It may take a while to read the whole file, and we would like to print the matching lines as soon as they are found Since each line is searched independently, only one line needs to be stored at a time. Therefore, your implementation **must** search each line as soon as it is read from the input and discard it afterwards. ### Command-line arguments You have been provided with some code that parses the command-line arguments (which Node.js stores in `process.argv`). The values should be interpreted as follows: - `pattern` stores the pattern to use to search the lines of input. It should be treated as a regular expression, which you can construct using the builtin function `RegExp`. The method [`regexp.test(string)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test) can be used to check whether a string matches a regular expression. - `files` stores the filenames (or directory names, if `-r` is used) to search. If no filenames are given and `-r` is not used, the standard input (`process.stdin`) should be searched instead. - `recursive` stores whether the flag `-r` was used. If so, `files` are expected to be directories and are searched recursively. For example, `npm run grep -- pattern -r dir1 dir2` searches all files in `dir1` and `dir2` (and all their subdirectories) for `pattern`. If no directories are provided, the current directory (`.`) should be searched instead. You can use the provided function `readdirRecursive()` to search each directory. - `gunzip` stores whether the flag `-z` was used. If so, all files are treated as [gzip](https://en.wikipedia.org/wiki/Gzip)ped text files and are unzipped before being searched. For example, if `independence.txt` is compressed to `independence.txt.gz`, then `npm run grep -- -z ss independence.txt.gz` prints the same lines as `npm run grep -- ss independence.txt`. Note that `-z` and `-r` can be used together. - `ignoreCase` stores whether the flag `-i` was used. If so, the pattern is case-insensitive, e.g. `npm run grep -- -i error` would match a line containing `error`, `ERROR`, or `ErRoR`. You can use `RegExp(pattern, 'i')` to make a case-insensitive regular expression. - `invert` stores whether the flag `-v` was used. If so, the pattern specifies lines to *exclude*; only the lines that don't match the pattern are printed. Finally, if multiple files are being searched or `-r` is used, `grep` helpfully shows which file a line is from. For example, if `independence2.txt` is identical to `independence.txt`, running `grep ss independence.txt independence2.txt` prints: ``` independence.txt:it becomes necessary for one people independence.txt:to dissolve the political bands independence.txt:and to assume among the powers of the earth, independence2.txt:it becomes necessary for one people independence2.txt:to dissolve the political bands independence2.txt:and to assume among the powers of the earth, ``` Your implementation should add the filenames to the lines in this case as well. ### Streams Your `grep` implementation must use streams to process the input files. The pipeline I recommend is: 1. Get a `Readable` stream that produces the text to be searched 2. Pipe it into a `Transform` stream to split chunks of text into lines 3. Use a `Transform` stream to filter the lines using the search pattern 4. If necessary, use a `Transform` stream to add filenames to the matched lines 5. Pipe the matched lines to `process.stdout` (a `Writable` stream) in order to print them There are three types of `Readable` stream you will likely use: - [`process.stdin`](https://nodejs.org/api/process.html#process_process_stdin), the standard input - [`require('fs').createReadStream()`](https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options) to create a `Readable` stream for a file - [`require('zlib').createGunzip()`](https://nodejs.org/api/zlib.html#zlib_zlib_creategunzip_options), which makes a `Transform` stream that unzips a stream of gzipped data Node.js streams can emit different sorts of data. By default, most streams emit chunks of bytes. This makes sense for a compressed file stream, but all other streams in the pipeline should emit chunks of text instead. You can call `stream.setEncoding('utf8')` on the input stream as well as each `Transform` stream to make them output text. Your `Transform` streams should also call `super({decodeStrings: false})` in their constructors so they receive each chunk of data as a string.