Recommended JavaScript reading
This project makes heavy use of Node.js's standard library modules. They are well documented at nodejs.org, with pages for each module and sections on each function provided by the module. There are also lots of examples of how to use the standard library functions, which will probably come in handy.
You will also be implementing several Transform
streams as part of this project.
Transform
streams are represented as subclasses of the streams.Transform
class, so you may find it useful to read about classes in JS.
grep
Goals
- See how Node.js's stream abstraction simplifies working with different sources of data
- Implement some
Transform
streams
grep
utility
The Unix grep
is a powerful command-line program for searching through large chunks of text.
In its simplest form, grep
takes a string to search for and a filename and prints out all the lines in the file that contain the string.
For example, suppose independence.txt
contains the following lines:
When in the Course of human events,
it becomes necessary for one people
to dissolve the political bands
which have connected them with another,
and to assume among the powers of the earth,
the separate and equal station to which
the Laws of Nature and of Nature's God entitle them,
a decent respect to the opinions of mankind requires
that they should declare the causes
which impel them to the separation.
Then grep ss independence.txt
prints all the lines containing ss
:
it becomes necessary for one people
to dissolve the political bands
and to assume among the powers of the earth,
It is also possible to search for more complex regular expressions in the text, e.g. grep 'the[a-z]' independence.txt
prints:
which have connected them with another,
the Laws of Nature and of Nature's God entitle them,
that they should declare the causes
which impel them to the separation.
If no files are specified, then grep
will instead search for the string in its standard input.
This is often used to filter the outputs of other commands.
For example, some-command | grep ERROR
will print out only the lines of output from some-command
that contain ERROR
.
Your task
Implement grep
(including 4 of its command-line flags) in Node.js.
Your grep
will be invoked using npm run grep -- [-i] [-r] [-v] [-z] pattern [files ...]
.
(The --
just tells npm
to pass the remaining arguments to grep.js
.)
The flags -i
, -r
, -v
, and -z
are described later.
A simple implementation of grep
would read the entire input, loop over its lines, and print the lines that match the pattern.
However, this has two big disadvantages on large inputs:
- Storing the contents of an entire file can take up a lot of memory
- It may take a while to read the whole file, and we would like to print the matching lines as soon as they are found
Since each line is searched independently, only one line needs to be stored at a time. Therefore, your implementation must search each line as soon as it is read from the input and discard it afterwards.
Command-line arguments
You have been provided with some code that parses the command-line arguments (which Node.js stores in process.argv
).
The values should be interpreted as follows:
-
pattern
stores the pattern to use to search the lines of input. It should be treated as a regular expression, which you can construct using the builtin functionRegExp
. The methodregexp.test(string)
can be used to check whether a string matches a regular expression. -
files
stores the filenames (or directory names, if-r
is used) to search. If no filenames are given and-r
is not used, the standard input (process.stdin
) should be searched instead. -
recursive
stores whether the flag-r
was used. If so,files
are expected to be directories and are searched recursively. For example,npm run grep -- pattern -r dir1 dir2
searches all files indir1
anddir2
(and all their subdirectories) forpattern
. If no directories are provided, the current directory (.
) should be searched instead. You can use the provided functionreaddirRecursive()
to search each directory. -
gunzip
stores whether the flag-z
was used. If so, all files are treated as gzipped text files and are unzipped before being searched. For example, ifindependence.txt
is compressed toindependence.txt.gz
, thennpm run grep -- -z ss independence.txt.gz
prints the same lines asnpm run grep -- ss independence.txt
. Note that-z
and-r
can be used together. -
ignoreCase
stores whether the flag-i
was used. If so, the pattern is case-insensitive, e.g.npm run grep -- -i error
would match a line containingerror
,ERROR
, orErRoR
. You can useRegExp(pattern, 'i')
to make a case-insensitive regular expression. -
invert
stores whether the flag-v
was used. If so, the pattern specifies lines to exclude; only the lines that don't match the pattern are printed.
Finally, if multiple files are being searched or -r
is used, grep
helpfully shows which file a line is from.
For example, if independence2.txt
is identical to independence.txt
, running grep ss independence.txt independence2.txt
prints:
independence.txt:it becomes necessary for one people
independence.txt:to dissolve the political bands
independence.txt:and to assume among the powers of the earth,
independence2.txt:it becomes necessary for one people
independence2.txt:to dissolve the political bands
independence2.txt:and to assume among the powers of the earth,
Your implementation should add the filenames to the lines in this case as well.
Streams
Your grep
implementation must use streams to process the input files.
The pipeline I recommend is:
- Get a
Readable
stream that produces the text to be searched - Pipe it into a
Transform
stream to split chunks of text into lines - Use a
Transform
stream to filter the lines using the search pattern - If necessary, use a
Transform
stream to add filenames to the matched lines - Pipe the matched lines to
process.stdout
(aWritable
stream) in order to print them
There are three types of Readable
stream you will likely use:
-
process.stdin
, the standard input -
require('fs').createReadStream()
to create aReadable
stream for a file -
require('zlib').createGunzip()
, which makes aTransform
stream that unzips a stream of gzipped data
Node.js streams can emit different sorts of data.
By default, most streams emit chunks of bytes.
This makes sense for a compressed file stream, but all other streams in the pipeline should emit chunks of text instead.
You can call stream.setEncoding('utf8')
on the input stream as well as each Transform
stream to make them output text.
Your Transform
streams should also call super({decodeStrings: false})
in their constructors so they receive each chunk of data as a string.