Simply Scheme Chapter 22 – Files

My work on https://people.eecs.berkeley.edu/~bh/ssch22/files.html

Some Brief Notes

Input/output procedures can take extra arguments to specify files, e.g.:

(so they’re all variable-argument procedures)
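Here is a small sketch of the optional-port idea. It is written in standard R7RS Scheme rather than with the Simply Scheme library. my-show is a hypothetical stand-in for the book's show. It takes the value to print plus an optional port, and it falls back to the current output port when no port is given. display and newline follow the same convention.

```scheme
(import (scheme base) (scheme write))

;; my-show is hypothetical: it mimics Simply Scheme's show.
;; The rest parameter maybe-port is either empty or holds one port.
(define (my-show value . maybe-port)
  (let ((port (if (null? maybe-port)
                  (current-output-port)   ; default: the screen
                  (car maybe-port))))
    (display value port)
    (newline port)))

;; Both calls are legal: the first prints to the screen,
;; the second writes into a string port.
(my-show '(across the universe))
(define string-port (open-output-string))
(my-show '(across the universe) string-port)
```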

Scheme procedures that open a file return a port, which you can pass as an argument to an I/O procedure, e.g.:
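Here is a minimal sketch of writing to a file through a port, in standard R7RS Scheme. display and newline stand in for the book's show. The file name songs and the song titles are only example data. Note that R7RS leaves it unspecified what open-output-file does if the file already exists.

```scheme
(import (scheme base) (scheme file) (scheme write))

;; open-output-file returns a port. Each display/newline call below is
;; given that port, so the text goes into the file instead of the screen.
(define port (open-output-file "songs"))
(display '(all my loving) port)
(newline port)
(display '(ticket to ride) port)
(newline port)
(display '(martha my dear) port)
(newline port)
;; Closing the port flushes buffered output and releases the file.
(close-output-port port)
```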

I ran this, and it created a file called songs in the directory where I had saved the .scm file containing the code above.

I’ve worked with files in other programming languages before (typically Ruby), but never really in Scheme!

You can read files into Scheme and deal with them there, e.g.:
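The sketch below shows one way to read a file back in. read-all-data is a hypothetical helper written in standard R7RS Scheme, not Simply Scheme. It calls read until it gets the end-of-file object and collects each datum into a list. A parenthesized line comes back as a single list.

```scheme
(import (scheme base) (scheme file) (scheme read))

;; read-all-data (hypothetical) reads every datum in a file.
(define (read-all-data filename)
  (let ((in (open-input-file filename)))
    (let loop ((datum (read in)) (acc '()))
      (if (eof-object? datum)
          (begin
            (close-input-port in)   ; close before returning the result
            (reverse acc))
          (loop (read in) (cons datum acc))))))
```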

If you use show-line, the lines are written to the file without parentheses:
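The sketch below shows what "without parentheses" means in practice. my-show-line is a hypothetical stand-in for Simply Scheme's show-line, written in standard R7RS Scheme. Unlike the real show-line, it takes a required port argument. It displays each word of a sentence separated by single spaces and then ends the line.

```scheme
(import (scheme base) (scheme write))

;; my-show-line (hypothetical): space-separated words, no parentheses.
(define (my-show-line sentence port)
  (let loop ((words sentence) (first? #t))
    (if (null? words)
        (newline port)
        (begin
          (if (not first?) (write-char #\space port))  ; separator between words
          (display (car words) port)
          (loop (cdr words) #f)))))
```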

If you want to read this file back into Scheme one line at a time, you should use read-line, since plain read reads one word at a time:
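Here is a quick illustration of the read versus read-line difference in standard R7RS Scheme. String ports stand in for a file, so no setup is needed. One caveat: R7RS read-line returns the line as a string, while Simply Scheme's read-line returns a sentence.

```scheme
(import (scheme base) (scheme read))

;; A string port stands in for a file containing two lines.
(define text "all my loving\nticket to ride\n")

;; read returns a single datum: just the symbol all.
(define first-word (read (open-input-string text)))

;; R7RS read-line returns everything up to the newline, as a string.
(define first-line (read-line (open-input-string text)))
```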

They mention that closing files is important. I’ve run into the issue of forgetting to do that before.
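One way to avoid forgetting to close a file, at least in standard R7RS Scheme, is call-with-input-file. It opens the file, passes the port to a procedure, and closes the port automatically when that procedure returns. first-line-of is a hypothetical helper name.

```scheme
(import (scheme base) (scheme file))

;; The port is closed for us once read-line returns.
(define (first-line-of filename)
  (call-with-input-file filename read-line))
```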

read-string is another useful procedure:

The procedure read-string reads all of the characters on a line, returning a single word that contains all of them, spaces included:

Exercises

22.1 ✅❌

Write a concatenate procedure that takes two arguments: a list of names of input files, and one name for an output file. The procedure should copy all of the input files, in order, into the output file.

(my initial solution was very inelegant and had a conceptual misunderstanding)

My Initial (Confused & Flawed) Solution ❌

I assume they mean it should copy the content of the input files into the output file (and not just the names). This was harder than I thought it would be.

I realized an error in the above after looking at other people’s answers. My program still worked because of how I set it up, but was unnecessarily complicated.

I think the issue was basically that I was still thinking in terms of returning values rather than in terms of causing “side effects”. So in the recursive call to concatenate-helper, I was passing (add-single-file-to-outp (car filenamelist) outp) as the value of the outp argument. I think that worked because add-single-file-to-outp ultimately returned outp anyway… but it’s much more elegant to pass the same outp along from the outset, call add-single-file-to-outp from within concatenate-helper just for its side effect, and use begin to make the sequencing clear.

Anyways I thought this was interesting. It’s a somewhat subtle thing – like my brain was still wanting to do functional programming while I was trying to learn this new style and so it came out a bit of a mess.

My Fixed Solution ✅

You can invoke it like this (the file names are examples):

(concatenate '("songs" "songs2") "combofile")
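For comparison, here is a sketch of the side-effect style in standard R7RS Scheme. It is not the solution from this post. The output port is opened once, and every input file is copied into it, so nothing has to be threaded back through return values. copy-port-lines is a hypothetical helper. Each line is written back with a newline, so an input file whose last line lacks one will gain it.

```scheme
(import (scheme base) (scheme file))

;; Copies an already-open input port to an output port, line by line.
(define (copy-port-lines in out)
  (let ((line (read-line in)))
    (if (not (eof-object? line))
        (begin
          (write-string line out)
          (newline out)
          (copy-port-lines in out)))))

;; The output port is opened once; each input file is copied into it
;; purely for the side effect.
(define (concatenate infiles outfile)
  (let ((out (open-output-file outfile)))
    (for-each (lambda (name)
                (let ((in (open-input-file name)))
                  (copy-port-lines in out)
                  (close-input-port in)))
              infiles)
    (close-output-port out)
    'done))
```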

Other Solutions

Here is Andrew Buntine’s:

Note the good use of map to map the lambda function to each element of infiles. That saves him the trouble of needing another helper procedure!

Here is Meng Zhang’s:

His structure is very similar to mine.

22.2 ✅

Write a procedure to count the number of lines in a file. It should take the filename as argument and return the number.

My Initial Solution

Using a let for the total value in line-counter let me have something I could easily return as the final value of the procedure after closing the input port.

Reading other people’s solutions, I realized that this was unnecessarily complicated. The total doesn’t need to be an argument to the helper procedure. We do want a name that refers to the total, but we can simply bind that name to the value returned by the helper procedure. We need that name because of ordering: returning the total should be the last thing the procedure does, which has to happen after we’ve closed the input port, and we can’t close the input port until we’ve calculated the total. So we calculate that value, put it somewhere for safekeeping, close the input port, and only then return it. “Calculate that value and put it somewhere for safekeeping” is what (let ((total (line-counter-helper file))) …) accomplishes in my improved solution below.

Within the helper procedure, we can just add a 1 for each invocation of the helper procedure and a 0 in the base case, and then add all that up. This is what Meng Zhang did. Andrew Buntine followed much the same pattern.
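The pattern described above translates directly to standard R7RS Scheme. The helper returns 1 per line and 0 at end of file, and let* holds the total until after the port is closed. count-lines and count-lines-helper are hypothetical names; this is not the code from this post.

```scheme
(import (scheme base) (scheme file))

;; 1 for each line read, 0 once read-line hits end of file.
(define (count-lines-helper in)
  (if (eof-object? (read-line in))
      0
      (+ 1 (count-lines-helper in))))

;; total is computed and stored first, then the port is closed,
;; and returning total is the last thing the procedure does.
(define (count-lines filename)
  (let* ((in (open-input-file filename))
         (total (count-lines-helper in)))
    (close-input-port in)
    total))
```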

My Improved Solution

After going through the above two problems, I simplified my initial answers for 22.3 and 22.4.

22.3 ✅

Write a procedure to count the number of words in a file. It should take the filename as argument and return the number.

Very similar to the previous problem. I added handling for lines that are sentences and otherwise rely on read going through the file word by word:
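Here is a sketch of that mixed-data idea in standard R7RS Scheme. read returns a whole parenthesized sentence as one list, which counts as its length, while a bare word counts as 1. count-words is a hypothetical name, and this is not the code from this post.

```scheme
(import (scheme base) (scheme file) (scheme read))

;; Each datum is either a list (a sentence) or a single word.
(define (count-words-helper in)
  (let ((datum (read in)))
    (cond ((eof-object? datum) 0)
          ((list? datum) (+ (length datum) (count-words-helper in)))
          (else (+ 1 (count-words-helper in))))))

(define (count-words filename)
  (let* ((in (open-input-file filename))
         (total (count-words-helper in)))
    (close-input-port in)
    total))
```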

This procedure is okay but can be done more simply via the method Andrew Buntine mentions. He uses length to count things up instead of my more complicated method. I was trying to be able to handle data like the following:

Note there are both lists (or Simply Scheme sentences) and individual words.
Andrew’s approach is able to handle this because he uses read-line to read each line, and read-line turns the lines with no parentheses around the words into lists with parentheses, thus permitting length to work correctly.

22.4 ✅

Write a procedure to count the number of characters in a file, including space characters. It should take the filename as argument and return the number.

read-string includes the spaces.
The hidden newline characters aren’t counted, unlike in tools such as my text editor Atom, which does count them, but I think the intent here was just to count “normal” characters anyways.
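A comparable sketch in standard R7RS Scheme sums string-length over every line. R7RS read-line strips the newline, so spaces are counted and newlines are not, the same behavior described above. Note that R7RS's own read-string is a different procedure: it takes a character count. count-chars is a hypothetical name.

```scheme
(import (scheme base) (scheme file))

;; Adds up the length of each line; the newline itself is not included.
(define (count-chars-helper in)
  (let ((line (read-line in)))
    (if (eof-object? line)
        0
        (+ (string-length line) (count-chars-helper in)))))

(define (count-chars filename)
  (let* ((in (open-input-file filename))
         (total (count-chars-helper in)))
    (close-input-port in)
    total))
```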

22.5 ✅

Write a procedure that copies an input file to an output file but eliminates multiple consecutive copies of the same line. That is, if the input file contains the lines

then the output file should contain

My Solution

The delete-if-exists stuff is just something I made to delete the output file if it already existed, in order to make testing faster. Hmm, I should check whether Scheme has some kind of overwrite-file option.
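For what it's worth, standard R7RS Scheme provides file-exists? and delete-file in its (scheme file) library. It also leaves the behavior of open-output-file on an existing file unspecified, so deleting first is a reasonable precaution. Below is a sketch of the same idea; it is not the code from this post. delete-if-exists and open-fresh-output-file are illustrative names.

```scheme
(import (scheme base) (scheme file))

;; Deletes the named file if it is present; does nothing otherwise.
(define (delete-if-exists filename)
  (if (file-exists? filename)
      (delete-file filename)))

;; Opens a fresh output file even when an old copy is lying around.
(define (open-fresh-output-file filename)
  (delete-if-exists filename)
  (open-output-file filename))
```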

Anyways, I think this is pretty straightforward. remove-dupes-helper is obviously where the action happens. It takes a variable number of arguments and is initialized with the input file and output file. If it gets to the end of the input file, it just returns done. If it detects a match between the current line and the last line, it recursively invokes itself without writing anything, which skips over the dupe. Otherwise, it writes the current line to the output file, adds a newline, and then invokes itself with the current line as the optional last-line argument, which gives it the information it needs to check the next line against the previous one. I use se in the check for whether the lines are equal because that seemed to make it work, whereas it wasn’t working initially.

Meng Zhang’s solution

Meng Zhang’s solution for comparison:

This is structurally pretty similar to mine, but it has an improvement. In my version, I used a variable number of arguments to get a place to put the data from the previous line. Meng’s solution is different and more elegant. When he first invokes his helper procedure, he passes it the result of calling read-line on the input file, so the first line goes into the helper’s first-data parameter. Then, within the helper procedure, he binds data to the result of calling read-line on the input file again. At this point the first two lines of the input file have been read into first-data and data respectively. He then checks whether data is empty. If not, he checks whether data and first-data are equal. If so, then, as in my version, he recursively calls his helper procedure without writing anything. He uses first-data as his second argument in that recursive call, though since the two values are equal in this case, he could have used either first-data or data. If first-data and data are not equal, he writes first-data to the output file and recursively calls his helper procedure with data in place of the first-data parameter.

I think first-data would be better named previous-line or something, and data holds the current line. Anyways, to sum up: you initialize the helper procedure with the first line. Then, within the helper procedure, you read the next line. If you’ve reached the end of the file, you return false. If the two lines match, you write nothing and recursively invoke the helper procedure to read the next line. If they don’t match, you write the previous line (first-data) to the output file and then recursively call the helper procedure with the current line as the argument for the previous line. Pretty elegant.
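The previous-line/current-line pattern above can be sketched in standard R7RS Scheme as follows. This is not Meng's code; remove-consecutive-duplicates is a hypothetical name. One detail to watch: when the read hits end of file, the pending previous line still has to be written, or the file's last line is lost.

```scheme
(import (scheme base) (scheme file))

(define (remove-consecutive-duplicates infile outfile)
  (let ((in (open-input-file infile))
        (out (open-output-file outfile)))
    ;; previous: the most recent line that has not been written yet.
    (let loop ((previous (read-line in)))
      (if (not (eof-object? previous))
          (let ((current (read-line in)))
            (cond ((eof-object? current)
                   ;; End of file: write the last pending line.
                   (write-string previous out)
                   (newline out))
                  ((string=? current previous)
                   ;; Duplicate: skip it, keep the same pending line.
                   (loop previous))
                  (else
                   (write-string previous out)
                   (newline out)
                   (loop current))))))
    (close-input-port in)
    (close-output-port out)))
```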

22.6 ✅

Write a lookup procedure that takes as arguments a filename and a word. The procedure should print (on the screen, not into another file) only those lines from the input file that include the chosen word.

My Solution

For input data like…

…the following works:

However, it won’t work for data formatted as lists like:

The reason it won’t work is related to how read-line works. read-line takes a line of input and returns a sentence:

But if you give it something that’s already a sentence, things get a bit weird:

I attempted to make a more robust version that could handle data formatted as lists but couldn’t figure out a good way to do it.
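One possible way around the list-formatted data problem skips Simply Scheme's read-line entirely. The sketch below is in standard R7RS Scheme. It reads each line as a plain string and splits it into words by hand, treating parentheses as separators just like spaces. That way (all my loving) and all my loving yield the same words. separator?, push-word, and line->words are hypothetical helpers, and the word to look up is passed as a string.

```scheme
(import (scheme base) (scheme file) (scheme write))

;; Spaces and parentheses both count as word separators.
(define (separator? ch)
  (or (char=? ch #\space) (char=? ch #\() (char=? ch #\))))

;; current holds a word's characters in reverse; add it to words if non-empty.
(define (push-word current words)
  (if (null? current)
      words
      (cons (list->string (reverse current)) words)))

;; Splits a line (a string) into a list of word strings.
(define (line->words line)
  (let loop ((chars (string->list line)) (current '()) (words '()))
    (cond ((null? chars) (reverse (push-word current words)))
          ((separator? (car chars))
           (loop (cdr chars) '() (push-word current words)))
          (else (loop (cdr chars) (cons (car chars) current) words)))))

;; Prints (to the screen) each line of filename that contains word,
;; where word is a string such as "my".
(define (lookup filename word)
  (let ((in (open-input-file filename)))
    (let loop ((line (read-line in)))
      (if (not (eof-object? line))
          (begin
            (if (member word (line->words line))
                (begin (display line) (newline)))
            (loop (read-line in)))))
    (close-input-port in)))
```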

Other Solutions

Andrew Buntine’s solution was basically the same as mine, except that he (quite reasonably) used show-line instead of my more manual combination of display and newline.

Meng Zhang’s solution had a more interesting difference. He used read instead of read-line to read the data from the input file. Because read will grab an entire line if it’s a list/sentence but only one word if the words are not surrounded by parentheses, his version works on files where the words are surrounded by parentheses, but won’t work if they’re not surrounded by parentheses. So it’s like the opposite of mine.

22.7 ❌

Write a page procedure that takes a filename as argument and prints the file a screenful at a time. Assume that a screen can fit 24 lines; your procedure should print 23 lines of the file and then a prompt message, and then wait for the user to enter a (probably empty) line. It should then print the most recent line from the file again (so that the user will see some overlap between screenfuls) and 22 more lines, and so on until the file ends.

My initial solution did not solve the problem because it failed to print one line in each cycle of lines.

My Initial Solution (Flawed) ❌

Note that I also made the program output the number of the current line.

I realized that this was actually skipping over some content. It was an off-by-one type of thing where I wasn’t printing one of the lines in each cycle of line printing. After glancing at some other people’s solutions and thinking about it, I tried again and came up with simpler, better-organized code.

Corrected Solution ✅

I realized I didn’t actually need to keep track of the previous line and the current line – when I got to the end of a page, I just needed to print the current line twice – before and after the user provides input.
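Here is a sketch of the "print the current line twice" idea in standard R7RS Scheme. It is not the code from this post, and the prompt text is made up. (read-line) with no port reads from the current input port, which is normally the keyboard. As an extra touch, peek-char checks whether any input remains, so no prompt appears when the file ends exactly at a screen boundary.

```scheme
(import (scheme base) (scheme file) (scheme write))

;; Prints filename 23 lines per screen. The last line of each screen is
;; printed, then a prompt; after the user presses Enter, that same line
;; is printed again as the first line of the next screen.
(define (page filename)
  (let ((in (open-input-file filename)))
    (let loop ((line (read-line in)) (row 1))   ; row = position on screen
      (if (not (eof-object? line))
          (begin
            (display line)
            (newline)
            (if (and (= row 23) (not (eof-object? (peek-char in))))
                (begin
                  (display "-- press Enter for more --")
                  (newline)
                  (read-line)        ; wait for a (probably empty) line
                  (display line)     ; repeat the overlap line
                  (newline)
                  (loop (read-line in) 2))
                (loop (read-line in) (+ row 1))))))
    (close-input-port in)))
```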

22.8 ✅

A common operation in a database program is to join two databases, that is, to create a new database combining the information from the two given ones. There has to be some piece of information in common between the two databases. For example, suppose we have a class roster database in which each record includes a student’s name, student ID number, and computer account name, like this:

We also have a grade database in which each student’s grades are stored according to computer account name:

We want to create a combined database like this:

in which the information from the roster and grade databases has been combined for each account name.
Write a program join that takes five arguments: two input filenames, two numbers indicating the position of the item within each record that should overlap between the files, and an output filename. For our example, we’d say

In our example, both files are in alphabetical order of computer account name, the account name is a word, and the same account name never appears more than once in each file. In general, you may assume that these conditions hold for the item that the two files have in common. Your program should not assume that every item in one file also appears in the other. A line should be written in the output file only for the items that do appear in both files.

My solution:

join opens (and ultimately closes) an output port for the output file and an input port for the first file, and invokes join-helper.

lookup and its helper return the line within the second file, if any, that contains the same value that is present in the first file at pos1. If no such line exists, lookup returns '(no match found). line-joiner takes two lines and the positions and joins the lines together at the indicated positions.

join-helper reads a line from the first file. Then it gets the value returned by lookup for that line and stores it in the name lookup-value. If that value is (no match found), my code for that situation kicks in. I didn’t think it was clearly specified what to do in this case, so I decided to handle it in the following way: the current line from the first file is written to the output file, and an error message, “ERROR: no matching merge data found in second file”, is appended to the same line to indicate that no match was found. The output looks like this:

If, on the other hand, the lookup-value is anything else, then the value returned by invoking line-joiner is written to the output file, along with a newline.

I would have used item directly in the let expression for lookup-value, but without something like overlapword that explicitly returned something when its argument was an eof-object, I was getting errors.

Note that each call to lookup by each recursive call of join-helper opens up a separate instance of the second file that the procedure can search through. That’s part of what makes this solution work.
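The exercise guarantees that both files are sorted by the shared item and that the item never repeats within a file. Given that, a single pass through each file also works. This is a merge join, a different technique from the reopen-and-search approach above. The sketch below is in standard R7RS Scheme and rests on several assumptions:

- records are space-separated words with no blank lines;
- positions are 1-based, like Simply Scheme's item;
- a joined line is the first record followed by the second record minus the shared item;
- unmatched records are skipped, since the exercise says to write lines only for items in both files.

All procedure names other than join are hypothetical.

```scheme
(import (scheme base) (scheme file))

;; current holds a word's characters in reverse; add it to words if non-empty.
(define (push-word current words)
  (if (null? current)
      words
      (cons (list->string (reverse current)) words)))

;; Splits a line on spaces into a list of word strings.
(define (line->words line)
  (let loop ((chars (string->list line)) (current '()) (words '()))
    (cond ((null? chars) (reverse (push-word current words)))
          ((char=? (car chars) #\space)
           (loop (cdr chars) '() (push-word current words)))
          (else (loop (cdr chars) (cons (car chars) current) words)))))

;; 1-based indexing, like Simply Scheme's item.
(define (nth-item n lst) (list-ref lst (- n 1)))

;; lst without its nth (1-based) element.
(define (remove-nth n lst)
  (if (= n 1)
      (cdr lst)
      (cons (car lst) (remove-nth (- n 1) (cdr lst)))))

;; Next record as a list of words, or the eof object at end of file.
(define (read-record in)
  (let ((line (read-line in)))
    (if (eof-object? line) line (line->words line))))

;; Writes a list of words as one space-separated line.
(define (write-record words out)
  (let loop ((ws words) (first? #t))
    (if (null? ws)
        (newline out)
        (begin
          (if (not first?) (write-char #\space out))
          (write-string (car ws) out)
          (loop (cdr ws) #f)))))

;; Merge join: compare the current keys, write a joined line on a match,
;; otherwise advance whichever file has the alphabetically smaller key.
(define (join file1 file2 pos1 pos2 outfile)
  (let ((in1 (open-input-file file1))
        (in2 (open-input-file file2))
        (out (open-output-file outfile)))
    (let loop ((r1 (read-record in1)) (r2 (read-record in2)))
      (if (not (or (eof-object? r1) (eof-object? r2)))
          (let ((k1 (nth-item pos1 r1))
                (k2 (nth-item pos2 r2)))
            (cond ((string=? k1 k2)
                   (write-record (append r1 (remove-nth pos2 r2)) out)
                   (loop (read-record in1) (read-record in2)))
                  ((string<? k1 k2) (loop (read-record in1) r2))
                  (else (loop r1 (read-record in2)))))))
    (close-input-port in1)
    (close-input-port in2)
    (close-output-port out)
    'done))
```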