Editing a Stream Question

Discussion of Common Lisp
sabra.crolleton
Posts: 16
Joined: Sat Sep 13, 2008 6:46 pm

Editing a Stream Question

Post by sabra.crolleton » Sun Oct 24, 2010 5:31 pm

This should be obvious, but apparently I'm missing it. The idea is to read in a file, do a string replace and write it back out again.

My initial code is appending the edited text to the existing text rather than overwriting the old text. Any pointers?

Code: Select all

(defun read-file-replace-string1 (file-name string-pattern replacement-string)
  "Take a fully qualified filename, read it, replacing the string-pattern with the replacement string."
  (with-open-file (stream file-name :direction :io :if-exists :overwrite)
    (loop for line = (read-line stream nil)
          while line
          do (format stream "~a~%"
                     (cl-ppcre:regex-replace string-pattern line replacement-string)))))

nuntius
Posts: 538
Joined: Sat Aug 09, 2008 10:44 am
Location: Newton, MA

Re: Editing a Stream Question

Post by nuntius » Sun Oct 24, 2010 7:20 pm

I think you should be happy this code is appending to the file. Consider a file with the string "asdfasdf"; if your substitution is "s/a/abcd/g" or "s/a/aa/g", and the file is being processed one character at a time, then what would happen? What happens to data being overwritten before it is read?
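To make the question concrete, here is a toy model of in-place editing, processed one character at a time. It is only a sketch: the function name, the in-memory buffer standing in for the file, and the `max-reads` cap are all assumptions for illustration, not how any Lisp implementation handles an `:io` stream.

```lisp
;; Toy model: one buffer, a read position and a write position that both
;; start at 0. Each character read is replaced (or copied) at the write
;; position. SIMULATE-IN-PLACE-EDIT and MAX-READS are hypothetical names.
(defun simulate-in-place-edit (text target replacement &key (max-reads 8))
  (let ((buffer (make-array (length text)
                            :element-type 'character
                            :initial-contents text
                            :adjustable t
                            :fill-pointer t))
        (write-pos 0))
    (loop for read-pos from 0
          repeat max-reads            ; cap the reads so the model always stops
          while (< read-pos (length buffer))
          do (let* ((ch (char buffer read-pos))
                    (out (if (char= ch target) replacement (string ch))))
               (loop for c across out
                     do (if (< write-pos (length buffer))
                            ;; Overwrites data that may not have been read yet.
                            (setf (char buffer write-pos) c)
                            ;; Past the old end of the "file": it grows.
                            (vector-push-extend c buffer))
                        (incf write-pos))))
    (coerce buffer 'simple-string)))

;; Same-length replacement is harmless:
;; (simulate-in-place-edit "asdfasdf" #\a "X")  ; => "XsdfXsdf"
;; A longer replacement clobbers unread input and then feeds on its own output:
;; (simulate-in-place-edit "asdfasdf" #\a "aa") ; => "aaaaaaaaaaaaaaaa"
```

In the second call the write position overtakes the read position immediately, so every later read sees an "a" that was just written; without the cap the loop would never reach the end of the buffer.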

Quoth the CLHS page for OPEN.
:if-exists... :overwrite

Output operations on the stream destructively modify the existing file. If direction is :io the file is opened in a bidirectional mode that allows both reading and writing. The file pointer is initially positioned at the beginning of the file; however, the file is not truncated back to length zero when it is opened.
This sounds like the POSIX concept of "open(path, O_RDWR)" without O_TRUNC; O_APPEND would correspond more closely to :append.

If you do want to truncate the file before writing, I think you will need to open the file twice -- once to buffer the results, and once to write them out. Another common technique is to write the output to a temporary file and move it into place when finished.
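A minimal sketch of the "open the file twice" idea, assuming the whole file fits comfortably in memory. The function name and the `transform` argument (a one-argument function applied to each line) are hypothetical; passing a function keeps the sketch independent of cl-ppcre.

```lisp
;; Sketch only: REWRITE-FILE-BUFFERED and TRANSFORM are illustrative names.
(defun rewrite-file-buffered (file-name transform)
  (let ((lines
          ;; First open: input only, collect every transformed line in memory.
          (with-open-file (in file-name :direction :input)
            (loop for line = (read-line in nil)
                  while line
                  collect (funcall transform line)))))
    ;; Second open: :supersede replaces the old contents rather than
    ;; writing over them in place.
    (with-open-file (out file-name :direction :output :if-exists :supersede)
      (dolist (line lines)
        (write-line line out)))
    file-name))

;; With cl-ppcre loaded, the original replacement could be passed in as:
;; (rewrite-file-buffered "notes.txt"
;;   (lambda (line) (cl-ppcre:regex-replace "old" line "new")))
```

Like the original code, this writes a newline after every line, so a file whose last line had no trailing newline gains one.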

JamesF
Posts: 98
Joined: Thu Jul 10, 2008 7:14 pm

Re: Editing a Stream Question

Post by JamesF » Mon Oct 25, 2010 5:23 pm

You might want to check the Hyperspec's entry on 'open - I think supersede will do what you want, better than overwrite - you'll have to assume the prefixed colons on those keywords, because BBCode thinks overwrite starts with a horrified expression.

Also, Nuntius has an excellent idea, though I'll suggest a slight variation: how about moving the original file to /tmp, then opening its original location for writing, opening the original from /tmp for reading, and stream from one to the other?

Code: Select all

(defun process-file (filepath tempfile string-pattern replacement-string)
  (rename-file filepath tempfile)
  (with-open-file (outfile filepath :direction :output :if-exists :supersede)
    (with-open-file (infile tempfile :direction :input)
      (loop for line = (read-line infile nil)
            while line
            do (format outfile "~a~%"
                       (cl-ppcre:regex-replace string-pattern line replacement-string))))))
The redundant :supersede is an old habit of taking the blunt-instrument approach, and I've left out a bunch of error-handling code for clarity, lack of time and sheer laziness :)

My approach leaves the original file lying around afterwards, which on one hand is messy, but on the other means you still have the original if something goes awry during the edit. I'm a sysadmin by profession; it's my job to look surprised when things don't spontaneously explode into a thousand flaming shards, so I like having a backout plan.
Nuntius' approach, by contrast, is great once you've tested the code and are sure you can rely on it (or re-generate the input) because you don't then have to remember to delete the original file. Much more elegant.

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland

Re: Editing a Stream Question

Post by ramarren » Mon Oct 25, 2010 10:38 pm

JamesF wrote: Also, Nuntius has an excellent idea, though I'll suggest a slight variation: how about moving the original file to /tmp, then opening its original location for writing, opening the original from /tmp for reading, and stream from one to the other?
Because if a power failure, or anything else that triggers a system restart, happens during that operation, the /tmp filesystem might get wiped after the original is moved but before the new version is completely written, leading to data loss.

Usually you want the temporary (partially valid) data in a temporary file on the same file system, and then, once the transformation is done, use the rename call, which POSIX requires to be atomic for that case.
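Roughly, in Lisp, that could look like the sketch below. The function name, the `transform` argument, and the "-tmp" naming scheme are assumptions for illustration. Note also that the standard does not spell out what `rename-file` does when the new name already exists; as far as I know SBCL on Unix hands it to POSIX rename, which replaces the target, but other implementations may signal an error instead.

```lisp
;; Sketch only: REWRITE-FILE-VIA-TEMP, TRANSFORM and the "-tmp" suffix are
;; hypothetical. Assumes FILE-NAME has a name component.
(defun rewrite-file-via-temp (file-name transform)
  (let* ((target (merge-pathnames file-name))
         ;; Same directory and type as the target, so the temporary file
         ;; lives on the same file system; only the name differs.
         (temp (make-pathname
                :name (concatenate 'string (pathname-name target) "-tmp")
                :defaults target)))
    (with-open-file (in target :direction :input)
      (with-open-file (out temp :direction :output :if-exists :supersede)
        (loop for line = (read-line in nil)
              while line
              do (write-line (funcall transform line) out))))
    ;; The original is replaced only after the new version is fully written.
    ;; ASSUMPTION: RENAME-FILE overwrites an existing target on this
    ;; implementation; that behaviour is not guaranteed everywhere.
    (rename-file temp target)
    target))
```

Keeping the type the same on both pathnames matters, because `rename-file` merges the new name against the old one and would otherwise fill in a missing type from the temporary file.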

JamesF
Posts: 98
Joined: Thu Jul 10, 2008 7:14 pm

Re: Editing a Stream Question

Post by JamesF » Mon Oct 25, 2010 11:25 pm

Ramarren wrote:
JamesF wrote: how about moving the original file to /tmp..?
Because if a power failure, or anything else that triggers a system restart, happens during that operation, the /tmp filesystem might get wiped after the original is moved but before the new version is completely written, leading to data loss.
I did say I hadn't fully worked through all the ramifications :P
Yes, you're quite correct about the details of where the copy should be moved, but I still think the idea is sound.

nuntius
Posts: 538
Joined: Sat Aug 09, 2008 10:44 am
Location: Newton, MA

Re: Editing a Stream Question

Post by nuntius » Tue Oct 26, 2010 7:19 am

If you're moving the source file first, then I'd recommend using a normal convention for backup files like

Code: Select all

# mv file file~
# sed "$pattern" file~ > file
