Persisting data in hash table to and from disk

lispamour
Posts: 18
Joined: Wed Jun 02, 2010 12:29 am

Persisting data in hash table to and from disk

Post by lispamour » Tue Jun 22, 2010 3:19 am

Practical Common Lisp (p. 25) defines a SAVE-DB function for a sample CD database. It writes the contents of the database to disk so that they can be read back in again with LOAD-DB.

Code:

(defun save-db (filename)
  (with-open-file (out filename
                       :direction :output
                       :if-exists :supersede)
    (with-standard-io-syntax
      (print *db* out))))

(defun load-db (filename)
  (with-open-file (in filename)
    (with-standard-io-syntax
      (setf *db* (read in)))))
This works fine when *db* is implemented as a collection of p-lists, as described in PCL. I have modified the code, though, so that the records in my database can be p-lists, structures, or hash tables. The code above works for p-lists and for records implemented as structures, but it fails when a record is a hash table.

That is, the line

Code:

   (print *db* out)
will not work for a hash table, resulting in the error:

Code:

Error: Trying to print #<EQUAL Hash Table{4} 200AFF1B> unreadably with *PRINT-READABLY* set.
  1 (continue) Print #<EQUAL Hash Table{4} 200AFF1B> anyway.
  2 (abort) Return to level 0.
  3 Return to top loop level 0.

Type :b for backtrace, :c <option number> to proceed,  or :? for other options

Is there a way to make this work in Common Lisp, short of manually iterating through the contents of the hash table and printing each key and value:

Code:

   (maphash #'(lambda (k v) (print (list k v))) hash-table)
I was hoping to find a solution which would not require modifying the LOAD-DB function, so that

Code:

   (setf *db* (read in))
would continue to work for all 3 types of data structures.

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland

Re: Persisting data in hash table to and from disk

Post by ramarren » Tue Jun 22, 2010 4:57 am

lispamour wrote: Is there a way to make this work in Common Lisp, short of manually iterating through the contents of the hash table and printing each key and value:
What other solution would you expect? The standard specifies no representation for hash tables, and an implementation's internal representation would likely be non-contiguous even if you could access it. You have to print it out pair by pair. It is possible to create a custom read macro to make READ work on such a custom representation, but there usually isn't much point.

Serializing to Lisp forms is used mostly for very small datasets, where the data might just as well be an alist or plist. It is possible to use a serialization library like hu.dwim.serializer to store arbitrary objects to disk, or an object datastore like bknr, or one of those backed by a database, like Elephant or rucksack.
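
A minimal sketch of the pair-by-pair approach, assuming the table's keys and values are themselves readably printable. The names TABLE->FORM, FORM->TABLE, SAVE-TABLE and LOAD-TABLE are hypothetical and do not come from PCL or any library. The table is written as a tagged list holding its test and an alist of entries, so plain READ is enough and reader evaluation can stay off. Unlike what was asked for, the loading side does have to convert the list back.

```lisp
(defun table->form (table)
  "Return a plain list describing TABLE: a tag, its test, and an alist of entries."
  (let ((entries '()))
    (maphash (lambda (k v) (push (cons k v) entries)) table)
    ;; HASH-TABLE-TEST returns a symbol for the four standard tests.
    (list :hash-table (hash-table-test table) entries)))

(defun form->table (form)
  "Rebuild a hash table from a list produced by TABLE->FORM."
  (destructuring-bind (tag test entries) form
    (assert (eq tag :hash-table))
    (let ((table (make-hash-table :test test)))
      (loop for (k . v) in entries
            do (setf (gethash k table) v))
      table)))

(defun save-table (table filename)
  "Write TABLE to FILENAME as a readable Lisp form."
  (with-open-file (out filename :direction :output :if-exists :supersede)
    (with-standard-io-syntax
      (print (table->form table) out)))
  filename)

(defun load-table (filename)
  "Read back a table written by SAVE-TABLE, with reader evaluation disabled."
  (with-open-file (in filename)
    (with-standard-io-syntax
      (let ((*read-eval* nil))
        (form->table (read in))))))
```

Storing the test alongside the entries matters for tables like the EQUAL one in the error message above, because a reloaded EQL table would no longer find string keys.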

Jasper
Posts: 209
Joined: Fri Oct 10, 2008 8:22 am
Location: Eindhoven, The Netherlands

Re: Persisting data in hash table to and from disk

Post by Jasper » Tue Jun 22, 2010 9:28 am

Any idea what the better libraries for serialization are? Cliki seems a little light on it. I have used cl-store before, and that seems to work fine.

There seem to be two kinds of serialization one might want: one stores (parts of) the state Lisp is in, and the other just stores objects. With the latter one would expect issues with, for instance, storing symbols, since the package might not exist yet; likewise, a stored instance might not have its class defined yet. To store those 'transparently', one would also have to store some Lisp state. One might expect to be able to optionally assert that the Lisp state already supports the object/system, but cl-store doesn't seem to have this.
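
The symbol problem can be seen with plain READ, without any serialization library. This is only a sketch: TRY-READ is a hypothetical helper, and the package name is assumed not to exist in the running image. The exact condition type signaled is implementation-dependent, so the handler catches ERROR in general.

```lisp
(defun try-read (text)
  "Read one object from TEXT; return :UNREADABLE and the condition type on failure."
  (handler-case
      (with-standard-io-syntax
        (read-from-string text))
    (error (condition)
      (values :unreadable (type-of condition)))))

;; A symbol in an existing package reads back normally.
(try-read "cl:car")

;; A symbol qualified with a package that is not loaded cannot be read.
(try-read "no-such-package-for-this-demo::foo")
```

Data saved from one image therefore only loads in another image where the same packages (and, for instances, the same classes) have already been defined.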

lispamour
Posts: 18
Joined: Wed Jun 02, 2010 12:29 am

Re: Persisting data in hash table to and from disk

Post by lispamour » Mon Jun 28, 2010 5:17 am

Sorry for my late reply. It's been a busy past week, and I've had several things to attend to before I could take up Lisp again.
Ramarren wrote:
lispamour wrote: Is there a way to make this work in Common Lisp, short of manually iterating through the contents of the hash table and printing each key and value:
What other solution would you expect? The standard specifies no representation for hash tables, and an implementation's internal representation would likely be non-contiguous even if you could access it. You have to print it out pair by pair.
Originally, I was hoping that a hash table would let me specify a print function in the same way that a structure does. E.g.:

Code:

(defstruct (mystruct
             (:print-function print-mystruct))
   foo
   bar)
That way, I could print out the keys and values pair by pair in PRINT-MYSTRUCT.

I had thought it a little strange that the PRINT function would work for structs and plists, but not for hashtables. But as you've indicated above, the PRINT function simply accesses a contiguous chunk of memory, which is not characteristic of hashtables. In Clojure, hashtables, lists, and sets are all first-class objects which comply with PRINT and READ so that they can be persisted to disk.
Ramarren wrote: It is possible to create a custom read macro to make READ work on such custom representation, but there isn't really much point usually.
Can you show me how to write these PRINT and READ macros so that they call the usual PRINT and READ functions for structs and plists, but for hashtables, they call the custom hash-printing and hash-reading functions:

Code:

(defun print-hash (hashtable output-stream)
  (with-standard-io-syntax
    ;; Write the entry count first so the reader knows how many pairs follow.
    (print (hash-table-count hashtable) output-stream)
    (maphash #'(lambda (k v)
                 (print k output-stream)
                 (print v output-stream))
             hashtable)))

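Code:

(defun read-hash (input-stream)
  ;; MAKE-HASH-TABLE defaults to an EQL test, whatever test the saved table used.
  (let ((hnew (make-hash-table)))
    (with-standard-io-syntax
      ;; The first object is the entry count; keys and values then alternate.
      (dotimes (i (read input-stream))
        ;; Read the key before the value, in the order PRINT-HASH wrote them.
        (let ((key (read input-stream)))
          (setf (gethash key hnew) (read input-stream)))))
    hnew))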
I.e., is it possible in the PRINT macro to call the usual PRINT function for plists and structs, but PRINT-HASH for hash tables, without causing a name collision between the function PRINT and the macro PRINT?

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland

Re: Persisting data in hash table to and from disk

Post by ramarren » Mon Jun 28, 2010 6:44 am

lispamour wrote: But as you've indicated above, the PRINT function simply accesses a contiguous chunk of memory, which is not characteristic of hashtables.
That is not what I meant. What I meant is that, since a hash table is likely non-contiguous, there is no other way to print it than to iterate over the keys and access the values through them. You indicated that this was unsatisfactory, but that was probably an over-interpretation on my part.
lispamour wrote: I had thought it a little strange that the PRINT function would work for structs and plists, but not for hashtables.
The PRINT function does work for hash tables; it just prints an unreadable representation. There is no technical reason for this. The standard leaves this decision to implementations, and many choose to print an unreadable representation because the cost of converting to a plist/alist is trivial compared to the cost of printing.
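
The error in the first post comes from WITH-STANDARD-IO-SYNTAX, which binds *PRINT-READABLY* to T, so an object with no readable representation signals PRINT-NOT-READABLE instead of printing as #<...>. A small sketch of the difference follows; both function names are hypothetical. What a hash table does in the first function is implementation-dependent: some implementations signal the error, while others may emit a readable #. form when *READ-EVAL* is true.

```lisp
(defun printed-form-or-error (object)
  "Print OBJECT readably; return the text, or :NOT-READABLE if that is impossible."
  (handler-case
      (with-standard-io-syntax          ; binds *PRINT-READABLY* to T
        (prin1-to-string object))
    (print-not-readable ()
      :not-readable)))

(defun printed-form-unreadably (object)
  "Print OBJECT with *PRINT-READABLY* turned off, so #<...> output is allowed."
  (with-standard-io-syntax
    (let ((*print-readably* nil))
      (prin1-to-string object))))

;; Lists of symbols and numbers always have a readable representation.
(printed-form-or-error '(:a 1))

;; A hash table may or may not, depending on the implementation.
(printed-form-or-error (make-hash-table))

;; With *PRINT-READABLY* off, printing a hash table never signals.
(printed-form-unreadably (make-hash-table))
```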

The dispatch you describe is straightforward with the standard Common Lisp printing protocol and the #. read macro. Your implementation might complain about redefining the primary printing method on a system class, but SBCL doesn't (it only gives the normal redefinition warning).

First, utility functions from alexandria:

Code:

(defun hash-table-alist (table)
  "Returns an association list containing the keys and values of hash table
TABLE."
  (let ((alist nil))
    (maphash (lambda (k v)
               (push (cons k v) alist))
             table)
    alist))

(defun alist-hash-table (alist &rest hash-table-initargs)
  "Returns a hash table containing the keys and values of the association list
ALIST. Hash table is initialized using the HASH-TABLE-INITARGS."
  (let ((table (apply #'make-hash-table hash-table-initargs)))
    (dolist (cons alist)
      (setf (gethash (car cons) table) (cdr cons)))
    table))
Then install the printer method which will create a readable representation:

Code:

(defmethod print-object ((ht hash-table) stream)
           (format stream "#.(alist-hash-table '~s)" (hash-table-alist ht)))
Short test:

Code:

CL-USER> (defparameter *ht* (make-hash-table))
*HT*
CL-USER> (setf (gethash 'a *ht*) 1)
1
CL-USER> (setf (gethash 'b *ht*) 2)
2
CL-USER> (setf (gethash '5 *ht*) 'c)
C
CL-USER> (print *ht*)

#.(alist-hash-table '((5 . C) (B . 2) (A . 1))) 
#.(alist-hash-table '((5 . C) (B . 2) (A . 1)))
CL-USER> (read-from-string "#.(alist-hash-table '((5 . C) (B . 2) (A . 1)))")
#.(alist-hash-table '((A . 1) (B . 2) (5 . C)))
47
CL-USER> (gethash 'b *)
2
T
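
One limitation of the printed form above is that it drops the table's test, so an EQUAL table would come back as an EQL one. A hedged variation follows that writes the test into the #. form as well. HASH-TABLE-READ-FORM and READ-HASH-TABLE-FORM are hypothetical names, and the sketch builds the text in an ordinary function rather than redefining PRINT-OBJECT. It still relies on *READ-EVAL* being true when reading.

```lisp
(defun alist-hash-table (alist &rest hash-table-initargs)
  "Same helper as in the post above: build a hash table from ALIST."
  (let ((table (apply #'make-hash-table hash-table-initargs)))
    (dolist (cons alist)
      (setf (gethash (car cons) table) (cdr cons)))
    table))

(defun hash-table-read-form (table)
  "Return text that READ turns back into a copy of TABLE, test included."
  (let ((alist '()))
    (maphash (lambda (k v) (push (cons k v) alist)) table)
    (with-standard-io-syntax
      ;; Printing the function name with ~S keeps any package prefix it needs.
      (format nil "#.(~S '~S :test '~S)"
              'alist-hash-table alist (hash-table-test table)))))

(defun read-hash-table-form (text)
  "Read text produced by HASH-TABLE-READ-FORM; needs *READ-EVAL* to be true."
  (with-standard-io-syntax              ; binds *READ-EVAL* to T
    (values (read-from-string text))))
```

Usage would be along the lines of (read-hash-table-form (hash-table-read-form table)), which should give back a fresh table with the same entries and the same test.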

lispamour
Posts: 18
Joined: Wed Jun 02, 2010 12:29 am

Re: Persisting data in hash table to and from disk

Post by lispamour » Mon Jul 05, 2010 4:21 pm

Ramarren wrote:
lispamour wrote: I had thought it a little strange that the PRINT function would work for structs and plists, but not for hashtables.
The PRINT function does work for hash tables; it just prints an unreadable representation. There is no technical reason for this. The standard leaves this decision to implementations, and many choose to print an unreadable representation because the cost of converting to a plist/alist is trivial compared to the cost of printing.
Then install the printer method which will create a readable representation:

Code:

(defmethod print-object ((ht hash-table) stream)
           (format stream "#.(alist-hash-table '~s)" (hash-table-alist ht)))
Thank you! That's an excellent solution. That takes care of my original problem very nicely and very simply. I like it a great deal.

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland

Re: Persisting data in hash table to and from disk

Post by ramarren » Mon Jul 05, 2010 10:27 pm

lispamour wrote: Thank you! That's an excellent solution. That takes care of my original problem very nicely and very simply. I like it a great deal.
Do note that this is more of a hack than a proper solution, since I do not think that the standard requires redefinability of methods specialized on system classes. This might not be a problem if your implementation allows it, but it limits portability between implementations. It also depends on reader evaluation being allowed.

A more proper way to do it would be to define your own data structure and create a set of custom reader macros. There is an example of this in the FSet library, which provides functional collections. The file Code/reader.lisp shows how to implement custom reader macros.
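
A rough sketch of what a custom reader macro for hash tables could look like. This is not taken from FSet; the #H syntax, the EQUAL test, and all the names below are assumptions made for illustration. The macro is installed in a copy of the standard readtable, so the global readtable is left alone, and reading works with *READ-EVAL* bound to NIL.

```lisp
(defun hash-literal-reader (stream subchar arg)
  "Reader macro function for #H: read an alist and build an EQUAL hash table."
  (declare (ignore subchar arg))
  (let ((table (make-hash-table :test #'equal)))
    (dolist (pair (read stream t nil t) table)
      (setf (gethash (car pair) table) (cdr pair)))))

(defparameter *hash-readtable*
  (let ((readtable (copy-readtable nil)))   ; fresh copy of the standard readtable
    (set-dispatch-macro-character #\# #\H #'hash-literal-reader readtable)
    readtable)
  "Standard syntax plus the hypothetical #H((key . value) ...) notation.")

(defun write-hash-literal (table stream)
  "Write TABLE to STREAM as #H followed by an alist of its entries."
  (let ((pairs '()))
    (maphash (lambda (k v) (push (cons k v) pairs)) table)
    (with-standard-io-syntax
      (write-string "#H" stream)
      (prin1 pairs stream)))
  table)

(defun read-with-hash-syntax (stream)
  "Read one object from STREAM, accepting #H, with reader evaluation disabled."
  (with-standard-io-syntax
    (let ((*readtable* *hash-readtable*)
          (*read-eval* nil))
      (read stream))))
```

Because the keys and values are printed with PRIN1 under standard syntax, a table nested inside another table would still hit the readable-printing problem; handling that would need a recursive writer.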

gugamilare
Posts: 406
Joined: Sat Mar 07, 2009 6:17 pm
Location: Brazil

Re: Persisting data in hash table to and from disk

Post by gugamilare » Tue Jul 06, 2010 7:17 am

Ramarren wrote:
lispamour wrote: Thank you! That's an excellent solution. That takes care of my original problem very nicely and very simply. I like it a great deal.
Do note that this is more of a hack than a proper solution, since I do not think that the standard requires redefinability of methods specialized on system classes. This might not be a problem if your implementation allows it, but it limits portability between implementations. It also depends on reader evaluation being allowed.

A more proper way to do it would be to define your own data structure and create a set of custom reader macros. There is an example of this in the FSet library, which provides functional collections. The file Code/reader.lisp shows how to implement custom reader macros.
I would say that a more proper way to do it would be to use a serialization library such as cl-store. It is extensible, and serializing with it would occupy much less space in your file than a text representation. Or, if the whole library is too big for you, lift from cl-store the part that serializes and deserializes hash tables.
