Page 1 of 1

sxhash and *package* interaction

PostPosted: Mon Jul 21, 2008 8:44 am
by pTymN
Shouldn't I see the same hash value for both sxhash calls?

(defpackage "foo")
(in-package "foo")
(sxhash 'bar)

(defpackage "foo2")
(in-package "foo2")
(sxhash 'bar)

Re: sxhash and *package* interaction

PostPosted: Mon Jul 21, 2008 9:00 am
by findinglisp
No, because foo:bar and foo2:bar are not the same symbol. That is, (eql foo:bar foo2:bar) => nil. While an implementation could make the hashes the same, they don't have to be, and in fact there are good reasons that they should not be. Ideally, I'd like every distinct symbol in the system to hash to a different value, irrespective of the symbol name, so that I know I can insert all symbols in the image into a hashtable with minimal collisions.

Re: sxhash and *package* interaction

PostPosted: Mon Jul 21, 2008 9:09 am
by pTymN
What about test sxhash.23 in this file? http://svn.clozure.com/publicsvn/openmc ... sxhash.lsp and this note at the end of the clhs entry on sxhash "Although similarity is defined for symbols in terms of both the symbol's name and the packages in which the symbol is accessible, item 3 disallows using package information to compute the hash code, since changes to the package status of a symbol are not visible to equal."

Re: sxhash and *package* interaction

PostPosted: Mon Jul 21, 2008 9:41 am
by theclapp
pTymN wrote:What about [...] this note at the end of the clhs entry on sxhash "Although similarity is defined for symbols in terms of both the symbol's name and the packages in which the symbol is accessible, item 3 disallows using package information to compute the hash code, since changes to the package status of a symbol are not visible to equal."

That means that if you change package FOO, then the sxhash of FOO::BAR must remain the same after the change. It doesn't apply to two different symbols that are already not EQL.

Update:
Code: Select all
;;; The hash of a symbol does not change when its package changes
(deftest sxhash.23
  (progn
    (safely-delete-package "A")
    (defpackage "A" (:use))
    (let* ((pkg (find-package "A"))
           (sym (intern "FOO" pkg))
           (hash (sxhash sym)))
      (unintern sym pkg)
      (let ((hash2 (sxhash sym)))
        (if (eql hash hash2) nil (list hash hash2)))))
  nil)

Again, this tests hashes of the same symbol, not different symbols.

Re: sxhash and *package* interaction

PostPosted: Mon Jul 21, 2008 12:19 pm
by findinglisp
theclapp wrote:That means that if you change package FOO, then the sxhash of FOO::BAR must remain the same after the change. It doesn't apply to two different symbols that are already not EQL.


Right. So for instance one way to implement sxhash (not necessarily a good way or the way that anybody does it) would be to hash the symbol name along with its address in memory when it was created and then store this hash somewhere if the particular GC being used is allowed to move objects (e.g. not a simple mark-and-sweep).

The key point here is that foo:bar and foo2:bar are not the same object. And they won't be the same object even if you change their package status. Once created, symbols are unique and exist independent of their interning packages. Given that they are unique, they should preferably have unique hash codes (again, that's not required, but it's expected that the hash function will not map everything to the same hash code).

Re: sxhash and *package* interaction

PostPosted: Tue Jul 22, 2008 6:58 am
by pTymN
Turns out, Corman CL has a bug in sxhash where if you call (gc 2), you'll get different return values from sxhash. His hash table implementations had a hook that would trigger a rehash after a GC.

Re: sxhash and *package* interaction

PostPosted: Tue Jul 22, 2008 7:19 am
by findinglisp
pTymN wrote:Turns out, Corman CL has a bug in sxhash where if you call (gc 2), you'll get different return values from sxhash. His hash table implementations had a hook that would trigger a rehash after a GC.


That's bad. There is nothing that prevents a programmer from using sxhash values for his own devices, not just in standard hashtables. If sxhash values can change in violation of the standard, you could get some bad bugs popping out. It was probably using the object's address as its hash value for some object types. Looking at the spec page for SXHASH, we see these requirements...

[

Looks like #2 and #3 are the operative requirements here. And it also looks like my idea of using the address as part of the calculation for symbols wouldn't work either, even in a system that stored the hash value with the symbol for later or had a non-moving GC. That is, #2 is a strong requirement that SXHASH values for symbols have to be the same even across Lisp images. If that's the case, it actually does suggest that FOO:BAR and FOO2:BAR might end up with the same SXHASH value unless you're also allowed to use the package as part of the hash calculation. I'm guessing that most implementations do.