Compile-time information across macros

Discussion of Common Lisp
nuntius
Posts: 538
Joined: Sat Aug 09, 2008 10:44 am
Location: Newton, MA

Re: Compile-time information across macros

Post by nuntius » Sat Aug 22, 2009 11:50 pm

This project seems related to the topic at hand.
http://tehran.lain.pl/git.web/function-template

The timing is suspicious.

slowcoder
Posts: 4
Joined: Sat Aug 15, 2009 4:20 pm

Re: Compile-time information across macros

Post by slowcoder » Mon Aug 24, 2009 6:36 pm

Sorry again for the delay; it seems I am not getting mail. I will poll the forum from now on.

How cool, I got pretty enlightening answers! Thanks!

@gugamilare: It seems your code for is-function-or-lambda-p is broken: "fun" is undefined, "membar" is a misspelling (I think), and "arg" is unused. The idea in your code is interesting, but read on; I will show an example I can carry out in C++ but have had no success with in Lisp yet.

@Ramarren: Function call overhead is only negligible if either the body of the function takes long to execute or the function is called only a few times. Otherwise (especially when it gets called a lot in an inner loop to carry out a single addition), it is unacceptable. About macro idempotence: as you said, consistent caching will make these macros look like they are idempotent.

@nuntius: I did some experiments with code like yours. The way I can compile and run code with Lisp is pretty cool! But there is something I need to know. Suppose I place such code in its own package. When I load its file, I will be able to keep calling the function, and successive instantiations will be cached because fboundp will be guarding the code emission. The question is: is the information associated with the mapping between symbols and functions always present at compile time? That mapping is the compile-time data structure I have been talking about. It obviously needs to be present in the interactive environment, but is there a way for me to remove it in non-interactive mode? It seems your code is just what I want, but I am unsure whether I have removed every trace of compile-time computation from runtime.
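
To be clear about the pattern I mean, here is my own rough reconstruction (not nuntius's actual code; the macro and function names are made up):

Code: Select all

;; Rough reconstruction, only to illustrate the fboundp-guarded pattern.
;; The specialized function is compiled on first use and reused afterwards.
(defmacro specialized-add (type x y)
  (let ((name (intern (format nil "ADD/~A" type))))
    `(progn
       (unless (fboundp ',name)
         (setf (symbol-function ',name)
               (compile nil '(lambda (a b)
                               (declare (type ,type a b))
                               (+ a b)))))
       (,name ,x ,y))))
;; (specialized-add fixnum i j) compiles ADD/FIXNUM once, then keeps reusing it.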

Ok, so now I will write an example of typical usage of templates in C++. Since using canonical algebraic structures would be too verbose, let me give a more straightforward example. Suppose we want to code an intrusive red-black tree data structure. There will be a (compile-time) structure describing the characteristics of the tree node, a compile-time structure describing the element and its comparison operation, and a final structure describing the characteristics of the tree itself.

The node structure would be something like this:

Code: Select all

struct example_node {
  typedef int element_t; /* The element type */
  element_t& getelem() {...} /* getter */
  const element_t& getelem() const {...}
  example_node* left(const example_node& nd) const {...} /* How to get to the left node */
  example_node* right(const example_node& nd) const {...} /* How to get to the right node */
  example_node* parent(const example_node& nd) const {...} /* How to get to the parent node */
  enum {has_size = true}; /* Tells whether the node can hold a size value */
  size_t size; /* Will hold the number of elements in the subtree determined by the node */
  size_t getsize(const example_node& nd) const {...}
};
The comparison structure would look like:

Code: Select all

struct int_less_than {
  typedef int element_t; /* The type of the elements */
  bool less(const element_t& x, const element_t& y) const {return x < y;} /* The ordering */
};
Finally, we would have the tree structure, something like

Code: Select all

struct my_rbtree {
  typedef example_node node_t; /* The node to manipulate */
  typedef int_less_than order_t; /* The ordering */
  enum {left = true}; /* Whether to keep track of the leftmost node for getting the min element in O(1) */
  enum {right = false}; /* Same for right node */
};
There is a reason not to bundle the tree structure with the previous two, which will be explained now.

The code will have some procedures, one of which will be insert, which will insert an element into the tree. insert will in turn call a procedure named insert_fixup, which will maintain the red-black properties, and that one will call both left_rotate and right_rotate, which perform single rotations.

The big point is that insert strongly depends on the ordering of the elements but insert_fixup and the rotations do not. Thus, separating the structures (which will be fed as template arguments to the procedures) allows us to reuse the compiled code of insert_fixup between a tree that sorts ints in increasing order and one that sorts them in decreasing order, reducing code bloat.

Another ability of this approach is that it produces code on the fly. In principle, no significant code is generated if I just initialize a tree. But the first time someone in the whole project (tons of source files) invokes insert, it will generate insert_fixup and the rotations. If someone else then calls remove, it will generate remove_fixup but not the rotations. Actually g++ will generate them again, but the duplicates will be removed at link time.

The code generated is specialized for that particular kind of node, ordering and tree, and parts of it will be shared when appropriate. In more elaborate structures, the programmer might feel inclined to maintain compile-time data structures and run algorithms over them to decide which code to generate, but no trace of that is left at runtime. The binary is small and clean.

So far, the code generated is the fastest possible from the most general human-written code. If there are optimizations that were suppressed because the specification is too general, compile-time computation (i.e. template specialization) can be used to sort that out.

But there are drawbacks. Firstly, C++'s syntax is pretty awkward, but that is tolerable. Secondly, and most importantly, sometimes the tree will be used so infrequently (in terms of runtime call count) that it might pay to merge otherwise incompatible instantiations into one, reducing code size at the cost of runtime performance. One way to do this would be to pass a function pointer to decide the ordering, just like C's qsort.

The thing that made me rethink C++ as the way to do generic programming and look to Lisp is simple: the syntax for making that conversion is totally different from the syntax used to code the structure originally. The difference between the syntax of qsort and that of STL's sort is noticeable.

One more example. Suppose we want to compute the gcd of two elements of some euclidean domain. One way would be C++:

Code: Select all

template<typename E> typename E::t gcd(const typename E::t& a, const typename E::t& b) {...}
where E must provide the modulo operation (%) and the test for the zero element (== 0).

To push that information back from compile time to runtime, one would have to go for something C-ish:

Code: Select all

void gcd(void* result, const void* a, const void* b, bool (*is_zero)(const void*), void (*mod)(void*, const void*, const void*)) {...}
In Lisp, the best we could hope for is something like this:

Code: Select all

; sorry for possible n00b mistakes
(defstruct euclidean-domain zero? mod)
And we could code a normal gcd:

Code: Select all

(defun runtime-gcd (E a b) ...)
; uses (euclidean-domain-zero? E) (euclidean-domain-mod E)
But it would also be great to have a general:

Code: Select all

(define-euclidean-domain ed-of-integers zerop mod)
which would both define a macro named ed-of-integers and build a structure, (make-euclidean-domain :zero? zerop :mod mod). Or maybe just one of the two, depending on additional arguments.

Anyway, we could invoke (gcd E a b), which would expand into (runtime-gcd E a b), or (gcd (E) a b) (I am making up the syntax), which would expand into something like (let ((mangled-name ...)) (if (fboundp ...) ...)).

That is what I am trying to achieve.
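
To make the intent concrete, here is a rough, untested sketch of what I imagine define-euclidean-domain expanding into (all the names are made up):

Code: Select all

;; Untested sketch; names are made up.
;; define-euclidean-domain creates both a runtime structure (for the
;; runtime-gcd path) and a compile-time record that a gcd macro could
;; consult in order to instantiate specialized code.
(defmacro define-euclidean-domain (name zero? mod)
  `(progn
     (defparameter ,name
       (make-euclidean-domain :zero? (function ,zero?) :mod (function ,mod)))
     (eval-when (:compile-toplevel :load-toplevel :execute)
       (setf (get ',name 'euclidean-domain-ops) (list ',zero? ',mod)))
     ',name))

;; (define-euclidean-domain ed-of-integers zerop mod)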

Sorry for the length of the post, but this is something I have been looking for for a long time and haven't found. I need some way to:
#0. Express an algorithm generically over algebraic structures.
#1. Define the structures for runtime or compile-time (or both) with uniform syntax.
#2. Instantiate the algorithms for compile-time/runtime based on my decision with a similar syntax.
#3. Know that the system will instantiate specialized code on demand.
#4. Know that code will be shared as needed.

gugamilare's approach in his mapcar routine satisfies only #0 and #3. It replicates the implementation at every call site. That is fine for mapcar, but is hazardous for a red-black tree.

So, ok, that is it. I hope I have expressed myself well.

Thanks for helping.

findinglisp
Posts: 447
Joined: Sat Jun 28, 2008 7:49 am
Location: Austin, TX
Contact:

Re: Compile-time information across macros

Post by findinglisp » Mon Aug 24, 2009 6:59 pm

slowcoder wrote:Sorry again for the delay; it seems I am not getting mail. I will poll the forum from now on.
What email do you think you should be getting? The forum will only send you email when you subscribe to a specific forum section or to a specific topic thread. You can find a link to do each of these at the bottom of the appropriate pages. You can then manage your subscriptions in the User Control Panel. If you have done all that and you think the system should be sending you email but isn't, let me know via private message and we'll try to work it out.
Cheers, Dave
Slowly but surely the world is finding Lisp. http://www.findinglisp.com/blog/

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland
Contact:

Re: Compile-time information across macros

Post by ramarren » Tue Aug 25, 2009 12:18 am

slowcoder wrote:It obviously needs to be present in the interactive environment, but is there a way for me to remove it in non-interactive mode? It seems your code is just what I want, but I am unsure whether I have removed every trace of compile-time computation from runtime.
Possibly due to the heritage of Lisp Machines, there isn't really a non-interactive mode in Common Lisp. The best you can do is save a tree-shaken image with a custom top-level function. And as far as I know no non-commercial implementation has a tree shaker, because the demand for one is insufficient compared to the complexity of the task. So there is not much point in worrying about "every trace" of compile-time computation, since you cannot remove the multimegabyte compiler anyway.

Also, because of the image-based/incremental nature of Common Lisp, linking happens during loading and is much harder to separate from runtime than in statically compiled languages, so some sort of linkage table is unavoidable. I suppose it could be removed manually after loading is finished, but why? RAM is cheap.

In general, note that the reason the answers to your questions may appear confused is less any technical limitation CL may have and more that you are imposing constraints carried over from a different programming model, constraints which rarely matter in practice when using Lisp.
slowcoder wrote: Ok, so now I will write an example of typical usage of templates in C++.
slowcoder wrote:The code will have some procedures, one of which will be insert, which will insert an element into the tree. insert will in turn call a procedure named insert_fixup, which will maintain the red-black properties, and that one will call both left_rotate and right_rotate, which perform single rotations.

The big point is that insert strongly depends on the ordering of the elements but insert_fixup and the rotations do not. Thus, separating the structures (which will be fed as template arguments to the procedures) allows us to reuse the compiled code of insert_fixup between a tree that sorts ints in increasing order and one that sorts them in decreasing order, reducing code bloat.
slowcoder wrote:One more example. Suppose we want to compute the gcd of two elements of some euclidean domain. One way would be C++:
slowcoder wrote: In Lisp, the best we could hope for is something like this:
Actually, what you describe would usually be implemented using CLOS. I know that SBCL does pretty serious optimizations, although it doesn't really generate separate code paths. Of course, if, as you say, the code in your C++ example is shared, then it doesn't have separate code paths either.

Code: Select all

(defgeneric general-zero-p (object))
(defgeneric general-mod (a b))

(defun general-gcd (a b)
  ;; Euclid's algorithm written purely in terms of the generic operations
  (if (general-zero-p b)
      a
      (general-gcd b (general-mod a b))))

;; example for integers
;; in Common Lisp there is a builtin class for integers already
(defmethod general-zero-p ((a integer))
  (zerop a))
(defmethod general-mod ((a integer) (b integer))
  (mod a b))
This doesn't really fulfill your conditions, unless there are CLOS implementations much more advanced than I think there are, but I still think this is sufficient for most purposes. There is no point in creating an elaborate system unless performance problems are actually encountered.
slowcoder wrote:#0. Express an algorithm generically over algebraic structures.
This is trivial in Lisp because of dynamic typing, especially using CLOS. Doing so while removing runtime type/class dispatch is more complex. Doing that without code duplication is even more complex.
slowcoder wrote:#1. Define the structures for runtime or compile-time (or both) with uniform syntax.
I am not sure what this means.
slowcoder wrote:#2. Instantiate the algorithms for compile-time/runtime based on my decision with a similar syntax.
This is slightly tricky, because function objects cannot be emitted directly into compiled code, although this can be overcome with (load-time-value ...) and some sort of linkage table keyed on names.
slowcoder wrote:#3. Know that the system will instantiate specialized code on demand.
I am not sure how this point is not a subset of the previous one, unless it requires the certainty that code will not be instantiated without request.
slowcoder wrote:#4. Know that code will be shared as needed.
Shared code is by definition a function, so I am not sure how this plays with your requirement to avoid function overhead above.

Returning to my distinction about compile-time, load-time and run-time, this is possible:

Code: Select all

(defparameter *algorithm-cache* (make-hash-table :test 'equal))

(defun runtime-general-algorithm (template-arguments &rest algorithm-arguments)
  (declare (ignore template-arguments algorithm-arguments))
  (print :runtime))

(defun construct-algorithm (template-arguments)
  (cond ((equal template-arguments '(:spec1))
         '(lambda ()
           (print :spec1)))))

(defun get-or-construct-algorithm (template-arguments)
  (let ((cache (gethash template-arguments *algorithm-cache*)))
    (if cache
        cache
        (let ((algorithm (construct-algorithm template-arguments)))
         (if algorithm
             (setf (gethash template-arguments *algorithm-cache*) (compile nil algorithm))
             #'(lambda (&rest args)
                 ;; fall back to the generic runtime implementation
                 (apply #'runtime-general-algorithm template-arguments args)))))))

(defmacro general-algorithm ((&rest template-arguments) &rest algorithm-arguments)
  `(funcall (load-time-value (get-or-construct-algorithm ',template-arguments)) ,@algorithm-arguments))

(defun test-algorithm-cache ()
  (general-algorithm (:spec1))
  (general-algorithm (:runtime)))
It is a bit rough, but it is mostly a sketch of an idea. You can hook definitions into the (construct-algorithm ...) function. It is also possible to add optional direct inlining in the macro, conditional on the template arguments.
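
For instance, something along these lines (untested, reusing the names from the sketch above; it assumes construct-algorithm is already defined by the time the macro is expanded):

Code: Select all

;; Untested variant: when the specialization can already be constructed at
;; macro-expansion time, splice the generated lambda in directly; otherwise
;; fall back to the load-time cache lookup as before.
(defmacro general-algorithm-inlining ((&rest template-arguments) &rest algorithm-arguments)
  (let ((algorithm (construct-algorithm template-arguments)))
    (if algorithm
        `(funcall ,algorithm ,@algorithm-arguments)
        `(funcall (load-time-value (get-or-construct-algorithm ',template-arguments))
                  ,@algorithm-arguments))))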

gugamilare
Posts: 406
Joined: Sat Mar 07, 2009 6:17 pm
Location: Brazil
Contact:

Re: Compile-time information across macros

Post by gugamilare » Tue Aug 25, 2009 12:57 am

Sorry for the typos :?, but I'm glad that you understood my idea.

Well, I've seen the red-black tree example and I think I understand your problem. It is a very interesting one :)

There is a CL project called Spartns which does something similar to this. It creates the kind of sparse tensor (multi-dimensional array) that you want, and creates specialized getter and setter functions for it. The drawback of this approach is that it constrains the syntax you use, which kind of turns CL into a statically typed language - you need to know the type of the tensor you will use in each problem. As for code sharing, in this example it wouldn't be hard to create hash tables for the getters and setters, indexed by the relevant arguments given to the macro.

In the RB tree example, you would need to create a specialized syntax for every kind of ordering (#'<, #'>, #'whatever), which, I agree, is not ideal. Correct me if I am wrong, but the compile-time structures you were talking about are information about the variables being used, right? Suppose that this is John's code:

Code: Select all

(let ((rbt (make-red-black-tree :test #'< :element-type 'fixnum)))
  (insert rbt 2)
  (insert rbt 3)
  ...
  rbt)
Then, at compile time there would be a red-black-tree compile-time structure associated with rbt, and the compilation of insert would fetch this information in order to call the function specialized to rbt, right? In that case, I am afraid CL does not have this feature, nor anything close to it. Anything like this would require access to lexical environments, which, when available at all, is implementation-dependent.

Well, I have an idea that might work for this problem, using some features of the MOP. This is just a sketch; I don't know if it will work or how to finish the code. The idea is to create a class named red-black-tree and a hash table of classes indexed by the test and element-type provided. Then, when make-red-black-tree is called with constant :test and :element-type arguments (as in John's code), its compiler macro will create a class on the fly and insert it into the hash table mentioned above. Here is the code to do this:

Code: Select all

(defclass red-black-tree ()
  ((test :initarg :test :accessor red-black-tree-test)
   (element-type :initarg :element-type :accessor red-black-tree-element-type)
   ... ; other relevant fields
   ))

(defun make-red-black-tree (&rest fields)
  (apply #'make-instance 'red-black-tree fields))

(defvar *red-black-tree-classes* (make-hash-table :test #'equal))

(defun function-form-p (arg)
  ;; matches forms like (function foo) or (quote foo)
  (and (consp arg)
       (member (car arg) '(quote function))
       (symbolp (second arg))
       (not (cddr arg))))

(defun quoted-argument-p (arg)
  ;; matches forms like (quote foo)
  (and (consp arg)
       (eq (car arg) 'quote)
       (not (cddr arg))))

(defun get-red-black-tree-class (test element-type)
  (let ((key (list test element-type)))
    (or (gethash key *red-black-tree-classes*)
        (let ((class (closer-mop:ensure-class
                      (gensym) :direct-superclasses (list (find-class 'red-black-tree))
                      ;; other possible fields here (like slots)
                      )))
          (setup-red-black-tree-class class test element-type)
          (setf (gethash key *red-black-tree-classes*)
                class)))))

(define-compiler-macro make-red-black-tree (&rest fields &key test element-type &allow-other-keys)
  (let* ((test* (and (function-form-p test) (second test)))
         (element-type* (and (quoted-argument-p element-type) (second element-type)))
         (class (get-red-black-tree-class test* element-type*)))
    `(make-instance ',class ,@fields)))
I believe that the function setup-red-black-tree-class should create a method for the function insert (and possibly for other generic functions involved). The problem is that, with only this, each of John's calls to insert would be a generic function call, which will probably involve computing and sorting the applicable methods at run time. Fortunately, some implementations cache this computation, so it would only be done for the first call to insert in John's code, which is quite acceptable, though some benchmarking on this would be good.
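
A quick check could be as simple as this (using the hypothetical make-red-black-tree and insert from above):

Code: Select all

;; Quick-and-dirty timing sketch, using the hypothetical API above.
(defun time-inserts (n)
  (let ((rbt (make-red-black-tree :test #'< :element-type 'fixnum)))
    (time (dotimes (i n)
            (insert rbt i)))))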

If this is not enough, a nice extension to this approach would be to create a hash table indexed by package. Each expansion of the compiler macro of insert will then depend on the package it is being expanded in, and each entry in the hash table will be an alist associating a class with the insert function for that class, but only for the classes that were used in that package. So, in John's package, the expansion of insert would be specialized to the red-black-tree classes used inside John's package. Like this:

Code: Select all

(defvar *red-black-tree-insert-function* (make-hash-table :test #'eq))

(defun setup-red-black-tree-class (class test element-type)
  (let ((specialized-function (generate-insert-function test element-type)))
    (push (list class specialized-function)
          (gethash *package* *red-black-tree-insert-function*))
    ;; define specialized-function to be a method of the generic function %insert
    ;; specialized to the class here
    ))

(defgeneric %insert (tree elt)
  ;; generic fallback version of insert; specialized methods are added
  ;; by setup-red-black-tree-class
  )

(defun insert (tree elt)
  (%insert tree elt))

(define-compiler-macro insert (tree elt)
  (let ((alist (gethash *package* *red-black-tree-insert-function*)))
    ;; once-only (e.g. from Alexandria) belongs in the macro body,
    ;; not in the expansion
    (once-only (tree elt)
      `(typecase ,tree
         ,@(loop for (class function) in alist
                 ;; use the class name as the typecase key
                 collect `(,(class-name class) (funcall ,function ,tree ,elt)))
         (t (%insert ,tree ,elt))))))
This code has a great advantage when the compiler does type inference, like SBCL or CMUCL: since the macro expansion is a typecase, if the compiler can deduce the type of the tree being used by itself, it can optimize away the dispatch and keep only the form for that kind of tree. I made the compiler macro for make-red-black-tree exactly to allow this, so it would apply in John's code. I don't know how well this code would behave in CCL, ECL, CLISP and other implementations, which may or may not be able to do the same. But with type declarations, I believe this could be handled even on implementations without type inference, not to mention that declarations are useful when you know the type of a tree returned by some other function:

Code: Select all

(let ((rbt ...))
  (declare (type #.(get-red-black-tree-class '< 'fixnum) rbt))
  ...)
or

Code: Select all

(deftype tree-<-fixnum ()
  (get-red-black-tree-class '< 'fixnum))

(let ((rbt ...))
  (declare (type tree-<-fixnum rbt))
  ...)
This solution, if implemented, would be fine by me. I apologize if my idea does not suit your problem, or if it isn't complete enough, or if there are any typos ;), or even if I failed to understand your problem. I guess you'll have to fill in the various holes in my code. For instance, to be able to share insert-fixup and the rotations (which do not depend on the ordering test), you could create one more table of node classes, a hash table indexed only by the element-type, for instance (gethash element-type *node-table*), and make the tree class inherit from the node class, as sketched below. In particular, if you liked my solution but have never used the MOP (like me, though I know more or less how it works), then I guess you'll have to learn it (the specification might be enough for this simple task, but you might also want to read the book "The Art of the Metaobject Protocol").
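
Something like this, for instance (again just a sketch; red-black-tree-node is a hypothetical base class for the node-level code):

Code: Select all

;; Sketch only: node classes are keyed by element-type alone, so the code
;; generated for insert-fixup and the rotations can be shared between trees
;; that differ only in their ordering test.
;; red-black-tree-node is a hypothetical base class.
(defvar *node-table* (make-hash-table :test #'eq))

(defun get-node-class (element-type)
  (or (gethash element-type *node-table*)
      (setf (gethash element-type *node-table*)
            (closer-mop:ensure-class
             (gensym)
             :direct-superclasses (list (find-class 'red-black-tree-node))))))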

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland
Contact:

Re: Compile-time information across macros

Post by ramarren » Tue Aug 25, 2009 6:36 am

I have also thought of another way, although it is questionable how good an idea it is. It is always possible to move to a more C-like compilation model. The lack of direct compiler support for this would require compiling everything twice: first collecting template instantiations and substituting them with stubs, then generating and compiling code for the templates while removing duplicates, and finally recompiling everything with the invocations substituted properly. This is perhaps the technically simplest way (not saying it would be easy, since there are probably any number of edge cases I haven't thought of) of fulfilling all of slowcoder's conditions, with the main problem being having to create a custom build system, since ASDF is not going to like it. The stubs would probably also annoy the type inferencer, but that is either ignorable or solvable.

gugamilare
Posts: 406
Joined: Sat Mar 07, 2009 6:17 pm
Location: Brazil
Contact:

Re: Compile-time information across macros

Post by gugamilare » Tue Aug 25, 2009 8:27 am

Ramarren wrote:I have also thought of another way, although it is questionable how good an idea it is. It is always possible to move to a more C-like compilation model. The lack of direct compiler support for this would require compiling everything twice: first collecting template instantiations and substituting them with stubs, then generating and compiling code for the templates while removing duplicates, and finally recompiling everything with the invocations substituted properly. This is perhaps the technically simplest way (not saying it would be easy, since there are probably any number of edge cases I haven't thought of) of fulfilling all of slowcoder's conditions, with the main problem being having to create a custom build system, since ASDF is not going to like it. The stubs would probably also annoy the type inferencer, but that is either ignorable or solvable.
Instead of compiling twice, what about using a code walker? There is even a library for code walking, cl-walker, which is based on Arnesi's code walker. Using Arnesi's code walker to give CL a Lisp-1 syntax was easy enough (and, wow, cl-walker even accesses the lexical environment portably!). A code walker would be best: it would allow associating the compile-time structures with the variables and changing the functions into whatever code or function call you want, but it would also be a more extensive approach and require more work.

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland
Contact:

Re: Compile-time information across macros

Post by ramarren » Tue Aug 25, 2009 9:09 am

gugamilare wrote:Instead of compiling twice, what about using a code walker? There is even a library for code walking, cl-walker, which is based on Arnesi's code walker. Using Arnesi's code walker to give CL a Lisp-1 syntax was easy enough (and, wow, cl-walker even accesses the lexical environment portably!). A code walker would be best: it would allow associating the compile-time structures with the variables and changing the functions into whatever code or function call you want, but it would also be a more extensive approach and require more work.
I agree that a code walker would in principle be the best solution, although those are not entirely portable. On the other hand, this made me realize that if wrapping DEFUN is a possibility, then a custom build system can be avoided, because you can just do a code walk/full compilation and emit the template instantiations locally, but at toplevel. You could even signal from inside the macro, avoiding the need for stubs...

I was thinking in terms of emitting, compiling and loading a separate file with the template instantiations, since creating compile-time functions from non-toplevel code is problematic. I am not sure how exactly non-toplevel defun works, and dumping anonymous functions into fasls doesn't work at all. Using a separate file for them and making two passes over the system would allow using perfectly normal DEFUN etc. in the source.

slowcoder
Posts: 4
Joined: Sat Aug 15, 2009 4:20 pm

Re: Compile-time information across macros

Post by slowcoder » Tue Aug 25, 2009 9:43 am

gugamilare wrote:In the RB tree example, you would need to create a specialized syntax for every kind of ordering (#'<, #'>, #'whatever), which, I agree, is not ideal. Correct me if I am wrong, but the compile-time structures you were talking about are information about the variables being used, right? Suppose that this is John's code:
Not exactly. The compile-time structure is actually the cache that holds the template instantiations.

Thanks, Ramarren, for your explanation. I think I have a feel now for how Lisp systems actually work. That fboundp has to incur some runtime overhead is similar to C's link table, and that is both necessary and acceptable. I think I will try to use this strategy to implement my system; it seems reasonable. Thank you.

Actually, thanks to everybody for the replies; I really appreciated them, and you were all very receptive.
