Generalized iterators in CL ?

Post by sinnatagg » Sat Apr 03, 2010 2:35 am

Ayup! I'm implementing a couple of part-of-speech taggers in Common Lisp, and one of the things I want to do is to decouple the traversal of text collections from the tagging/training code.

Ideally I would like something like this:

Code:

(loop for sent in sentences do (...))

Where sentences is some arbitrary structure implementing an iterator and sent is a sentence structure.

We all know loop can't do this, so I've been looking at libraries like series and iterate. But before I dive too deeply into those I would like to hear your opinions about this: what is the best way for a data consumer - like the tagger/trainer - to consume (possibly millions of) data points from a data provider - like text collections - in a reasonably general way?


Talk soon!

-andré

Post by gugamilare » Sat Apr 03, 2010 8:33 am

You might want to take a look at this paper. It describes a standard that generalizes the concept of a sequence in Common Lisp implementations. Some implementations support it, some do not; all I know is that SBCL supports it, and it seems CLISP does not (as of 2.48).

Post by Paul » Wed Apr 07, 2010 10:04 pm

sinnatagg wrote:
We all know loop can't do this, so I've been looking at libraries like series and iterate.
I don't know it...why can't loop do this? It does it just fine for me! Not using the "in" keyword, but you can probably write a loop path that does whatever you want if you write something like

Code:

(loop for sent being the values in sentences do (...))

Post by nuntius » Wed Apr 07, 2010 10:26 pm

Paul, LOOP can't be extended to handle "arbitrary structures". For example, it doesn't know how to iterate the sets in fset.

Post by Jasper » Thu Apr 08, 2010 6:49 am

Gah, whoever designed the Firefox shortcuts... I just accidentally hit C-W instead of C-w and lost my post. (Same for C-q.)

I use LOOP for convenience now and then, but find it too irregular. However, if you want to extend something like LOOP, Iterate can probably do it; you will probably need to write a driver.

How about just (map-sentences do-fun &rest sentence-seqs), and if you don't like writing LAMBDA all the time:

Code:

(defmacro do-sentences ((&rest var-list) &body body)
  ;; Each VAR-LIST entry is (variable sequence-form).
  `(map-sentences (lambda ,(mapcar #'car var-list) ,@body) ,@(mapcar #'cadr var-list)))

Or perhaps construct these from manual iteration with next-sentence and prev-sentence as well. If you think things will get too nested, denest.
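
A minimal sketch of the map-sentences idea, simplified to one collection per call. It assumes each text collection can be handled by a method on a generic function; the method set below (lists, vectors, and a stream read one sentence per line) is an illustrative assumption, not something from the thread or from a library.

```lisp
;; Hypothetical protocol: each kind of text collection gets its own method.
(defgeneric map-sentences (function collection)
  (:documentation "Call FUNCTION on each sentence of COLLECTION, in order."))

(defmethod map-sentences (function (collection list))
  (dolist (sentence collection)
    (funcall function sentence)))

(defmethod map-sentences (function (collection vector))
  (loop for sentence across collection
        do (funcall function sentence)))

;; Assumed on-disk format for this sketch: one sentence per line.
(defmethod map-sentences (function (collection stream))
  (loop for line = (read-line collection nil nil)
        while line
        do (funcall function line)))

;; Single-collection variant of the DO-SENTENCES macro.
(defmacro do-sentences ((var collection) &body body)
  `(map-sentences (lambda (,var) ,@body) ,collection))
```

The tagger then only ever writes (do-sentences (sent corpus) ...), and supporting a new kind of corpus means adding one method. Because the stream method reads lazily, millions of sentences never have to be held in memory at once.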

Post by Paul » Fri Apr 09, 2010 12:50 am

nuntius wrote:Paul, LOOP can't be extended to handle "arbitrary structures". For example, it doesn't know how to iterate the sets in fset.
I don't know how to iterate the sets in fset, either, never having used it. But let's say you have a function, ITERATOR, that you can call on an object to get an iterator for that object, and then calling ITER-NEXT on that iterator gets subsequent values, until it hits the end and throws ITER-DONE. Then you can write something like

Code:

(defun loop-iterator-iteration-path (var dtype pps)
  ;; PPS is the list of (preposition expression) phrases that followed
  ;; "being the values"; exactly one :OF or :IN phrase is accepted.
  (when (cdr pps)
    (ansi-loop::loop-error "Only expecting one prepositional phrase here."))
  (unless (member (caar pps) '(:of :in))
    (ansi-loop::loop-error "Unknown preposition: ~S." (caar pps)))
  (let ((iter (ansi-loop::loop-gentemp 'iter-))
        (tmp1 (ansi-loop::loop-gentemp 'done-))
        (tmp2 (ansi-loop::loop-gentemp 'val-)))
    `(((,iter (iterator ,(cadar pps)))
       (,var nil ,dtype))
      ()
      ()
      ;; Step: fetch the next value. TMP1 stays T only if ITER-NEXT threw
      ;; ITER-DONE before the SETQ ran, in which case the loop finishes.
      (,var (let* ((,tmp1 t)
                   (,tmp2 (catch 'iter-done
                            (prog1 (iter-next ,iter) (setq ,tmp1 nil)))))
              (if ,tmp1 (ansi-loop::loop-finish) ,tmp2)))
      ()
      ())))

;; Register the path for both "value" and "values".
(ansi-loop::add-loop-path '(value values)
                          'loop-iterator-iteration-path
                          ansi-loop::*loop-ansi-universe*
                          :preposition-groups '((:of :in))
                          :inclusive-permitted nil)
And now (loop for x being the values of y ...) will loop over the values produced by iter-next on (iterator y). Make (iterator y) do whatever you have to do to iterate over an fset, and LOOP will iterate over fsets.
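
For concreteness, here is one way the assumed ITERATOR / ITER-NEXT protocol could look. This is only a sketch: the closure representation, the methods for lists and vectors, and the COLLECT-ALL helper are all hypothetical choices made for illustration, and only standard Common Lisp operators are used.

```lisp
;; Hypothetical protocol: ITERATOR returns a closure, ITER-NEXT calls it,
;; and exhaustion is signalled with (throw 'iter-done nil).
(defgeneric iterator (object)
  (:documentation "Return a closure yielding successive elements of OBJECT."))

(defmethod iterator ((object list))
  (let ((rest object))
    (lambda ()
      (if (null rest)
          (throw 'iter-done nil)
          (pop rest)))))

(defmethod iterator ((object vector))
  (let ((index 0))
    (lambda ()
      (if (>= index (length object))
          (throw 'iter-done nil)
          (prog1 (aref object index) (incf index))))))

(defun iter-next (iter)
  "Return the next element, or throw to ITER-DONE when exhausted."
  (funcall iter))

(defun collect-all (object)
  "Drain (ITERATOR OBJECT) into a fresh list."
  (let ((iter (iterator object))
        (result '()))
    ;; The simple LOOP runs until ITER-NEXT throws to the CATCH.
    (catch 'iter-done
      (loop (push (iter-next iter) result)))
    (nreverse result)))
```

Because the end of iteration is signalled by THROW and not by a sentinel return value, NIL can appear as an ordinary element. Supporting another container means adding one ITERATOR method for it.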

Post by gugamilare » Fri Apr 09, 2010 4:41 am

Paul wrote:
And now (loop for x being the values of y ...) will loop over the values produced by iter-next on (iterator y). Make (iterator y) do whatever you have to do to iterate over an fset, and LOOP will iterate over fsets.
What implementation are you using? Because this is clearly not ANSI, which means some implementations will have this functionality, some won't.

I am actually surprised to see that SBCL has those functions in the package sb-loop (add-loop-path, loop-error, and gentemp instead of loop-gentemp). Maybe that is because you are using CMUCL (the two implementations share a common root)? On the other hand, CLISP doesn't seem to have any of this.

Not to mention you are using internal functions, which can change at the developers' will.
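
One way to sidestep the dependence on LOOP internals is a small macro built only on standard operators. The following is a minimal sketch, assuming the same convention as above (an iterator is a function of no arguments that throws to ITER-DONE when exhausted); DO-ITERATOR and LIST-ITERATOR are hypothetical names, not part of any library.

```lisp
;; Hypothetical helper: an iterator over a list, as a closure.
(defun list-iterator (list)
  (lambda ()
    (if (null list)
        (throw 'iter-done nil)
        (pop list))))

;; Portable DOLIST-style macro over such iterators.
(defmacro do-iterator ((var iterator-form &optional result) &body body)
  (let ((iter (gensym "ITER"))
        (done (gensym "DONE"))
        (value (gensym "VALUE")))
    `(let ((,iter ,iterator-form))
       (loop
         ;; DONE stays T only if the iterator threw before the SETQ ran.
         (let* ((,done t)
                (,value (catch 'iter-done
                          (prog1 (funcall ,iter) (setq ,done nil)))))
           (when ,done (return ,result))
           (let ((,var ,value))
             ,@body))))))
```

For example, (do-iterator (x (list-iterator '(1 2 3))) (print x)) visits each element in turn. It is less convenient than a real LOOP path, since it cannot be mixed with other LOOP clauses, but it should behave the same on any conforming implementation.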

Post by Paul » Fri Apr 09, 2010 7:21 am

I don't know what loop code CLISP uses. Most implementations can do this. (Any reason you can't compile the MIT loop code on CLISP, if you want?)
gugamilare wrote:Not to mention you are using internal functions, which can change at the developers' will.
Haha. It hasn't changed in...umm...20-odd years; I doubt it's going to change in the next few weeks :) {Besides, they're not really "internal internal"; they're put there to allow extension}

Post by gugamilare » Sat Apr 10, 2010 5:36 am

Paul wrote:I don't know what loop code CLISP uses. Most implementations can do this. (Any reason you can't compile the MIT loop code on CLISP, if you want?)
Ok, I could, but then I would have to distribute MIT loop code with my libraries, which is not very nice. I would rather have my project depending on iterate, for instance.

And what source do you have to know that most implementations can do this? And by saying that most implementations can do this, do you mean they can do this in the same way, or in different ways? Because having the same feature among many implementations but varying on the way that that feature is used... it is not good enough for me.

If there was a standard, though, and implementations followed that standard, or at least a compatibility layer, then that would be enough.
Paul wrote:
gugamilare wrote:Not to mention you are using internal functions, which can change at the developers' will.
Haha. It hasn't changed in...umm...20-odd years; I doubt it's going to change in the next few weeks :) {Besides, they're not really "internal internal"; they're put there to allow extension}
Well, they did change in SBCL, at least the function names and the package in which they are located. This functionality is good for people who want to use them and don't care much about the portability of their code (e.g. when sticking to one implementation), but that is not everyone.

If it makes you happy, though, good for you ;)
