Lisp newbie trying to optimize recursive algorithm, help?

tensorproduct · Post by **tensorproduct** » Mon Jun 18, 2012 2:01 pm

Hi

I've written an algorithm to count the number of inversions in a list of integers (pairs of integers that are out of order), but it runs very slowly. It takes ~400 seconds to run on a list of 100,000 integers. In contrast, the same algorithm written in Java runs in less than a second.

The algorithm itself is basically merge-sort, a binary recursive call, but counting the number of merges that would need to be done without actually doing them.

This is my basic Lisp form:

Code: Select all

(defun count-inversions (lst &optional (predicate #'<))
  (if (or (null lst) (null (cdr lst))) ;recursive base case
      0
      (let* ((half (ceiling (/ (length lst) 2)))
	     (left-list (subseq lst 0 half))
	     (right-list (subseq lst half)))
	(+ (loop for a in left-list ; loop over the left and right halves of the list
	      summing (loop for b in right-list
			 counting (not (funcall predicate a b)))) ; count inversions between left and right lists
	   (count-inversions left-list predicate)
	   (count-inversions right-list predicate))))) ;recur on left and right lists

And this is that same form with various type declarations and optimization, it shaves about sixty seconds of the running time, but it still takes almost six minutes:

Code: Select all

(defun count-inversions (lst &optional (predicate #'<))
  (declare (optimize (speed 3)
		     (compilation-speed 0)
		     (debug 0)
		     (safety 0)))
  (declare (type list lst))
  (declare (type (function (fixnum fixnum)) predicate))
  (if (or (null lst) (null (cdr lst)))
      0
      (let* ((half (the fixnum (ceiling (/ (the fixnum (length lst)) 2))))
	     (left-list (subseq lst 0 half))
	     (right-list (subseq lst half)))
	(declare (type fixnum half))
	(declare (type list left-list))
	(declare (type list right-list))
	(+ (the bignum (loop for a in left-list
			  summing (loop for b in right-list
				     counting (not (funcall
						    (the function predicate)
						    (the fixnum a)
						    (the fixnum b))))))
	   (the bignum (count-inversions left-list predicate))
	   (the bignum (count-inversions right-list predicate))))))

I'm using SBCL (+emacs +slime) on Ubuntu.

Why does this take so long? Is there anyway to get the running time comparable to Java? I have read that properly optimized CL can be competitive with C, so I would hope that this algorithm can be made better than a hundred times slower than Java.

Any help is appreciated,
tensorproduct

wvxvw · Post by **wvxvw** » Tue Jun 19, 2012 12:38 am

Code: Select all

(defun count-inversions-1 (x &optional (predicate #'<) (len (length x)))
  (if (or (null x) (null (cdr x)))
      0
      (let* ((half-a (1- (ash len -1)))
             (half-b (- len half-a))
             (right-list (split-in-two! x)))
        (+ (loop for a in x
                summing (loop for b in right-list
                             counting (not (funcall predicate a b))))
           (count-inversions-1 x predicate half-a)
           (count-inversions-1 right-list predicate half-b)))))

(defun split-in-two! (x &optional (len (length x)))
  (setq len (1- (ash len -1)))
  (do ((c x (cdr c))
       (r)
       (i 0 (1+ i)))
      (nil)
    (when (= i len)
      (setq r (cdr c))
      (rplacd c nil)
       (return r))))

Could you try it with something like this? I'm not 100% certain I understood what exactly the function has to do, it looks like reduce'ing instead of looping might do a better job (although insignificantly, I guess). The major problem with your algorithm is that it:
- intensively recalculates the length of the lists many-many times, and it's a O(n), while in Java you probably used ArrayList (which, contrary to its name is a dynamic array, nothing to do with lists), or just an array of int (even faster), and "calculating its length is O(1).
- creates a lot of needless conses. subseq creates new conses - but this is what you don't want to do! You've no use for the old conses after new are created, and, although the runtime may try to be smart and reduce the conses creation by reusing some old ones you threw away, it will be still taxing the memory. However, splitting arrays in two is a much simpler task then splitting lists (again O(1) vs O(n), you could say it's O(n/2) but there's no such thing). And you had to call subseq twice!

Alright, I'm hoping that the above will improve your code somewhat, but, ultimately, if you want to get near Java's speed on this task you need to use arrays, not lists, because arrays are better for this algorithm.

tensorproduct · Post by **tensorproduct** » Tue Jun 19, 2012 1:46 am

Thanks wvxvw. I think that you're right. I hadn't appreciated the difference between a lisp list and an array. I'll modify my code to work on an integer array (for starters) and see what sort of improvement that yields. I'm guessing that it will be significant.

Is there any simple method to convert a list to an array? Some thing like (vector *input-list*), I'm guessing.

Cheers.

pjstirling · Post by **pjstirling** » Tue Jun 19, 2012 2:51 am

tensorproduct wrote:Thanks wvxvw. I think that you're right. I hadn't appreciated the difference between a lisp list and an array. I'll modify my code to work on an integer array (for starters) and see what sort of improvement that yields. I'm guessing that it will be significant.

Is there any simple method to convert a list to an array? Some thing like (vector *input-list*), I'm guessing.

Cheers.

Code: Select all

(COERCE #(1 2 3) 'LIST)

wvxvw · Post by **wvxvw** » Tue Jun 19, 2012 9:42 am

pjstrling, almost, but you did it backwards

you are coercing an array to list. You wanted

Code: Select all

(coerce '(1 2 3) 'vector)

or

Code: Select all

(make-array 3 :element-type 'fixnum :initial-contents '(1 2 3))

I would prefer the second way because it allows for more precise type specification.

Konfusius · Post by **Konfusius** » Tue Jun 19, 2012 4:32 pm

The funcall to predicate is a huge timesink.

I tested your program with Clozure CL v1.6 under Windows and without the funcall to predicate it took 0.205 seconds for a list of 10.000 elements.
I also rewrote your program in C with Visual C++ 6 using a static array and it took 0.175 seconds. I doubt Java would make 100.000 elements in less than a second.

tensorproduct · Post by **tensorproduct** » Wed Jun 20, 2012 6:06 am

Thanks for the replies guys.

Moving from a list to an array didn't quite have the impact I was expecting, maybe 10% off the running.

Taking out the funcall to a predicate on the other hand made all the difference. It brings the running time down to 25 seconds, a huge speed up.

So, does this mean that writing fast Lisp code is mutually exclusive to passing first class functions? Or are there other ways of doing this than "funcall"-ing things?

Konfusius wrote: I doubt Java would make 100.000 elements in less than a second.

I admit that I quoted that figure without doing any proper timing. 3 to 4 seconds is a more accurate quotation.

Post by **ramarren** » Wed Jun 20, 2012 7:16 am

Actually, a lot of the problem here is not with FUNCALL itself but with the fact that FUNCALL screens optimizations. If you pass as a predicate not #'<, which is generic multi-argument function, but (lambda (a b) (declare (fixnum a b)) (< a b)) it will reduce the time by more that half alone.

Of course for a very small operation like numeric comparison the cost of indirection will always be significant. A first class function shouldn't be an innermost operation in a long loop. There are ways around this (for example, use a compiler macro to catch a compile-time predicate or recompile at runtime when the predicate is available), but it should be done only when you know that it would be actually useful, and if it is impractical to pass a larger behaviour.

Konfusius · Post by **Konfusius** » Wed Jun 20, 2012 10:08 am

tensorproduct wrote:So, does this mean that writing fast Lisp code is mutually exclusive to passing first class functions? Or are there other ways of doing this than "funcall"-ing things?

It's all about the implementation. The Lisp implementations I know (SBCL and Clozure CL for Windows) generate abyssmal code for funcalls of functions that they cannot determine at compile time. I don't know if the commercial implementations (Allegro/Lisp Works) are better at this.

tensorproduct wrote:Moving from a list to an array didn't quite have the impact I was expecting, maybe 10% off the running.

This isn't a surprise to me. The purpose of arrays is to speed up code but ironically untyped arrays are horribly slow in Common Lisp. Typed arrays should be fast but SBCL and Clozure CL don't seem to be able to make proper use of that type information. Arrays seem to be pretty much useless in the free implementations. Lists are amazingly fast, though.

tensorproduct wrote:I admit that I quoted that figure without doing any proper timing. 3 to 4 seconds is a more accurate quotation.

I doubt that, too. My C implementation took 17.625 seconds. (VC++6 doesn't generate the best code, though) Here is the C source:

Code: Select all

#include <windows.h>
#include <stdio.h>

#define NUM 100000

int a[NUM];

int
count_inversions (int start, int end)
{
  int cnt=0,i,j,half=start+(end-start)/2;

  if(end-start<2) return 0;

  for(i=start; i<half; i++)
	for(j=half; j<end; j++)
	  if(a[i]>a[j])
		++cnt;

  return cnt+count_inversions(start,half)+count_inversions(half,end);
}

int
main (int argc, char **argv)
{
  int i,cnt;
  DWORD t;

  for(i=0; i<NUM; i++) a[i]=NUM-i;

  t=GetTickCount();
  cnt=count_inversions(0,NUM);
  printf("time=%f seconds\n",(double)(GetTickCount()-t)/1000E0);
}

PS: Maybe this post sounds a bit more negative about Lisp than I intended. In fact, your original code (without a funcalled predicate) runs only ~15% slower (in Clozure CL v1.6) than the C version on my PC. I think thats pretty impressive for a dynamic language, especially since your code uses lists and does a lot of consing due to subseq, while the C version uses a static array.

That's why I like Lisp. It's powerful and still pretty fast. Even without arrays.

wvxvw · Post by **wvxvw** » Wed Jun 20, 2012 2:29 pm

I decided to do some full-scale testing and experimenting with your example, and, as others have said that...

The most time consuming operation appears to be the funcall.

http://pastebin.com/J26Nt1TV

here are the results. It may be possible to optimize it further a bit by manipulating pointers to the segments of the array being compared instead of dividing the array, but it appears that memory allocations are rather super fast, so that's not important, unless memory itself is important.

And... by thee wee, the results for my last example (without funcall) are:

Code: Select all

Evaluation took:
  17.825 seconds of real time
  17.777111 seconds of total run time (17.777111 user, 0.000000 system)
  99.73% CPU
  49,873,143,567 processor cycles
  18,720,528 bytes consed

That's pretty close to C!

(but it can be even better)

Code: Select all

Processor	8x Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz
Memory	6127MB (4327MB used)
Operating System	Debian GNU/Linux 6.0.4

* just in case...

LispForum

Lisp newbie trying to optimize recursive algorithm, help?

Lisp newbie trying to optimize recursive algorithm, help?

Re: Lisp newbie trying to optimize recursive algorithm, help

Re: Lisp newbie trying to optimize recursive algorithm, help

Re: Lisp newbie trying to optimize recursive algorithm, help

Re: Lisp newbie trying to optimize recursive algorithm, help

Re: Lisp newbie trying to optimize recursive algorithm, help

Re: Lisp newbie trying to optimize recursive algorithm, help

Re: Lisp newbie trying to optimize recursive algorithm, help

Re: Lisp newbie trying to optimize recursive algorithm, help

Re: Lisp newbie trying to optimize recursive algorithm, help