
collecting *unique* in loop

Posted: Thu Nov 25, 2010 9:26 am
by imba
Hi.

In the following situation

Code:

  (loop for a in l
        when cond
        collecting a)
How do I ensure that every a is collected only *once*? Note: I don't want to apply something like "unique" to the result; I want to avoid collecting duplicates in the first place.

Re: collecting *unique* in loop

Posted: Thu Nov 25, 2010 10:44 am
by gugamilare
Use the function remove-duplicates.
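For example (not from the original post, but worth noting: by default REMOVE-DUPLICATES keeps the *last* occurrence of each element; pass :from-end t to keep the first):

```lisp
;; Default: elements matching a LATER element are removed,
;; so the last occurrence of each value survives.
(remove-duplicates '(1 2 3 1 2 4))             ; => (3 1 2 4)

;; :FROM-END T keeps the FIRST occurrence instead.
(remove-duplicates '(1 2 3 1 2 4) :from-end t) ; => (1 2 3 4)
```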

Re: collecting *unique* in loop

Posted: Thu Nov 25, 2010 11:10 am
by imba
That isn't what I want. I don't want to collect them in the first place.

Re: collecting *unique* in loop

Posted: Thu Nov 25, 2010 12:18 pm
by Kompottkin
In that case, you will have to remember the set of collected items somehow. Like this, for example:

Code:

CL-USER> (loop with collected-items = (fset:set)
               for x in '(1 2 3 1 2 4)
               unless (fset:contains? collected-items x)
                 do (fset:adjoinf collected-items x)
                 and collect x)
(1 2 3 4)
(This example makes use of the FSet library.)

Re: collecting *unique* in loop

Posted: Thu Nov 25, 2010 4:06 pm
by Warren Wilkinson

Code:

(loop with result = nil
      for a in '(a b a c a d a b a c a d)
      do (pushnew a result)
      finally (return (nreverse result)))

;; result ==> (a b c d)
If you don't care about order, you don't need nreverse at the end.

Re: collecting *unique* in loop

Posted: Fri Nov 26, 2010 7:24 am
by Kompottkin
Warren Wilkinson wrote:

Code:

(loop with result = nil
      for a in '(a b a c a d a b a c a d)
      do (pushnew a result)
      finally (return (nreverse result)))
Yes, that's a simpler approach. It does have run time quadratic in the number of distinct items, though. This may or may not be an issue, of course, depending on the situation.
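A linear-time middle ground needs nothing beyond a standard hash table. A sketch (COLLECT-UNIQUE is a name made up here, not part of the thread):

```lisp
;; Collect each element once, in order of first appearance.
;; The hash table gives expected O(1) membership tests, so the
;; whole pass is linear rather than quadratic.
(defun collect-unique (list &key (test #'eql))
  (loop with seen = (make-hash-table :test test)
        for x in list
        unless (gethash x seen)
          do (setf (gethash x seen) t)
          and collect x))

(collect-unique '(a b a c a d a b a c a d)) ; => (A B C D)
```

Pass :test #'equal for strings or lists, since the default EQL table only hashes numbers, characters, and identity.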

Re: collecting *unique* in loop

Posted: Fri Nov 26, 2010 10:47 am
by imba
Thank you very much! So what is faster: remove-duplicates or Warren Wilkinson's approach?

Re: collecting *unique* in loop

Posted: Fri Nov 26, 2010 3:31 pm
by Kompottkin
imba wrote:So what is faster: remove-duplicates or Warren Wilkinson's approach?
Some naive testing suggests that REMOVE-DUPLICATES is significantly faster (0.03s vs. 11s for 100,000 items generated by RANDOM). In fact, it even seems to handle 1,000,000 items much better than I expected. One has to be very careful with such microbenchmarks, though, especially on SBCL, whose compiler is wickedly smart.

What's really weird, though, is that DELETE-DUPLICATES is slower than REMOVE-DUPLICATES by a couple of orders of magnitude. A bug in this version of SBCL maybe?

It probably won't make much of a difference if your input list is fewer than a couple of thousand items long, anyway. You know what they say about premature optimisation...

Code:

CL-USER> (time (remove-duplicates (loop for x from 1 to 100000 collect (random 1000000000))))
Evaluation took:
  0.027 seconds of real time
  0.027461 seconds of total run time (0.025240 user, 0.002221 system)
  100.00% CPU
  ...
(960529751 650859964 696275401 493130535 258563691 736847323 858363769 45617352
 165142801 383460513 ...)

Code:

CL-USER> (time
          (loop with collected-items = (make-hash-table)
                for i from 1 to 100000
                for x = (random 1000000000)
                unless (gethash x collected-items nil)
                  do (setf (gethash x collected-items) t)
                  and collect x))
Evaluation took:
  0.049 seconds of real time
  0.047696 seconds of total run time (0.042030 user, 0.005666 system)
  [ Run times consist of 0.010 seconds GC time, and 0.038 seconds non-GC time. ]
  97.96% CPU
  ...
(238510722 35471481 364524553 593573298 940395904 417695290 814711312 20768734
 121981963 754212048 ...)

Code:

CL-USER> (time
          (loop with collected-items = (fset:set)
                for i from 1 to 100000
                for x = (random 1000000000)
                unless (fset:contains? collected-items x)
                  do (fset:adjoinf collected-items x)
                  and collect x))
Evaluation took:
  0.491 seconds of real time
  0.478523 seconds of total run time (0.424767 user, 0.053756 system)
  [ Run times consist of 0.170 seconds GC time, and 0.309 seconds non-GC time. ]
  97.56% CPU
  ...
(546310694 336599977 128592735 175916988 847899506 834855340 777332112
 579897313 166227745 363405174 ...)

Code:

CL-USER> (time
          (loop with result = nil
                for x from 1 to 100000
                do (pushnew (random 1000000000) result)
                finally (return result)))
Evaluation took:
  11.351 seconds of real time
  11.245011 seconds of total run time (11.209439 user, 0.035572 system)
  99.07% CPU
  ...
(129996452 83142730 830379502 254691783 321222762 91109236 181730054 188146466
 486221870 191407494 ...)

Code:

CL-USER> (time (delete-duplicates (loop for x from 1 to 100000 collect (random 1000000000))))
Evaluation took:
  98.663 seconds of real time
  98.295298 seconds of total run time (98.177997 user, 0.117301 system)
  99.63% CPU
  ...
(515034309 845470080 999353875 760234271 406270440 283415072 245652264
 343700715 42652215 722056408 ...)

Re: collecting *unique* in loop

Posted: Sat Nov 27, 2010 2:07 am
by Warren Wilkinson
It's funny you mention the destructive version being slower. In indianerrostock's post viewtopic.php?f=2&t=901, he invited others to try to beat his solution to a problem of replacing "#NAME#" tokens in a template string with the names of sports, repeated several times.

His initial solution of (search)ing for "#NAME#" and outputting a new string was many times faster than my highly optimized version that pre-recorded the "#NAME#" locations and destructively modified a single string. Mind blowing stuff.

On topic, though: if your list were sorted, you could do faster duplicate detection, though it still might not beat remove-duplicates, which is a function I actually didn't know existed... I used

Code:

;; REDUCE's keyword is :INITIAL-VALUE, and :FROM-END is needed so
;; ADJOIN receives its arguments in (item list) order.
(defun unique (list) (reduce #'adjoin list :from-end t :initial-value nil))
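To make the sorted-list remark above concrete: in a sorted list all duplicates are adjacent, so a single pass suffices. A sketch (UNIQUE-SORTED is a made-up name for this illustration):

```lisp
;; Walk the successive tails of a sorted list and keep an element
;; only when it differs from its successor; this drops all but the
;; last element of each run of duplicates, in one linear pass.
(defun unique-sorted (list &key (test #'eql))
  (loop for (x . rest) on list
        when (or (null rest) (not (funcall test x (first rest))))
          collect x))

(unique-sorted '(1 1 2 3 3 3 4)) ; => (1 2 3 4)
```

No hash table or inner scan is needed, but of course the list must already be sorted under a predicate compatible with TEST.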