imba wrote:So what is faster: remove-duplicates or Warren Wilkinson's approach?
Some naive testing suggests that REMOVE-DUPLICATES is significantly faster (0.03s vs. 11s for 100,000 items generated by RANDOM). In fact, it even seems to handle 1,000,000 items much better than I expected. One has to be
very careful with such microbenchmarks, though, especially on SBCL, whose compiler is wickedly smart.
What's really weird, though, is that DELETE-DUPLICATES is
slower than REMOVE-DUPLICATES by a couple of orders of magnitude. A bug in this version of SBCL maybe?
It probably won't make much of a difference if your input list is fewer than a couple of thousand items long, anyway. You know what they say about premature optimisation...
Code: Select all
CL-USER> (time (remove-duplicates (loop for x from 1 to 100000 collect (random 1000000000))))
Evaluation took:
0.027 seconds of real time
0.027461 seconds of total run time (0.025240 user, 0.002221 system)
100.00% CPU
...
(960529751 650859964 696275401 493130535 258563691 736847323 858363769 45617352
165142801 383460513 ...)
Code: Select all
CL-USER> (time
(loop with collected-items = (make-hash-table)
for i from 1 to 100000
for x = (random 1000000000)
unless (gethash x collected-items nil)
do (setf (gethash x collected-items) t)
and collect x))
Evaluation took:
0.049 seconds of real time
0.047696 seconds of total run time (0.042030 user, 0.005666 system)
[ Run times consist of 0.010 seconds GC time, and 0.038 seconds non-GC time. ]
97.96% CPU
...
(238510722 35471481 364524553 593573298 940395904 417695290 814711312 20768734
121981963 754212048 ...)
Code: Select all
CL-USER> (time
(loop with collected-items = (fset:set)
for i from 1 to 100000
for x = (random 1000000000)
unless (fset:contains? collected-items x)
do (fset:adjoinf collected-items x)
and collect x))
Evaluation took:
0.491 seconds of real time
0.478523 seconds of total run time (0.424767 user, 0.053756 system)
[ Run times consist of 0.170 seconds GC time, and 0.309 seconds non-GC time. ]
97.56% CPU
...
(546310694 336599977 128592735 175916988 847899506 834855340 777332112
579897313 166227745 363405174 ...)
Code: Select all
CL-USER> (time
(loop with result = nil
for x from 1 to 100000
do (pushnew (random 1000000000) result)
finally (return result)))
Evaluation took:
11.351 seconds of real time
11.245011 seconds of total run time (11.209439 user, 0.035572 system)
99.07% CPU
...
(129996452 83142730 830379502 254691783 321222762 91109236 181730054 188146466
486221870 191407494 ...)
Code: Select all
CL-USER> (time (delete-duplicates (loop for x from 1 to 100000 collect (random 1000000000))))
Evaluation took:
98.663 seconds of real time
98.295298 seconds of total run time (98.177997 user, 0.117301 system)
99.63% CPU
...
(515034309 845470080 999353875 760234271 406270440 283415072 245652264
343700715 42652215 722056408 ...)