General question about Garbage Collection

Whatever is on your mind, whether Lisp related or not.
schoppenhauer
Posts: 99
Joined: Sat Jul 26, 2008 2:30 pm
Location: Germany
Contact:

General question about Garbage Collection

Post by schoppenhauer » Tue Feb 03, 2009 10:38 am

Since many different Language-Interpreters and other software uses Garbage-Collectors, and since Garbage Collectors usually produce Overhead, running more of them can make a System slower. For example, when running a Java VM, perl and SBCL for example, at the same time, on a small VPS, etc., any of these systems has its own garbage collector, reserving a lot more space than actually is needed to optimize runtime-speed, but taking this place from other processes. There are possibilities to leave non-used parts of virtually allocated space non-allocated (like for example setting /proc/sys/vm/overcommit_memory to 1), but these are dangerous, if a process really needs that much memory, and they change the behaviour of paging of the kernel, and hence as far as I know they can make a garbage collector even slower (maybe I am wrong, I dont know exactly).

So what I am wondering since a long time now is, whether it is possible to create a system-wide garbage-collector. I dont want to do this, and I am sure, if it was possible, someone else would have done it, but I just wonder, why it is not possible.

The naive way would be to specify some "protocol" how to access a huge shm, how to save structs, how to specify where the references to other objects are, and where boundings are, with one process running in the background rearranging all the memory. This would be very insecure since all processes could access all the other memory, but the kernel could block access to other parts of the memory. On the other hand, leave all of that work to the kernel, just define a syscall that tells the kernel "i need ... space to save all my objects" and then the kernel rearranges the space and translates addresses.

Ok, there are a lot of different types of garbage collection algorithms, but a "general" solution for all programs always has drawbacks. At least Systems like Java and Common Lisp wouldnt have to start their own garbage collector.
Sorry for my bad english.
Visit my blog http://blog.uxul.de/

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland
Contact:

Re: General question about Garbage Collection

Post by ramarren » Tue Feb 03, 2009 11:36 am

schoppenhauer wrote:So what I am wondering since a long time now is, whether it is possible to create a system-wide garbage-collector. I dont want to do this, and I am sure, if it was possible, someone else would have done it, but I just wonder, why it is not possible.
It is not impossible, and has been done in the past. Most relevant example would be later models of Lisp machines. But Lisp machines mostly disappeared in the early nineties, and dominating systems since have been written in C, which doesn't yield itself to garbage collection very well. And now there is too much legacy code and too many memory layout standards to retrofit system-wide garbage collection onto.

schoppenhauer
Posts: 99
Joined: Sat Jul 26, 2008 2:30 pm
Location: Germany
Contact:

Re: General question about Garbage Collection

Post by schoppenhauer » Tue Feb 03, 2009 12:56 pm

Hm. I actually dont mean that any program must use it. I meant just as an Option for Systems running many Programs using a GC, and then, just for the Programs that really support it. So, having Processes with no memory management which do not use anything else, and having processes where the kernel takes care of their memory management.
Sorry for my bad english.
Visit my blog http://blog.uxul.de/

Harleqin
Posts: 71
Joined: Wed Dec 17, 2008 5:18 am
Location: Bonn, Germany

Re: General question about Garbage Collection

Post by Harleqin » Wed Feb 04, 2009 12:49 am

Ramarren wrote:
schoppenhauer wrote:So what I am wondering since a long time now is, whether it is possible to create a system-wide garbage-collector. I dont want to do this, and I am sure, if it was possible, someone else would have done it, but I just wonder, why it is not possible.
It is not impossible, and has been done in the past. Most relevant example would be later models of Lisp machines. But Lisp machines mostly disappeared in the early nineties, and dominating systems since have been written in C, which doesn't yield itself to garbage collection very well. And now there is too much legacy code and too many memory layout standards to retrofit system-wide garbage collection onto.
I am always dreaming of the "return of the lisp machines". It should be possible to build a Lisp-based system on current hardware, perhaps using Movitz. Lisp would be the system language, just like C is in current Unices. In such an environment, garbage collection would be a natural component of the kernel.
"Just throw more hardware at it" is the root of all evil.
Svante

schoppenhauer
Posts: 99
Joined: Sat Jul 26, 2008 2:30 pm
Location: Germany
Contact:

Re: General question about Garbage Collection

Post by schoppenhauer » Wed Feb 04, 2009 8:33 am

Movitz... Is interesting, but its hard to run it, even in a VM like QEMU or VBox... And certainly noone will write drivers for it, nore usable programs. Hence, it will stay "geek"-software, I think. I dont know much about system programming, but maybe using a Linux Kernel (which has a lot of drivers) and optionally some Lisp-Based system cooperatively (such that one can use Linux-Drivers, too, if nothing was written in Lisp) would be possible? Or just a Linux-System with a Lisp-init-Process?

But still, this is not what I mean. There are a lot of libraries out there written in other languages which may also use a Garbage Collector. I think of a more general approach.
Sorry for my bad english.
Visit my blog http://blog.uxul.de/

findinglisp
Posts: 447
Joined: Sat Jun 28, 2008 7:49 am
Location: Austin, TX
Contact:

Re: General question about Garbage Collection

Post by findinglisp » Wed Feb 04, 2009 10:05 am

Yes, of course that's possible (something like "libgc"), but probably not very practical for the near term. There are a couple of issues that I can think of that push back on the adoption of such a system:

1. An immediate issue right now is that systems like Linux and Windows enforce a process boundary between memory spaces. If you're looking for a system-wide GC, you'd have to implement that using shared memory or something such that all the memory using GC was in a single large pool. If you don't have the single large pool, you don't solve your issue with memory efficiency. On the other hand, the single large pool opens you up to cross-process memory corruption and crashes. People mentioned Lisp machines previously. Lisp machines did in fact have a huge global memory space (with some notion of areas), but all the software was basically written in Lisp or with another compiler that understood the underlying operating system conventions and would do the right thing. Thus, you were protected.

2. And, of course, you'd have to rewrite every program that now uses GC (Lisp, Python, Perl, Ruby, etc.) to use your new GC instead of whatever algorithm it was using. That's simply not going to happen without everybody rallying around and making it so.

So, possible? Yes, definitely. Should you hold you breath for this as a solution to your VPS memory constraints? Probably not. 8-)
Cheers, Dave
Slowly but surely the world is finding Lisp. http://www.findinglisp.com/blog/

Exolon
Posts: 49
Joined: Sat Jun 28, 2008 12:53 pm
Location: Ireland
Contact:

Re: General question about Garbage Collection

Post by Exolon » Wed Feb 04, 2009 6:21 pm

schoppenhauer wrote:Since many different Language-Interpreters and other software uses Garbage-Collectors, and since Garbage Collectors usually produce Overhead, running more of them can make a System slower. For example, when running a Java VM, perl and SBCL for example, at the same time, on a small VPS, etc., any of these systems has its own garbage collector, reserving a lot more space than actually is needed to optimize runtime-speed, but taking this place from other processes.
Garbage collectors do produce an overhead, but I'm pretty sure that the overhead is mostly proportional to the amount of garbage to collect. If we have process p1 with heap h1, then (p2,h2), (p3,h3) etc, then the notional 'global' garbage collector will still need to collect the garbage in h1...hn.
What could really be saved by a unified, global GC? Certainly there is extra overhead (just like visiting 10 houses and washing the dishes in each house in that sink would take somewhat more effort than all 10 houses putting their dirty dishes into one (huge) shared sink so that you can wash them in one sitting), but I suspect the extra overhead is quite small in comparison to the total GC work - and there are surely some advantages as well (for example, if p5 is constantly blocking for user I/O or for whatever reason generates very little garbage, its individual GC would probably be called much less often; but one big global GC wouldn't be able to take advantage of that knowledge (at least, if it's cross-language)).
Also, thread safety/locking and such might slow down the shared heap/GC solution.

[edit]
Oh, and on the subject of "reserving a lot more space than actually is needed", this problem would be more complicated in a global GC, with an unpredictable number of processes running, each with unknowable memory requirements. The cost of guessing wrong could impact all processes. So it's no less an issue in this case.

* note, I've never written a GC nor read any books on the topic, so take my opinions with a grain of salt :D I'm merely interested in this topic as well.

findinglisp
Posts: 447
Joined: Sat Jun 28, 2008 7:49 am
Location: Austin, TX
Contact:

Re: General question about Garbage Collection

Post by findinglisp » Thu Feb 05, 2009 10:21 am

Exolon wrote: Oh, and on the subject of "reserving a lot more space than actually is needed", this problem would be more complicated in a global GC, with an unpredictable number of processes running, each with unknowable memory requirements. The cost of guessing wrong could impact all processes. So it's no less an issue in this case.
I think this was the OP's main point. The issue with running lots of GC-using processes is that GC tends to allocate and hold onto a larger amount of memory than a non-GC-using process doing the equivalent task. The main reason for this is that collections take CPU time, so for efficiency you want to spread them out in time. The more the process can allocate between collections, the less overhead each collection consumes, on average. Further, if you spread them out, then there is a better chance for objects to "die young" and not live through the collection, which lowers the time spent on the collection for many algorithms. Therefore, the GC is incentivized to hang on to more OS-level memory pages that it might if it was a C program with manual memory management, for instance.

Running multiple programs in a shared heap would generally make this less of an issue by amortizing the additional memory overhead across multiple processes. The global collector would also generally collect more often than would a single process collector because the GC could be triggered by any of a number of processes that might be allocating at different rates. Finally, as you said the cost of each of those collections might also go up since the cost of GC is often proportional to the amount of live data, depending on the algorithms used.

If people are really interested in this, I think some of there was some work that went on by one of the Gambit Scheme authors about the difference in Erlang between using many small, process-local heaps versus a single larger heap for all processes. Their conclusion was that it was better to use a single larger heap for all processes. The reference is: http://citeseerx.ist.psu.edu/viewdoc/su ... .1.20.9664

That said, using a global GC heap at an OS level suffers from all the practical problems that I already outlined previously. The main issue is that you'd have to get buy-in from everybody to actually use it, otherwise it's no better than what you have today.
Cheers, Dave
Slowly but surely the world is finding Lisp. http://www.findinglisp.com/blog/

Wodin
Posts: 56
Joined: Sun Jun 29, 2008 8:16 am

Re: General question about Garbage Collection

Post by Wodin » Thu Feb 05, 2009 10:52 am

One approximation of this would be to run a JVM and then use Java, Jython, JRuby, Scala, Clojure etc. on top of it :)

qbg
Posts: 64
Joined: Mon Jun 30, 2008 1:05 pm
Location: Minnesota

Re: General question about Garbage Collection

Post by qbg » Fri Feb 06, 2009 7:40 pm

Harleqin wrote: I am always dreaming of the "return of the lisp machines". It should be possible to build a Lisp-based system on current hardware, perhaps using Movitz. Lisp would be the system language, just like C is in current Unices. In such an environment, garbage collection would be a natural component of the kernel.
Or you could take the borg approach. Take your GNU/Linux distribution of choice and create/port as many lisp programs as you want/can do to ECL. If you feel it is too much effort to port software X at this point to Lisp, you can include it anyways, leaving it to be assimilated later.

Obvious advantage of this approach is that you would gain access to a ton of work that has already been done for you, and by using ECL you get to infect your C machine with Lisp...

Post Reply