libc/winsup/cygwin/how-cygheap-works.txt

Contributed by Christopher Faylor

Cygwin has recently adopted something called the "cygwin heap".  This is
an internal heap that is inherited by forked/execed children.  It
consists of process specific information that should be inherited.  So
things like the file descriptor table, the current working directory,
and the chroot value live there.

The cygheap is also used to pass argv information to a child process.
There is a problem here, though.  If you allocate space for argv on the
heap and then exec a process the child process (1) will happily use the
space in the heap.  But what happens when that process execs another
process (2)?  The space used by child process (1) still is being used in
child process (2) but it is basically just a memory leak.

To rectify this problem, memory used by child process 1 is tagged in
such a way that child process 2 will know to delete it.  This is in
cygheap_fixup_in_child.

The cygheap memory allocation functions are adapted from memory
allocators developed by DJ Delorie.  They are similar to early BSD
malloc and are intended to be relatively lightweight and relatively
fast.

How is the cygheap propagated to the child?

Well, it depends if you are running on Windows 9x or Windows NT.

On NT and 9x, just before CreateProcess is about to be called in
fork or exec, a shared memory region is prepared for copying of the
cygwin heap.  This is in cygheap_setup_for_child.  The handle to this
shared memory region is passed to the new process in the 'child_info'
structure.

If there are no handles that need "fixing up" prior to starting another
process, cygheap_setup_for_child will also copy the contents of the
cygwin heap to the shared memory region.

If there are any handles that need "fixing up" prior to invoking
another process (i.e., sockets) then the creation of the shared
memory region and copying of the current cygwin heap is a two
step process.

First the shared memory region is created and the process is started
in a "CREATE_SUSPENDED" state, inheriting the handle.  After the
process is created, the fixup_before_*() functions are called.  These
set information in the heap and duplicate handles in the child, essentially
ensuring that the child's fd table is correct.

(Note that it is vital that the cygwin heap should not grow during this
process.  Currently, there is no guard against this happening so this
operation is not thread safe.)

Meanwhile, back in fork_parent, the function
cygheap_setup_for_child_cleanup is called.  In the simple "one step"
case above, all that happens is that the shared memory is ummapped and
the handle referring to it is closed.

In the two step process, the cygheap is now copied to the shared memory
region, complete with new fdtab info (the child process will see the
updated information as soon as it starts).  Then the memory is unmapped,
the handle is closed, and upon return the child process is started.

It is in the child process that the difference between Windows 9x and
Windows NT becomes evident.

Under Windows NT, the operation is simple.  The shared memory handle is
used to map the information that the parent has set up into the cygheap
location in the child.  This means that the child has a copy of the
cygwin heap existing in "shared memory" but the only process with a view
to this "shared memory" is the child.

Under Windows 9x, due to address limitations, we can't just map the
shared memory region into the cygheap position.  So, instead, the memory
is mapped whereever Windows wants to put it, a new heap region is
allocated at the same place as in the parent, the contents of the shared
memory is *copied* to the new heap, and the shared memory is unmapped.
Simple, huh?

Why do we go to these contortions?  Previous versions (<1.3.3) of cygwin
used to block when creating a child so that the child could copy the
parent's cygheap.  The problem with this was that when a cygwin process
invoked a non-cygwin child, it would block forever waiting for the child
to let it know that it was done copying the heap.  That caused
understandable complaints from people who wanted to run non-cygwin
applications "in the background".

In Cygwin 1.3.3 (and presumably beyond) the location of the cygwin heap
has been fixed to be at the end of the cygwin1.dll address space.
Previously, we let the "OS" choose where to allocate the cygwin heap in
the initial cygwin process and attempted to use this same location in
subsequent cygwin processes started from this parent.

The reason for putting cygheap at a fixed, known location is that we
need to put this information at a fixed location since it incorporates
pointers to entities within itself.  So, when a process forks or execs,
the memory referred to by the pointers has to exist at the same place in
both the parent or the child.

(It "might be nice" to used something like Microsoft's "based pointers"
for the cygheap.  Unfortunately gcc does not support that feature, as of
this writing.)

The reason for choosing a fixed, arbitrary location is to accommodate
Windows XP, although there were sporadic complaints of cygwin heap
failures in other pathological situations with both NT and 9x.  In
Windows XP, Microsoft made the allocation of memory less deterministic.
This is certainly their right.  Cygwin was previously relying on
undocumented and "iffy" behavior before.  So, now we always allocate
space immediately after the dll in the theory that there is not going
to be anything else living there.

Recent (2001-09-20) cygwin email threads have indicated that we're not
exactly on completely firm ground now, though.  We are assuming that
there is sufficient space after the cygwin DLL for the allocation of the
cygwin heap.  Unfortunately the ld option '--enable-auto-image-base'
has a tendency to allocate DLLs immediately after cygwin1.dll.  This
causes the dreaded "Couldn't reserve space for cygwin's heap" message.

Solutions for this behavior are currently in the musing state.