Jeremy Linton wrote:
Eric P. wrote:
Dave Hansen wrote:
On Tue, 29 Nov 2005 10:40:24 -0500 in comp.arch, "Eric P."
<eric_pattison@sympaticoREMOVE.ca> wrote:
I conclude therefore that the correct solution is to specify the
reserved linear stack size for each individual thread at create
and that means it is as an argument to the ThreadCreate function.
Note that this is NOT how Win32 works. As an optimization one could
What, then, is the purpose of the dwStackSize parameter to the Win32
function CreateThread?
It sets the Commit size. I have no idea why MS thinks I would want
to since the commit pages are automatically expanded.
For performance of course... That way you don't have to worry about
taking individual page faults for each 4k page in the stack.
Which wouldn't be an issue if they didn't expand the commit
space one page at a time in the first place.
But you're
partially wrong about the commit vs reserve issue (which should be
thought of as the MAX stack size ever). The SDK documentation says:
"To change the initially committed stack space, use the dwStackSize
parameter of the CreateThread, CreateRemoteThread, or CreateFiber
function. This value is rounded up to the nearest page. Generally, the
reserve size is the default reserve size specified in the executable
header. However, if the initially committed size specified by
dwStackSize is larger than the default reserve size, the reserve size is
this new commit size rounded up to the nearest multiple of 1 MB. "
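The rounding rule in that SDK paragraph can be written out as a small calculation. This is only a sketch of the quoted text, not of what the loader actually does: the 4 KB page size, the names effective_stack_sizes and stack_sizes, and the default_commit parameter (standing in for the executable header value used when dwStackSize is 0) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_PAGE ((size_t)4096)            /* assumed x86 page size */
#define ONE_MB     ((size_t)1024 * 1024)

typedef struct {
    size_t commit;   /* bytes committed when the thread is created */
    size_t reserve;  /* bytes of address space reserved for the stack */
} stack_sizes;

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Apply the quoted rule: commit is rounded up to a page; the reserve stays
   at the header default unless the commit is larger, in which case the
   reserve becomes the commit rounded up to a multiple of 1 MB. */
static stack_sizes effective_stack_sizes(size_t dwStackSize,
                                         size_t default_commit,
                                         size_t default_reserve)
{
    stack_sizes s;
    s.commit  = round_up(dwStackSize ? dwStackSize : default_commit, STACK_PAGE);
    s.reserve = default_reserve;
    if (s.commit > s.reserve)
        s.reserve = round_up(s.commit, ONE_MB);
    return s;
}
```

With a 1 MB default reserve, asking for 5000 bytes gives an 8 KB commit and leaves the reserve at 1 MB, while asking for 1.5 MB raises the reserve to 2 MB.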
You are correct. My apologies.
The CreateThread docs have been updated and missed that.
That info was previously buried at the end of a knowledge base article.
I was going on what CreateThread used to say:
"dwStackSize: Specifies the size, in bytes, of the stack for the new
thread. If 0 is specified, the stack size defaults to the same size
as that of the primary thread of the process. <...> CreateThread tries
to commit the number of bytes specified by dwStackSize, and fails if
the size exceeds available memory."
Even so, I believe my concerns below still apply.
So basically the linker option sets the default, which is always used
unless someone specifies a stack size larger, in which case it is grown.
Now 1Meg seems a tiny amount of stack today, but it's been that way for a
long time, and I'm sure that back in '93 or so when everyone had 32-bit
machines with 8 megs or so it seemed like a lot of space for a stack.
It's easy to change, just flip the linker switch and give yourself a few
gigs with windows-64, but it also directly influences the number of
threads that can be created. So, it is sort of a trade off, how big is
your max stack vs how many threads you can create. You can't grow past
the reserve amount because there is a good chance some other thread has
placed its stack right before yours. On 32-bit linux, it seems the
decision (not true anymore) had been made to limit the system to 256
threads so the stack size could be bigger. The application I currently
work on needs a lot of threads (large SMP hardware, with lots of blocked
threads just sitting around) so it would have been nice to have more,
since we consume very little stack space. So, the problem can go both
ways. At least in windows, it's easy to mix and match the stack sizes
based on the thread requirements rather than having them fixed (unless
there is a way to get linux to dynamically set the thread size, that I
don't know about). So, this is an instant win for windows in my book....
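The trade-off between maximum stack size and thread count is simple division. A rough sketch, assuming a 2 GB user address space on 32-bit Windows and ignoring everything else that lives in it (code, heaps, DLLs), so the real limit is lower; the function name max_threads is made up for the example.

```c
#include <assert.h>

/* Upper bound on the number of threads when every stack reserves
   reserve_bytes out of address_space_bytes. Ignores code, heap,
   DLLs and guard pages, so the practical limit is smaller. */
static unsigned long long max_threads(unsigned long long address_space_bytes,
                                      unsigned long long reserve_bytes)
{
    assert(reserve_bytes != 0);
    return address_space_bytes / reserve_bytes;
}
```

For a 2 GB space, a 1 MB reserve caps this at 2048 threads, a 64 KB reserve at 32768, and an 8 MB reserve at 256.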
My concern is what happens when you make the default smaller
so you can run more threads. Any CreateThread calls in library
functions that specified a commit size of 0 will also change.
I have had situations where I know that my threads do not require
1 MB. But I cannot lower the stack size for just those threads,
and if I change the global default it may affect all threads,
even ones my code did not create.
Based on this experience I conclude that it would have been better
design for CreateThread to require the caller to specify both
commit & reserve as arguments for each thread.
The global process defaults look easy, but seem just plain dangerous
to me because they allow stacks to be adjusted without regard to usage.
Windoze commits only a page at a time (by default) to a thread's stack
as each page boundary is exceeded. How is this different from what
you describe?
I'm not saying you are wrong. I'm trying to understand how your
solution differs from Win32.
Currently, for every allocation > 4 KB or by alloca, it touches every
page one by one. That not only wastes time in a loop touching pages
over and over, as far as I can tell it only extends the stack one
page at a time. Dumb. Dumb. Dumb.
Huh? That is what the commit flags to ZwAllocateVirtualMemory() (exposed
through VirtualAllocEx()) do for you... The NT kernel doesn't really
seem to know anything about user space stacks. That is all done through
the win32 subsystem. So in that regard, it's sort of silly to complain
about the OS just demand paging the one page you ask for.
I know that, but I am not referring to the commit space.
I am referring to how the committed stack grows into the reserve space.
It was not actually clear, to me anyway, which code is doing this.
The documentation does not say, but tends to imply that it should
be the Win32 user mode code. However I have looked for exceptions
generated into user mode that would trigger the stack expansion and
find none. That is why I put the 'kernel(?)' in my previous message.
(It was long winded enough as it was.)
Anyway, I think you should try single stepping through the assembler
for 'alloca' or a _chkstk call. Alloca calls __alloca_probe which is
almost identical to the code of __chkstk that is also called whenever
you declare a local object larger than 4 KB on the stack.
Both just loop touching pages one by one. Every time it is invoked.
If a routine ever declared a 20 MB or 50 MB array...
void Foo (void)
{ char vec[20*(1<<20)];  /* 20 MB local array */
}
every call to Foo () does the same check.
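To put a number on it, here is a small model of a probe that touches one page per iteration. It is not the actual __chkstk or __alloca_probe code, only a count of how many touches such a loop makes; the 4 KB page size and the name probe_touches are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a one-page-at-a-time stack probe: returns how many page
   touches are needed to cover alloc_bytes of new stack space. */
static size_t probe_touches(size_t alloc_bytes, size_t page_size)
{
    size_t touches = 0;
    size_t remaining = alloc_bytes;

    assert(page_size != 0);
    while (remaining >= page_size) {   /* one touch per full page */
        remaining -= page_size;
        touches++;
    }
    if (remaining != 0)                /* final partial page */
        touches++;
    return touches;
}
```

For a 20 MB local with 4 KB pages that is 5120 touches, repeated on every call even after the pages are long since committed.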
To do this optimally is simple.
It requires 3 TEB values: Stack Top, Bottom and Low Water Mark.
Top and Bottom cover the reserved range.
Mark is the low water commit point, between top and bottom.
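A minimal sketch of the check those three values allow, as a user-space simulation. The struct, the enum and the commits counter are hypothetical, the counter stands in for whatever single OS commit call would be made, guard pages are ignored, and bottom is assumed to be page aligned.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { STACK_OK_FAST, STACK_OK_GREW, STACK_OVERFLOW } stack_result;

typedef struct {
    uintptr_t top;      /* highest address of the reserved range */
    uintptr_t bottom;   /* lowest address of the reserved range */
    uintptr_t mark;     /* low water mark: lowest committed address */
    unsigned  commits;  /* commit operations performed (illustration only) */
} stack_desc;

/* Check a prospective stack pointer against the three values.
   page must be a power of two. */
static stack_result stack_check(stack_desc *s, uintptr_t new_sp, uintptr_t page)
{
    assert(s->bottom <= s->mark && s->mark <= s->top);

    if (new_sp >= s->mark)            /* already committed: one compare */
        return STACK_OK_FAST;
    if (new_sp < s->bottom)           /* beyond the reserved range */
        return STACK_OVERFLOW;

    s->mark = new_sp & ~(page - 1);   /* round down to a page boundary */
    s->commits++;                     /* one commit for the whole new range */
    return STACK_OK_GREW;
}
```

The common case costs a single comparison, and a large allocation extends the committed region in one step instead of one page at a time.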
How well this works is probably application dependent. You could
probably implement your own version using SetThreadStackGuarantee(). The
other option is to call ZwWriteWatch() on your stack region and check to
see which pages have been accessed, and allocate/commit extra space as
you see necessary.
That is great, but SetThreadStackGuarantee is only in Win64.
I also notice an update to CreateThread for XP that allows
you to explicitly specify the reserve size in CreateThread with
the STACK_SIZE_PARAM_IS_A_RESERVATION flag, somewhat similar to my rants.
These don't help all the billions of lines of existing code.
Eric