Thread Stacks
CASTalk.com Forum Index -> Computer Architecture
Discussion of DSP, FPGA, storage and embedded system.

Eric P.
Posted: Wed Nov 30, 2005 1:15 am    Post subject: Re: Thread Stacks

Dave Hansen wrote:
Quote:

On Tue, 29 Nov 2005 10:40:24 -0500 in comp.arch, "Eric P."
<eric_pattison@sympaticoREMOVE.ca> wrote:

[...]
I conclude therefore that the correct solution is to specify the
reserved linear stack size for each individual thread at creation,
which means passing it as an argument to the ThreadCreate function.
Note that this is NOT how Win32 works. As an optimization one could

What, then, is the purpose of the dwStackSize parameter to the Win32
function CreateThread?

It sets the commit size. I have no idea why MS thinks I would want
to set it, since the committed pages are expanded automatically.

Quote:
maintain a pool of previously used stack ranges to save continuously
reserving and releasing the same sized memory sections.

As the stack grows down, each thread should track its low water mark,
and bounds check and commit new space efficiently, as a single system
request, only when below the mark. This is also not how Win32 works.

Windoze commits only a page at a time (by default) to a thread's stack
as each page boundary is exceeded. How is this different from what
you describe?

I'm not saying you are wrong. I'm trying to understand how your
solution differs from Win32.

Currently, for every allocation > 4 KB or by alloca, it touches every
page one by one. That not only wastes time in a loop touching pages
over and over, as far as I can tell it only extends the stack one
page at a time. Dumb. Dumb. Dumb.
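
To make that cost concrete, here is a toy model in C (not Windows code) of a guard-page stack and a _chkstk-style probe loop. It assumes 4 KB pages and a single guard page directly below the committed region; the names stack_model, touch and probe_page_by_page are hypothetical. In the model, a direct touch five pages down fails, probing walks down one page (and one fault) at a time, and repeating the same allocation repeats every touch even though nothing new gets committed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size */

/* Toy guard-page stack: [committed_low, top) is committed and the single
   page just below committed_low acts as the guard page. */
typedef struct {
    uintptr_t committed_low;  /* lowest committed address */
    unsigned  faults;         /* guard-page faults taken so far */
} stack_model;

/* Touch one address. Hitting the guard page commits it (one fault) and the
   guard moves down a page; anything below the guard is an access violation,
   reported here as 0. */
static int touch(stack_model *s, uintptr_t addr)
{
    if (addr >= s->committed_low)
        return 1;                                  /* already committed */
    if (addr >= s->committed_low - PAGE_BYTES) {   /* the guard page */
        s->committed_low -= PAGE_BYTES;
        s->faults++;
        return 1;
    }
    return 0;                                      /* skipped past the guard */
}

/* Probe in the style described for _chkstk/__alloca_probe: from sp, step
   down one page at a time to the new stack pointer, touching every page.
   Counts the touches made. */
static int probe_page_by_page(stack_model *s, uintptr_t sp, size_t size,
                              unsigned *touches)
{
    uintptr_t target = sp - size;   /* prospective new stack pointer */
    *touches = 0;
    for (uintptr_t p = sp; p > target; ) {
        p = (p - target > PAGE_BYTES) ? p - PAGE_BYTES : target;
        ++*touches;
        if (!touch(s, p))
            return 0;
    }
    return 1;
}
```

Starting with committed_low equal to sp, a five-page allocation costs five touches and five faults the first time, and five touches again on every later call.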

To do this optimally is simple.
It requires 3 TEB values: Stack Top, Bottom and Low Water Mark.
Top and Bottom cover the reserved range.
Mark is the low water commit point, between top and bottom.

When a large block is required, check against the Mark.
If NewSP >= Mark then it has already been probed so nothing to do.
If NewSP >= Bottom, touch ONE reserved page; WNT should extend
the committed stack up to that point and update the low water mark.
If NewSP < Bottom, throw exception.

The vast majority of checks will be picked off by the low water test.
To extend the commit stack an arbitrary amount requires one page fault.
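
The three-value check above can be sketched in C as follows. The struct stands in for the proposed TEB fields; stack_limits, probe_low_water and the 4 KB page size are hypothetical, not real TEB members or Windows APIs, and the "commit" is only counted, not performed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size */

/* Hypothetical per-thread record for the proposed Top/Bottom/Mark values. */
typedef struct {
    uintptr_t top;      /* highest address of the reserved range */
    uintptr_t bottom;   /* lowest address of the reserved range */
    uintptr_t mark;     /* low water mark: [mark, top) is committed */
    unsigned  commits;  /* commit requests issued so far */
} stack_limits;

typedef enum { PROBE_OK, PROBE_EXTENDED, PROBE_OVERFLOW } probe_result;

/* Check a prospective stack pointer against the low water mark. */
static probe_result probe_low_water(stack_limits *s, uintptr_t new_sp)
{
    if (new_sp >= s->mark)
        return PROBE_OK;        /* already committed: the common case */
    if (new_sp < s->bottom)
        return PROBE_OVERFLOW;  /* outside the reservation: raise an exception */
    /* Below the mark but inside the reservation: a single request (one
       fault or one commit call) covers the whole gap, however large. */
    s->mark = new_sp & ~(uintptr_t)(PAGE_BYTES - 1);
    s->commits++;
    return PROBE_EXTENDED;
}
```

Compared with the page-by-page loop, dropping 19 pages below the mark costs one commit request instead of 19 faults, and any later probe above the new mark costs only a comparison.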

Eric

Chris Gray
Posted: Wed Nov 30, 2005 8:36 am    Post subject: Re: Thread Stacks

Bernd Paysan <bernd.paysan@gmx.de> writes:

Quote:
Let's take Gforth as an example: GCC allocates 2904 bytes for the engine()
function on x86_64. Yes, if I count it through, I get that many local
variables to assign. However, only half a dozen variables *live* for the
entire life of this function; all the others are temporaries, most of them
small! If GCC applied a stack-like allocation rule to all variables
with limited scope, it would get away with 10 or 20 times less space, and
the register allocator would probably have a much easier time
figuring out which registers to choose. Now, we have to assign the most
important ones by hand.

This is an extreme example, but C functions often have locally scoped
temporary variables, just fewer of them. All this adds up to more stack use
than necessary, i.e. to wasted memory.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Put the variables in inner scopes, and gcc will re-use registers at least.
I don't have that many in my byte-code machine so I haven't noticed if it
re-uses stack space or not. It definitely re-uses processor registers. E.g.

CASE(bc_usub) {
    z_word_t ul1, ul2;

    ul2 = *sp++;
    ul1 = *sp;
    if (ul2 > ul1) {
        ex->ex_PC = pc;
        runError(ex, "run-time uint sub underflow");
        return;
    }
    *sp = ul1 - ul2;
    BREAK;
}

(CASE and BREAK are macros that will do switch statement stuff or gcc's
computed goto stuff, depending on compile-time flags.)
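
The fragment above depends on the rest of that interpreter (z_word_t, ex, runError and the macros). As a self-contained sketch, here is a tiny switch-dispatched loop with the switch flavour of CASE/BREAK only; the opcodes, the 16-word stack and returning -1 instead of calling runError are hypothetical choices for illustration. The point is the pattern: ul1 and ul2 live only inside their case block. Whether GCC also shares their stack spill slots across cases depends on the compiler version, as the follow-ups below discuss.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t z_word_t;           /* assumed word type */

enum { bc_push, bc_usub, bc_halt };  /* hypothetical opcodes */

/* Switch-dispatch versions of the macros; a computed-goto build would
   define them differently. */
#define CASE(op) case op:
#define BREAK    break

/* Runs a tiny program. Returns 0 and stores the top of stack in *result,
   or -1 on a uint underflow or an unknown opcode. */
static int run(const z_word_t *code, z_word_t *result)
{
    z_word_t stack[16];
    z_word_t *sp = stack + 16;       /* the stack grows down */
    const z_word_t *pc = code;

    for (;;) {
        switch (*pc++) {
        CASE(bc_push) {
            *--sp = *pc++;
            BREAK;
        }
        CASE(bc_usub) {
            /* Scoped temporaries: their registers can be reused elsewhere. */
            z_word_t ul1, ul2;

            ul2 = *sp++;
            ul1 = *sp;
            if (ul2 > ul1)
                return -1;           /* "run-time uint sub underflow" */
            *sp = ul1 - ul2;
            BREAK;
        }
        CASE(bc_halt)
            *result = *sp;
            return 0;
        default:
            return -1;
        }
    }
}
```

For example, the program { bc_push, 10, bc_push, 3, bc_usub, bc_halt } leaves 7 in *result, while swapping the two pushed values triggers the underflow path.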

--
Experience should guide us, not rule us.

Chris Gray cg@ami-cg.GraySage.COM
http://www.Nalug.ORG/ (Lego)
http://www.GraySage.COM/cg/ (Other)

Peter Dickerson
Posted: Wed Nov 30, 2005 9:15 am    Post subject: Re: Thread Stacks

"George Neuner" <gneuner2/@comcast.net> wrote in message
news:614po1hhnliu3bjvkq3n9eag1p9p18bvdn@4ax.com...
Quote:
On Tue, 29 Nov 2005 10:33:11 +0000, Ian Rogers
<ian.rogers@manchester.ac.uk> wrote:

George Neuner wrote:
On 32-bit Intel or AMD you could (ab)use MMU segments ... the logical
address space is 64TB IIRC. Of course the physical address space is
only 4GB so you would also have to maintain alternate page table sets
and swap them as necessary.

The logical space is only 4GB. The base address of the selector gets
added to the address and truncated to 4GB to form the logical address
(which is a shame as it'd be nice sometimes to address more than 4GB of
RAM in a single process). The logical address then gets mapped through
the page tables onto a 36bit physical memory address. Hence 64TB of
physical memory addressable but only 4GB addressable in a single process.

Ian Rogers

No. The *physical* space is 4GB - the P6 and later added support for
banked access to up to 64GB.

The *logical* address space is ~16000 4GB segments [I don't remember
the exact number - some of the selector values are illegal].

OK, I wondered where you got 46 bits from. That assumes you allow a trap
to change the physical memory map every time a selector is loaded. But in
that case why does the selector have to be ring 3 (say). Such an OS could
use the ring number as part of the logical address - of course the actual
selector value loaded would be different, so you can't save it and reload it
safely... In this case loading a selector into a segment register can be
viewed as an OS call to change the memory bank switching logic.
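
The 46 bits can be reconstructed from the selector format: a 16-bit x86 selector holds a 13-bit descriptor index, a 1-bit table indicator (GDT or LDT) and a 2-bit requested privilege level. Index plus table bit gives 2^14 = 16384 selectors (a few are unusable, e.g. the null selector, hence "~16000"), each naming a segment of up to 4 GB. The short C check below only does that arithmetic; the enum names are illustrative. Counting the 2 RPL bits as well, as the ring-number idea above would, gives 48 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths of an x86 segment selector and of a 32-bit offset. */
enum {
    SEL_INDEX_BITS = 13,  /* descriptor index within the GDT or LDT */
    SEL_TI_BITS    = 1,   /* table indicator: GDT or LDT */
    SEL_RPL_BITS   = 2,   /* requested privilege level */
    OFFSET_BITS    = 32,  /* offset within a segment (up to 4 GB) */
    PAE_PHYS_BITS  = 36   /* physical address width with PAE (P6 onward) */
};

/* Bits of "logical" address if every selector named a distinct 4 GB
   segment; optionally count the RPL bits too. */
static unsigned logical_bits(int count_rpl)
{
    return SEL_INDEX_BITS + SEL_TI_BITS + (count_rpl ? SEL_RPL_BITS : 0)
           + OFFSET_BITS;
}

static uint64_t bytes_for_bits(unsigned bits)
{
    return (uint64_t)1 << bits;
}
```

So 2^46 bytes is 64 TB of logical space, while PAE's 36-bit physical addresses reach 64 GB, which is the TB/GB mix-up sorted out below.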

Peter

Bernd Paysan
Posted: Wed Nov 30, 2005 4:35 pm    Post subject: Re: Thread Stacks

Chris Gray wrote:
Quote:
Put the variables in inner scopes, and gcc will re-use registers at least.

We do that, and GCC does re-use registers. It doesn't reuse the spilled
stuff.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Ian Rogers
Posted: Wed Nov 30, 2005 4:57 pm    Post subject: Re: Thread Stacks

George Neuner wrote:
Quote:
You're confusing the *logical* address space - 16000 4GB segments -
with the *physical* address space.

You're right, I was confusing things (GB not TB), but I think my point has
been lost. A logical address is a segment selector plus a 32-bit offset.
The segment selector chooses a segment descriptor, which has a base
address that gets added to the offset. This combination forms the
linear address, which can only be 32 bits long. The linear address gets
mapped through the page tables to a potentially 36-bit physical address.
What's not clear when you say 16000 4GB segments is that these must all
be within the same 4GB! You can't, say, have DS set up with a base address
of 0 and FS set up with a base address of 4GB, as the linear address
gets truncated to 32 bits.
There are occasions where you would like to do this trick, and it's
annoying that you can't, e.g.:
A debugger or emulator could be in the same address space as an
application but not visible within its 4GB window.
A user space kernel (such as User-mode Linux) could have a different
application mapped to each of the 16000 segments, and then just copy
data out of them using a segment override.
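
The truncation can be shown with a few lines of C modelling only the base-plus-offset step of 32-bit protected mode (limit and access checks omitted; linear_address is a hypothetical helper). Note also that the base field in a segment descriptor is itself only 32 bits wide, so a base of 4 GB cannot even be encoded.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified segmentation step: the descriptor's 32-bit base is added to
   the 32-bit offset and the sum wraps modulo 2^32, giving the linear
   address that paging then translates (to a physical address of up to
   36 bits with PAE). Limit and access checks are omitted. */
static uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;   /* uint32_t arithmetic wraps at 4 GB */
}
```

For example, an FS base of 0xC0000000 plus an offset of 0x40000000 wraps to linear address 0, the same place DS:0 reaches with a zero base, so every segment aliases into the one 4 GB linear space.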

Regards,

Ian Rogers

Bernd Paysan
Posted: Wed Nov 30, 2005 5:15 pm    Post subject: Re: Thread Stacks

Andi Kleen wrote:

Quote:
Bernd Paysan <bernd.paysan@gmx.de> writes:

Chris Gray wrote:
Put the variables in inner scopes, and gcc will re-use registers at
least.

We do that, and GCC does re-use registers. It doesn't reuse the spilled
stuff.

4.0 got some changes in that area. Did you try it? It should work now.

Yes, with GCC 4.0.2 (as in SuSE 10.0), stack allocation is reduced to ~900
bytes (one third).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Andi Kleen
Posted: Wed Nov 30, 2005 5:15 pm    Post subject: Re: Thread Stacks

Bernd Paysan <bernd.paysan@gmx.de> writes:

Quote:
Chris Gray wrote:
Put the variables in inner scopes, and gcc will re-use registers at least.

We do that, and GCC does re-use registers. It doesn't reuse the spilled
stuff.

4.0 got some changes in that area. Did you try it? It should work now.

-Andi

Eric P.
Posted: Wed Nov 30, 2005 11:22 pm    Post subject: Re: Thread Stacks

Ken Hagan wrote:
Quote:

Eric P. wrote:

I disagree.
The potential amount of space a thread might use is a property
only of its specified starting routine.

...and its input, at least for some obvious implementations of (say)
a recursive descent parser.

Sure, but that still needs an upper nesting limit, or sufficiently
complex external input could overflow the stack.

For example, an RDB server with a thread managing each connection,
and a recursive-descent SQL parser in each. You can't allow a
complex SQL statement to cause a stack overflow, so it must check
its stack depth and abort.
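
A minimal sketch of that depth check: a recursive-descent parser for a toy grammar (expr := 'x' | '(' expr ')', a hypothetical stand-in for SQL) that counts its nesting and rejects input past an assumed MAX_NESTING instead of running off the end of the thread's stack.

```c
#include <assert.h>

#define MAX_NESTING 64   /* assumed limit; a real server would derive it from
                            the thread's reserved stack and frame size */

typedef struct {
    const char *p;   /* next unread character */
    int depth;       /* current recursion depth */
    int too_deep;    /* set when MAX_NESTING was exceeded */
} parser;

/* Toy grammar standing in for SQL:  expr := 'x' | '(' expr ')' */
static int parse_expr(parser *ps)
{
    int ok;

    if (ps->depth >= MAX_NESTING) {
        ps->too_deep = 1;            /* abort rather than overflow the stack */
        return 0;
    }
    ps->depth++;
    if (*ps->p == 'x') {
        ps->p++;
        ok = 1;
    } else if (*ps->p == '(') {
        ps->p++;
        ok = parse_expr(ps) && *ps->p == ')';
        if (ok)
            ps->p++;
    } else {
        ok = 0;
    }
    ps->depth--;
    return ok;
}

/* Returns 1 if s is exactly one expr; *too_deep reports a nesting abort. */
static int parse(const char *s, int *too_deep)
{
    parser ps = { s, 0, 0 };
    int ok = parse_expr(&ps) && *ps.p == '\0';

    *too_deep = ps.too_deep;
    return ok;
}
```

"((x))" parses normally, while 100 nested parentheses are rejected with too_deep set, so hostile input costs a bounded amount of stack.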

(I would not design such a database server that way though.
I would use async IO for disk and network, work queues of callbacks
using state machine logic, and a small pool of worker threads.
So the 1000 thread stacks problem does not occur.)
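
A single-threaded sketch of that alternative design: per-connection state lives in a small object, and work is expressed as callbacks on a queue rather than as frames on a dedicated thread stack. The names, the fixed queue capacity and the three-state connection are hypothetical. Re-queuing the callback stands in for an asynchronous I/O completion, and a real server would run the drain loop from a small pool of worker threads with proper synchronization.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 64   /* assumed fixed capacity for the sketch */

typedef struct {
    void (*fn)(void *ctx);   /* callback to run */
    void *ctx;               /* its per-connection state */
} work_item;

typedef struct {
    work_item items[QUEUE_CAP];
    size_t head, count;
} work_queue;

/* Append a callback; returns 0 if the queue is full. */
static int wq_push(work_queue *q, void (*fn)(void *), void *ctx)
{
    if (q->count == QUEUE_CAP)
        return 0;
    q->items[(q->head + q->count) % QUEUE_CAP] = (work_item){ fn, ctx };
    q->count++;
    return 1;
}

/* A worker's loop: run callbacks until the queue is empty and return how
   many ran. Every callback runs on the worker's own stack. */
static size_t wq_drain(work_queue *q)
{
    size_t ran = 0;
    while (q->count > 0) {
        work_item w = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        w.fn(w.ctx);
        ran++;
    }
    return ran;
}

/* Hypothetical per-connection state machine. */
typedef enum { ST_READ_REQUEST, ST_EXECUTE, ST_SEND_REPLY, ST_DONE } conn_state;

typedef struct {
    conn_state state;
    work_queue *q;
    int steps;
} connection;

/* Advance one step; re-queuing stands in for "the next async I/O completed". */
static void conn_step(void *ctx)
{
    connection *c = ctx;

    c->steps++;
    switch (c->state) {
    case ST_READ_REQUEST: c->state = ST_EXECUTE;    break;
    case ST_EXECUTE:      c->state = ST_SEND_REPLY; break;
    case ST_SEND_REPLY:   c->state = ST_DONE;       break;
    case ST_DONE:         return;
    }
    if (c->state != ST_DONE)
        wq_push(c->q, conn_step, c);
}
```

Ten connections each take three steps, all run on one stack; the number of stacks is set by the worker count, not the connection count.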

Eric

Jeremy Linton
Posted: Thu Dec 01, 2005 1:15 am    Post subject: Re: Thread Stacks

Eric P. wrote:
Quote:
My concern is what happens when you make the default smaller
so you can run more threads. Any CreateThread calls in library
functions that specified a commit size of 0 will also change.

Well, it's probably a valid concern, and one of the reasons why the
default reserve stack size is as small as it is. It's always going to be
easier to increase the reserve (max) stack size than to decrease it.
Budding OS writers should take note of this!

Quote:

I have had situations where I know that my threads do not require
1 MB. But I cannot lower the stack size for just those threads,
and if I change the global default it may affect all threads,
even ones my code did not create.

This is basically the same point. My understanding is that Windows gets
really close to 2k threads per process with the standard 2 GB/2 GB
user/kernel split on the 32-bit kernel. That is a lot of threads... It
might be time to consider a different paradigm for your application, or
upgrading to 64-bit.
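
The "close to 2k threads" figure falls out of a division: with the default 1 MB reservation per thread and 2 GB of user address space, at most 2^31 / 2^20 = 2048 stacks fit, before counting code, heaps and DLLs. A tiny C helper (names hypothetical) makes the reserve-size vs. thread-count trade-off explicit.

```c
#include <assert.h>
#include <stdint.h>

#define KB ((uint64_t)1 << 10)
#define MB ((uint64_t)1 << 20)
#define GB ((uint64_t)1 << 30)

/* Idealised upper bound: every thread reserves the same stack and nothing
   else occupies the user address space. Code, heaps and DLLs also need
   room, so the real limit is somewhat lower. */
static uint64_t max_threads(uint64_t user_space, uint64_t reserve_per_thread)
{
    return user_space / reserve_per_thread;
}
```

Halving the reservation doubles the bound: 256 KB stacks would allow up to 8192 threads in the same 2 GB.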


Quote:

Based on this experience I conclude that it would have been better
design for CreateThread to require the caller to specify both
commit & reserve as arguments for each thread.

Chuckle, look at the native API reference: this is exactly what
ZwCreateThread does. Of course this doesn't help you if all your
libraries are already Win32. On the other hand, people were complaining
that CreateThread already takes too many parameters.
Quote:


I know that, but I am not referring to the commit space.
I am referring to how the committed stack grows into the reserve space.
It was not actually clear, to me anyway, which code is doing this.

There probably isn't any code as such. It's simply the way the OS handles all
virtually allocated but unused space. It's reserved in the virtual memory
map and against the total RAM+page file size, and it is committed to
physical RAM when it's touched, the same as any other demand-loaded page,
except that in this case it's demand-created. However, it doesn't tend to be as
fast on page faults, because the caching and page fault algorithms are
probably all forward-biased for mmapped sections. If you touch
curstackoffset-somebigsize you can probably speed up the demand commit
operations. Of course you could just as well tell the OS to commit the
virtual range; that might be faster than the page fault, or might not.

Quote:
generated into user mode that would trigger the stack expansion and
find none. That is why I put the 'kernel(?)' in my previous message.
(It was long winded enough as it was.)
That is because it's not really "stack expansion"; it's simply a VM commit.


Quote:

Anyway, I think you should try single stepping through the assembler
for 'alloca' or a _chkstk call. Alloca calls __alloca_probe which is
almost identical to the code of __chkstk that is also called whenever
you declare a local object larger than 4 KB on the stack.

Both just loop touching pages one by one. Every time it is invoked.
If a routine ever declared a 20 MB or 50 MB array...

void Foo (void)
{ int vec[20*(1<<20)];
}

every call to Foo () does the same check.

This is not a Win32 or OS issue. That is a Visual Studio/CRT issue. The
check stack seems to be there primarily to force the stack overflow
exception during the allocation rather than during the reference. It's
easy to disable for particular functions with the
#pragma check_stack(off)
directive. You can also control how big of a stack allocation causes
this to fire with the /Gs command line switch to the compiler.
I also think calling alloca is not recommended practice anymore; check
the documentation for more information.
If you know enough about your stack allocations, it's unlikely you really
need this code anyway. You could also replace it with something a little
smarter. It would be easy to hide the stack base/len somewhere and
simply check the current stack pointer and the requested size against
those values. I'm sort of surprised Visual Studio isn't doing something
similar.



Quote:
To do this optimally is simple.
It requires 3 TEB values: Stack Top, Bottom and Low Water Mark.
Top and Bottom cover the reserved range.
Mark is the low water commit point, between top and bottom.

How well this works is probably application dependent. You could
probably implement your own version using SetThreadStackGuarantee(). The
other option is to call ZwWriteWatch() on your stack region and check to
see which pages have been accessed, and allocate/commit extra space as
you see necessary.


That is great, but SetThreadStackGuarantee is only in Win64.
I also notice an update to CreateThread for XP that allows
you to explicitly specify the reserve size in CreateThread with
the STACK_SIZE_PARAM_IS_A_RESERVATION, somewhat similar to my rants.

These don't help all the billions of lines of existing code.

Well, they seem to be working just fine right now, so they probably
don't need it! Half of them were probably written back when everyone had
16M of RAM and 1M stack spaces seemed like a lot.

Eric P.
Posted: Thu Dec 01, 2005 1:15 am    Post subject: Re: Thread Stacks

Jeremy Linton wrote:
Quote:

Eric P. wrote:

Dave Hansen wrote:

On Tue, 29 Nov 2005 10:40:24 -0500 in comp.arch, "Eric P."
<eric_pattison@sympaticoREMOVE.ca> wrote:
I conclude therefore that the correct solution is to specify the
reserved linear stack size for each individual thread at creation,
which means passing it as an argument to the ThreadCreate function.
Note that this is NOT how Win32 works. As an optimization one could

What, then, is the purpose of the dwStackSize parameter to the Win32
function CreateThread?


It sets the commit size. I have no idea why MS thinks I would want
to set it, since the committed pages are expanded automatically.

For performance of course... That way you don't have to worry about
taking individual page faults for each 4k page in the stack.

Which wouldn't be an issue if they didn't expand the commit
space one page at a time in the first place.

Quote:
But you're
partially wrong about the commit vs reserve issue (which should be
thought of as the MAX stack size ever). The SDK documentation says:

"To change the initially committed stack space, use the dwStackSize
parameter of the CreateThread, CreateRemoteThread, or CreateFiber
function. This value is rounded up to the nearest page. Generally, the
reserve size is the default reserve size specified in the executable
header. However, if the initially committed size specified by
dwStackSize is larger than the default reserve size, the reserve size is
this new commit size rounded up to the nearest multiple of 1 MB. "

You are correct. My apologies.
I missed that the CreateThread docs have been updated.
That info was previously buried at the end of a knowledge base article.

I was going on what CreateThread used to say:

"dwStackSize: Specifies the size, in bytes, of the stack for the new
thread. If 0 is specified, the stack size defaults to the same size
as that of the primary thread of the process. <...> CreateThread tries
to commit the number of bytes specified by dwStackSize, and fails if
the size exceeds available memory."

Even so, I believe my concerns below still apply.

Quote:
So basically the linker option sets the default, which is always used
unless someone specifies a stack size larger, in which case it is grown.

Now 1 Meg seems a tiny amount of stack today, but it's been that way for a
long time, and I'm sure that back in '93 or so, when everyone had 32-bit
machines with 8 megs or so, it seemed like a lot of space for a stack.
It's easy to change: just flip the linker switch and give yourself a few
gigs with 64-bit Windows, but it also directly influences the number of
threads that can be created. So it is sort of a trade-off: how big is
your max stack vs. how many threads you can create. You can't grow past
the reserve amount because there is a good chance some other thread has
placed its stack right before yours. On 32-bit Linux, it seems the
decision (not true anymore) had been made to limit the system to 256
threads so the stack size could be bigger. The application I currently
work on needs a lot of threads (large SMP hardware, with lots of blocked
threads just sitting around), so it would have been nice to have more,
since we consume very little stack space. So the problem can go both
ways. At least in Windows, it's easy to mix and match the stack sizes
based on the thread requirements rather than having them fixed (unless
there is a way to get Linux to dynamically set the thread stack size that I
don't know about). So this is an instant win for Windows in my book....

My concern is what happens when you make the default smaller
so you can run more threads. Any CreateThread calls in library
functions that specified a commit size of 0 will also change.

I have had situations where I know that my threads do not require
1 MB. But I cannot lower the stack size for just those threads,
and if I change the global default it may affect all threads,
even ones my code did not create.

Based on this experience I conclude that it would have been better
design for CreateThread to require the caller to specify both
commit & reserve as arguments for each thread.

The global process defaults look easy, but seem just plain dangerous
to me because they allow stacks to be adjusted without regard to usage.
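
For comparison, POSIX threads already let the caller choose the stack size of each thread individually through a thread attribute object (pthread_attr_setstacksize), which is close to the per-thread specification argued for here and bears on the Linux question quoted above. A minimal sketch follows. The 64 KB figure and the helper names are arbitrary, and because the API rejects sizes below PTHREAD_STACK_MIN, the helper clamps small requests up to that minimum.

```c
#define _POSIX_C_SOURCE 200809L  /* expose PTHREAD_STACK_MIN under strict -std */
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body that needs very little stack. */
static void *double_it(void *arg)
{
    int *value = arg;
    *value *= 2;
    return NULL;
}

/* Start one thread with a caller-chosen stack size, wait for it, and
   return 0 or a pthread error code. Only this thread is affected; the
   process-wide default is left alone. */
static int run_with_stack(size_t stack_bytes, int *value)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    if (stack_bytes < (size_t)PTHREAD_STACK_MIN)
        stack_bytes = (size_t)PTHREAD_STACK_MIN;  /* smaller sizes are rejected */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, double_it, value);
    pthread_attr_destroy(&attr);   /* safe once the thread has been created */
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

Because the size travels with each create call, a library that creates its own threads is unaffected by what the application picks for its threads.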

Quote:
Windoze commits only a page at a time (by default) to a thread's stack
as each page boundary is exceeded. How is this different from what
you describe?

I'm not saying you are wrong. I'm trying to understand how your
solution differs from Win32.


Currently, for every allocation > 4 KB or by alloca, it touches every
page one by one. That not only wastes time in a loop touching pages
over and over, as far as I can tell it only extends the stack one
page at a time. Dumb. Dumb. Dumb.

Huh? That is what the commit flags to ZwAllocateVirtualMemory() (exposed
through VirtualAllocEx()) do for you... The NT kernel doesn't really
seem to know anything about user space stacks. That is all done through
the Win32 subsystem. So in that regard, it's sort of silly to complain
about the OS just demand paging the one page you ask for.

I know that, but I am not referring to the commit space.
I am referring to how the committed stack grows into the reserve space.
It was not actually clear, to me anyway, which code is doing this.
The documentation does not say, but tends to imply that it should
be the Win32 user mode code. However I have looked for exceptions
generated into user mode that would trigger the stack expansion and
find none. That is why I put the 'kernel(?)' in my previous message.
(It was long winded enough as it was.)

Anyway, I think you should try single stepping through the assembler
for 'alloca' or a _chkstk call. Alloca calls __alloca_probe which is
almost identical to the code of __chkstk that is also called whenever
you declare a local object larger than 4 KB on the stack.

Both just loop touching pages one by one. Every time it is invoked.
If a routine ever declared a 20 MB or 50 MB array...

void Foo (void)
{ int vec[20*(1<<20)];
}

every call to Foo () does the same check.

Quote:

To do this optimally is simple.
It requires 3 TEB values: Stack Top, Bottom and Low Water Mark.
Top and Bottom cover the reserved range.
Mark is the low water commit point, between top and bottom.
How well this works is probably application dependent. You could
probably implement your own version using SetThreadStackGuarantee(). The
other option is to call ZwWriteWatch() on your stack region and check to
see which pages have been accessed, and allocate/commit extra space as
you see necessary.

That is great, but SetThreadStackGuarantee is only in Win64.
I also notice an update to CreateThread for XP that allows
you to explicitly specify the reserve size in CreateThread with
the STACK_SIZE_PARAM_IS_A_RESERVATION, somewhat similar to my rants.

These don't help all the billions of lines of existing code.

Eric

Jeremy Linton
Posted: Thu Dec 01, 2005 1:15 am    Post subject: Re: Thread Stacks

Eric P. wrote:

Quote:
Dave Hansen wrote:

On Tue, 29 Nov 2005 10:40:24 -0500 in comp.arch, "Eric P."
<eric_pattison@sympaticoREMOVE.ca> wrote:
I conclude therefore that the correct solution is to specify the
reserved linear stack size for each individual thread at creation,
which means passing it as an argument to the ThreadCreate function.
Note that this is NOT how Win32 works. As an optimization one could

What, then, is the purpose of the dwStackSize parameter to the Win32
function CreateThread?


It sets the commit size. I have no idea why MS thinks I would want
to set it, since the committed pages are expanded automatically.

For performance of course... That way you don't have to worry about
taking individual page faults for each 4k page in the stack. But you're
partially wrong about the commit vs reserve issue (which should be
thought of as the MAX stack size ever). The SDK documentation says:

"To change the initially committed stack space, use the dwStackSize
parameter of the CreateThread, CreateRemoteThread, or CreateFiber
function. This value is rounded up to the nearest page. Generally, the
reserve size is the default reserve size specified in the executable
header. However, if the initially committed size specified by
dwStackSize is larger than the default reserve size, the reserve size is
this new commit size rounded up to the nearest multiple of 1 MB. "

So basically the linker option sets the default, which is always used
unless someone specifies a stack size larger, in which case it is grown.
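
Read literally, the quoted SDK paragraph is a small formula, sketched below in C. It assumes 4 KB pages and a nonzero dwStackSize (0, meaning "use the executable's defaults", is not modelled), and the helper names are hypothetical. Actual Windows behaviour may differ in details the quote does not cover.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES ((size_t)4096)     /* assumed page size */
#define ONE_MB     ((size_t)1 << 20)

static size_t round_up(size_t n, size_t unit)
{
    return (n + unit - 1) / unit * unit;
}

/* Literal reading of the SDK text: the commit is dwStackSize rounded up to
   a page; the reserve stays at the executable header's default unless the
   commit exceeds it, in which case it becomes the commit rounded up to a
   multiple of 1 MB. */
static void effective_stack(size_t dw_stack_size, size_t default_reserve,
                            size_t *commit, size_t *reserve)
{
    *commit  = round_up(dw_stack_size, PAGE_BYTES);
    *reserve = (*commit > default_reserve) ? round_up(*commit, ONE_MB)
                                           : default_reserve;
}
```

With the usual 1 MB default reserve, a 10000-byte request commits 12288 bytes and keeps the 1 MB reserve, while a request of 1 MB + 1 byte commits 1 MB + 4 KB and grows the reserve to 2 MB.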

Now 1 Meg seems a tiny amount of stack today, but it's been that way for a
long time, and I'm sure that back in '93 or so, when everyone had 32-bit
machines with 8 megs or so, it seemed like a lot of space for a stack.
It's easy to change: just flip the linker switch and give yourself a few
gigs with 64-bit Windows, but it also directly influences the number of
threads that can be created. So it is sort of a trade-off: how big is
your max stack vs. how many threads you can create. You can't grow past
the reserve amount because there is a good chance some other thread has
placed its stack right before yours. On 32-bit Linux, it seems the
decision (not true anymore) had been made to limit the system to 256
threads so the stack size could be bigger. The application I currently
work on needs a lot of threads (large SMP hardware, with lots of blocked
threads just sitting around), so it would have been nice to have more,
since we consume very little stack space. So the problem can go both
ways. At least in Windows, it's easy to mix and match the stack sizes
based on the thread requirements rather than having them fixed (unless
there is a way to get Linux to dynamically set the thread stack size that I
don't know about). So this is an instant win for Windows in my book....





Quote:
Windoze commits only a page at a time (by default) to a thread's stack
as each page boundary is exceeded. How is this different from what
you describe?

I'm not saying you are wrong. I'm trying to understand how your
solution differs from Win32.


Currently, for every allocation > 4 KB or by alloca, it touches every
page one by one. That not only wastes time in a loop touching pages
over and over, as far as I can tell it only extends the stack one
page at a time. Dumb. Dumb. Dumb.

Huh? That is what the commit flags to ZwAllocateVirtualMemory() (exposed
through VirtualAllocEx()) do for you... The NT kernel doesn't really
seem to know anything about user space stacks. That is all done through
the Win32 subsystem. So in that regard, it's sort of silly to complain
about the OS just demand paging the one page you ask for.
Quote:

To do this optimally is simple.
It requires 3 TEB values: Stack Top, Bottom and Low Water Mark.
Top and Bottom cover the reserved range.
Mark is the low water commit point, between top and bottom.

How well this works is probably application dependent. You could
probably implement your own version using SetThreadStackGuarantee(). The
other option is to call ZwWriteWatch() on your stack region and check to
see which pages have been accessed, and allocate/commit extra space as
you see necessary.

Eric P.
Posted: Thu Dec 01, 2005 9:15 am    Post subject: Re: Thread Stacks

Jeremy Linton wrote:
Quote:

Eric P. wrote:
My concern is what happens when you make the default smaller
so you can run more threads. Any CreateThread calls in library
functions that specified a commit size of 0 will also change.

Well, it's probably a valid concern, and one of the reasons why the
default reserve stack size is as small as it is. It's always going to be
easier to increase the reserve (max) stack size than to decrease it.
Budding OS writers should take note of this!

That is why I brought it up - so others may avoid this cow pie.

Quote:
Based on this experience I conclude that it would have been better
design for CreateThread to require the caller to specify both
commit & reserve as arguments for each thread.

Chuckle, look at the native API reference: this is exactly what
ZwCreateThread does. Of course this doesn't help you if all your
libraries are already Win32. On the other hand, people were complaining
that CreateThread already takes too many parameters.

Yes, I see. This does not surprise me. Often I think that the NT Kernel
did the right thing, but that the Win32 wrapper screwed it up.
(My other big beefs are full reentrancy, thread rundown, function
return status codes, and arg data types, also screwed up in Win32)

Quote:
I know that, but I am not referring to the commit space.
I am referring to how the committed stack grows into the reserve space.
It was not actually clear, to me anyway, which code is doing this.

There probably isn't any code as such. It's simply the way the OS handles all
virtually allocated but unused space. It's reserved in the virtual memory
map and against the total RAM+page file size, and it is committed to
physical RAM when it's touched, the same as any other demand-loaded page.

Yes, but read on.

The reserved stack space is recorded in a binary tree of reserved address
ranges, in entries called Virtual Address Descriptors, and reserving it
does not require any changes to the page table.

Committing stack requires changing the PTEs to be demand-zero pages,
but does not actually allocate physical memory. When the pages are
next touched, a zeroed page frame is assigned.

Touching a reserved but not committed page causes an access violation.
According to the documentation and experiment, changing reserved into
committed pages **requires** the _chkstk probes. The OS, Win32 and
the C-RTL work together to make it function correctly.

See below for more on _chkstk.

Quote:
Except in this case it's demand-created. However, it doesn't tend to be as
fast on page faults, because the caching and page fault algorithms are
probably all forward-biased for mmapped sections. If you touch
curstackoffset-somebigsize you can probably speed up the demand commit
operations. Of course you could just as well tell the OS to commit the
virtual range; that might be faster than the page fault, or might not.

One call to VirtualAlloc would be faster than hundreds of page faults.
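
For readers on Unix-like systems, a rough POSIX analogue of reserve-then-commit is sketched below. It is not how Windows implements stacks, and Linux accounts committed memory differently. mmap with PROT_NONE reserves address space that faults if touched, and a single mprotect call then makes a whole range usable, i.e. one call instead of one fault per page. The helper names and the 64-page size are illustrative.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS under strict -std on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve address space only: PROT_NONE pages fault if touched. */
static void *reserve_range(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Commit" a page-aligned sub-range with one call, instead of one fault
   per page. Returns 0 on success, -1 with errno set on failure. */
static int commit_range(void *base, size_t bytes)
{
    return mprotect(base, bytes, PROT_READ | PROT_WRITE);
}
```

A stack-like use reserves 64 pages, commits the top 16 in one call (stacks grow down), and later commits further pages below them the same way when a low water mark is crossed.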

Quote:
generated into user mode that would trigger the stack expansion and
find none. That is why I put the 'kernel(?)' in my previous message.
(It was long winded enough as it was.)
That is because it's not really "stack expansion"; it's simply a VM commit.


Anyway, I think you should try single stepping through the assembler
for 'alloca' or a _chkstk call. Alloca calls __alloca_probe which is
almost identical to the code of __chkstk that is also called whenever
you declare a local object larger than 4 KB on the stack.

Both just loop touching pages one by one. Every time it is invoked.
If a routine ever declared a 20 MB or 50 MB array...

void Foo (void)
{ int vec[20*(1<<20)];
}

every call to Foo () does the same check.

This is not a Win32 or OS issue. That is a Visual Studio/CRT issue. The
check stack seems to be there primarily to force the stack overflow
exception during the allocation rather than during the reference. It's
easy to disable for particular functions with the
#pragma check_stack(off)
directive. You can also control how big of a stack allocation causes
this to fire with the /Gs command line switch to the compiler.
I also think calling alloca is not recommended practice anymore; check
the documentation for more information.

No.
#pragma check_stack(off) and /Gs only work in DOS & Win3.1
In Win32 these switches are [mostly] ignored and you always get
stack checks when the local vars > 4 KB.

See Q100775 Stack Checking for Windows NT-based Applications.

It basically says that _chkstk is *required* for NT to work correctly
and if you just subtract from ESP and touch memory, your code breaks.

For proof, try the following. It will get Access Violation:

void foo (void)
{ // Move down 5 pages in stack and write
_asm { sub esp, 0x5000 };
_asm { push eax }
}

int main (int argc, const char *argv[])
{ foo ();
return 0;
}

It says that the /Gs threshold value can be set to anything, but
"For a user mode application to run correctly in Windows NT, the
default threshold 4096 is required."

Quote:
If you know enough about your stack allocations, it's unlikely you really
need this code anyway. You could also replace it with something a little
smarter. It would be easy to hide the stack base/len somewhere and
simply check the current stack pointer and the requested size against
those values. I'm sort of surprised visual studio isn't doing something
similar.

Yes, that is what I said. Well, you were more polite.
I called it Dumb Dumb Dumb.

Quote:
These don't help all the billions of lines of existing code.

Well, they seem to be working just fine right now, so they probably
don't need it! Half of them were probably written back when everyone had
16M of RAM and 1M stack spaces seemed like a lot.

The whole thing gives me an ugly feeling that it works by luck
rather than design.

Eric
George Neuner
Guest

Posted: Thu Dec 01, 2005 9:15 am    Post subject: Re: Thread Stacks

On Wed, 30 Nov 2005 07:34:11 GMT, "Peter Dickerson"
<first{dot}surname@ukonline.co.uk> wrote:

Quote:
"George Neuner" <gneuner2/@comcast.net> wrote in message
news:614po1hhnliu3bjvkq3n9eag1p9p18bvdn@4ax.com...

The *physical* space is 4GB - the P6 and later added support for
banked access to up to 64GB.

The *logical* address space is ~16000 4GB segments [I don't remember
the exact number - some of the selector values are illegal].

OK, I wondered where you got 46 bits from. That assumes you allow a trap
to change the physical memory map every time a selector is loaded. But in
that case, why does the selector have to be ring 3 (say)? Such an OS could
use the ring number as part of the logical address - of course the actual
selector value loaded would be different, so you can't save it and reload it
safely... In this case loading a selector into a segment register can be
viewed as an OS call to change the memory bank-switching logic.

On 32-bit Intel designs, loading a segment register from any ring
other than 0 causes a trap to ring 0 so the selector can be validated.
This segment load trap was very expensive on the 286 ... it was much
faster on the 386, but the 386 introduced paging as well, so virtually
all the OS developers chose to put all of memory into a single segment
and use paging instead.

IIRC, the segment TLB was removed or greatly reduced in size on the
Pentiums [because no one used it] making protected mode segment
register loads even more costly because validation results were not
being cached. IA64 has dropped support for hardware segmentation
entirely.

Tandem had a secure 386 OS that used segmentation but I'm not aware of
any others. All the popular systems ignored them entirely. It's sad
because the MMU did segment checking anyway prior to page checking.
Things like "data execution prevention" have been available since 1985
as segment privileges - it's just that no one cared.

Given the cycle cost of validation [even on the earlier CPUs that
expected to do it], I think it was a sensible choice not to abuse
segmentation. But IMO the developers went overboard the other way. A
reasonable compromise, I think, would have been to restrict each
application to just a few segments - code/data or even code/data/stack
... you have 16000+ values to use - and to partition applications from
the OS and from each other. The whole line of IA32 operating systems
could have been a lot more robust if more sensible choices had been
made early.

George
--
for email reply remove "/" from address
George Neuner
Guest

Posted: Thu Dec 01, 2005 9:15 am    Post subject: Re: Thread Stacks

On Wed, 30 Nov 2005 10:57:22 +0000, Ian Rogers
<ian.rogers@manchester.ac.uk> wrote:

Quote:
George Neuner wrote:
You're confusing the *logical* address space - 16000 4GB segments -
with the *physical* address space.

You're right, I was confusing things (GB, not TB), but I think my point has
been lost. A logical address is a segment selector plus a 32bit offset.
The segment selector chooses a segment descriptor which has a base
address which gets added to the offset. This combination forms the
linear address which can only be 32bits long. The linear address gets
mapped through the page tables to a potentially 36bit physical address.
What's not clear when you say 16000 4GB segments is that these must all
be within the same 4GB! You can't say have DS set up with a base address
of 0 and FS set up with a base address of 4GB, as the linear address
gets truncated to 32bits.

Right. The page tables have to be swapped manually if you want to exceed
the physical limit. It can be done, but it is a PITA, as you can imagine.
Tandem had a secure segmented 386 OS, but I'm not sure whether it also
overcommitted the physical address space.

Quote:
There are occasions where you would like to do this trick, and it's
annoying that you can't, e.g.:
A debugger or emulator could be in the same address space as an
application but not visible within its 4GB window.
A user space kernel (such as user mode linux) could have a different
application mapped to each of the 16000 segments, and then just copy
data out of them using a segment over-ride.

Those are great ideas ... you should have had them 20 years ago when
it mattered. Nobody much used the segment unit so Intel deprecated it
and in later IA32 models removed some of the cache hardware that
supported it. It's very expensive now to change a segment register in
32-bit protected mode. The IA64 MMU has completely done away with
the segment unit.

George
Ian Rogers
Guest

Posted: Thu Dec 01, 2005 4:50 pm    Post subject: Re: Thread Stacks

George Neuner wrote:
Quote:
Those are great ideas ... you should have had them 20 years ago when
it mattered. Nobody much used the segment unit so Intel deprecated it
and in later IA32 models removed some of the cache hardware that
supported it. It's very expensive now to change a segment register in
32-bit protected mode. The IA64 MMU has completely done away with
the segment unit.

Linux was using the fs selector to address thread-local storage (TLS) for a
while, but now it does it through the paging system, which has many
advantages. Quite a few systems (well, I'm thinking of JVMs) need places to
hold thread-local variables. Committing them to a register wastes
the register, so putting them at constant offsets addressed via a
selector is something that has been talked about (you could
just hijack an existing TLS, except if you need TLS for green threads).

PowerPC has segmentation on the top 4 bits of the address (IIRC). You
can disable this for either instruction or data accesses. I think this
makes it conceivable to have separate 4GB instruction and data regions,
so 8GB total, which is potentially useful in emulation. It's odd that
PowerPC can address more in a 32-bit user process than IA32, which seems
to commit more resources to the job. I still think 32-bit linear
addresses are a shame :-)

Ian Rogers
All times are GMT