Andy Glew
Guest
Posted: Tue Nov 29, 2005 1:15 am
Post subject: Thread Stacks
| Quote: | On platforms where Fortran is commonly used, the stack limit isn't
that low. Low stack limits are e-vile.
-- greg
|
Hey, let's make a comp.arch topic!
How would you allocate stacks if you wanted, say, 1000 threads.
A megabyte apiece? A gigabyte apiece?
Contiguous stacks are a pain.
See stackless python etc. |
 |
Joe Seigh
Guest
Posted: Tue Nov 29, 2005 1:15 am
Post subject: Re: Thread Stacks
Andy Glew wrote:
| Quote: | On platforms where Fortran is commonly used, the stack limit isn't
that low. Low stack limits are e-vile.
-- greg
Hey, let's make a comp.arch topic!
How would you allocate stacks if you wanted, say, 1000 threads.
A megabyte apiece? A gigabyte apiece?
Contiguous stacks are a pain.
See stackless python etc.
|
IBM's standard linkage didn't use a contiguous stack. You could
take a contiguous stack and dummy it up to look like standard
linkage. I used to have a scheme where I used chunks of contiguous
stack. When one stack segment ran out, I allocated a new stack segment,
switched to that, and inserted a dummy return to deallocate the stack
segment on return. You can pick the granularity of stack segments
to get the trade-off you want between overhead and memory usage.
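A minimal sketch of the chunked-stack idea, written as a Python model rather than real linkage code. The names `SegmentedStack`, `segment_slots`, and `allocations` are hypothetical, and the "dummy return" is only modeled here by releasing a segment once it becomes empty.

```python
class SegmentedStack:
    """Toy model of a stack built from fixed-size chunks (segments)."""

    def __init__(self, segment_slots=4):
        self.segment_slots = segment_slots   # granularity knob
        self.segments = [[]]                 # start with one empty segment
        self.allocations = 1                 # segments allocated so far

    def push(self, value):
        top = self.segments[-1]
        if len(top) == self.segment_slots:   # current segment ran out
            top = []
            self.segments.append(top)        # allocate a new one, switch to it
            self.allocations += 1
        top.append(value)

    def pop(self):
        top = self.segments[-1]
        value = top.pop()
        # Stand-in for the dummy return frame: release the segment once
        # it is empty, but always keep the first segment.
        if not top and len(self.segments) > 1:
            self.segments.pop()
        return value
```

A smaller `segment_slots` wastes less memory per stack but allocates and frees segments more often; a larger one does the opposite.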
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
 |
Greg Lindahl
Guest
Posted: Tue Nov 29, 2005 1:15 am
Post subject: Re: Thread Stacks
In article <peypk6escyko.fsf_-_@pxpl2829.amr.corp.intel.com>,
Andy Glew <andy.glew@intel.com> wrote:
| Quote: | How would you allocate stacks if you wanted, say, 1000 threads.
|
Doctor, it hurts when I do *this*.
Seriously, stick with 64-bit OSes, they cure both this issue and the
issue of mmapped segments with satisfyingly brute force.
-- greg |
 |
George Neuner
Guest
Posted: Tue Nov 29, 2005 4:08 pm
Post subject: Re: Thread Stacks
On 28 Nov 2005 22:22:15 +0200, Andy Glew <andy.glew@intel.com> wrote:
| Quote: | On platforms where Fortran is commonly used, the stack limit isn't
that low. Low stack limits are e-vile.
-- greg
Hey, let's make a comp.arch topic!
How would you allocate stacks if you wanted, say, 1000 threads.
A megabyte apiece? A gigabyte apiece?
|
On 64-bit systems you just allocate it.
On 32-bit Intel or AMD you could (ab)use MMU segments ... the logical
address space is 64TB IIRC. Of course the physical address space is
only 4GB so you would also have to maintain alternate page table sets
and swap them as necessary.
George
--
for email reply remove "/" from address |
 |
Mikael Pettersson
Guest
Posted: Tue Nov 29, 2005 4:21 pm
Post subject: Re: Thread Stacks
In article <peypk6escyko.fsf_-_@pxpl2829.amr.corp.intel.com>,
Andy Glew <andy.glew@intel.com> wrote:
| Quote: | On platforms where Fortran is commonly used, the stack limit isn't
that low. Low stack limits are e-vile.
-- greg
Hey, let's make a comp.arch topic!
How would you allocate stacks if you wanted, say, 1000 threads.
A megabyte apiece? A gigabyte apiece?
|
1000 threads is peanuts. Erlang does 10000s of threads on 32-bitters.
We start each thread with a small stack and grow/relocate it on demand
via stack overflow checks in function prologues.
Erlang doesn't have intra-stack data pointers, so stack relocation works.
For languages like C/C++ you should probably just use segmented/linked stacks.
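A rough sketch of the grow-and-relocate scheme, as a Python model and not Erlang's actual runtime. `GrowableStack`, `enter`, and `leave` are hypothetical names. Frames are identified by offsets rather than addresses, which is the property that makes relocation safe.

```python
class GrowableStack:
    """Toy contiguous stack that is relocated to a larger block on overflow."""

    def __init__(self, initial_slots=8):
        self.slots = [None] * initial_slots  # the current contiguous block
        self.sp = 0                          # index of the next free slot
        self.relocations = 0

    def enter(self, frame_slots):
        """Function prologue: check for overflow, then claim the frame."""
        needed = self.sp + frame_slots
        if needed > len(self.slots):
            self._grow(needed)
        base = self.sp
        self.sp = needed
        return base                          # an offset, not a pointer

    def leave(self, base):
        """Function epilogue: release the frame."""
        self.sp = base

    def _grow(self, needed):
        new_size = len(self.slots)
        while new_size < needed:
            new_size *= 2
        new_block = [None] * new_size
        new_block[:self.sp] = self.slots[:self.sp]  # copy the live data
        self.slots = new_block                      # the stack has "moved"
        self.relocations += 1
```

In C or C++ a pointer into the old block would dangle after `_grow`, which is why segmented or linked stacks are the safer choice there.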
--
Mikael Pettersson (mikpe@csd.uu.se)
Computing Science Department, Uppsala University |
 |
Ian Rogers
Guest
Posted: Tue Nov 29, 2005 4:33 pm
Post subject: Re: Thread Stacks
George Neuner wrote:
| Quote: | On 32-bit Intel or AMD you could (ab)use MMU segments ... the logical
address space is 64TB IIRC. Of course the physical address space is
only 4GB so you would also have to maintain alternate page table sets
and swap them as necessary.
|
The logical space is only 4GB. The base address of the selector gets
added to the address and truncated to 4GB to form the logical address
(which is a shame as it'd be nice sometimes to address more than 4GB of
RAM in a single process). The logical address then gets mapped through
the page tables onto a 36bit physical memory address. Hence 64TB of
physical memory addressable but only 4GB addressable in a single process.
Ian Rogers |
 |
Peter Dickerson
Guest
Posted: Tue Nov 29, 2005 5:13 pm
Post subject: Re: Thread Stacks
"Ian Rogers" <ian.rogers@manchester.ac.uk> wrote in message
news:dmhapg$6iu$1@wapping.cs.man.ac.uk...
| Quote: | George Neuner wrote:
On 32-bit Intel or AMD you could (ab)use MMU segments ... the logical
address space is 64TB IIRC. Of course the physical address space is
only 4GB so you would also have to maintain alternate page table sets
and swap them as necessary.
The logical space is only 4GB. The base address of the selector gets
added to the address and truncated to 4GB to form the logical address
(which is a shame as it'd be nice sometimes to address more than 4GB of
RAM in a single process). The logical address then gets mapped through
the page tables onto a 36bit physical memory address. Hence 64TB of
physical memory addressable but only 4GB addressable in a single process.
|
That's 64 GB, not TB. 64 TB needs a 46-bit address, which is more like the
physical address range of a 64-bit processor.
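As a quick check of the figures, here is the arithmetic in Python. It is plain powers of two and assumes nothing about any particular processor; `addressable_bytes` is a hypothetical helper name.

```python
GB = 2 ** 30
TB = 2 ** 40

def addressable_bytes(address_bits):
    """Bytes reachable with an address of the given width."""
    return 2 ** address_bits

print(addressable_bytes(32) // GB)  # 4
print(addressable_bytes(36) // GB)  # 64
print(addressable_bytes(46) // TB)  # 64
```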
Peter |
 |
David Gay
Guest
Posted: Tue Nov 29, 2005 5:15 pm
Post subject: Re: Thread Stacks
Bernd Paysan <bernd.paysan@gmx.de> writes:
| Quote: | Andy Glew wrote:
On platforms where Fortran is commonly used, the stack limit isn't
that low. Low stack limits are e-vile.
-- greg
Hey, let's make a comp.arch topic!
How would you allocate stacks if you wanted, say, 1000 threads.
A megabyte apiece? A gigabyte apiece?
My rant about stack usage in typical C compilers (e.g. GCC): This is a waste
of memory!
Let's take Gforth as an example: GCC allocates 2904 bytes for the engine()
function on x86_64. Yes, if I count it through, I get that many local
variables to assign. However, only half a dozen variables are *live* for the
entire life of this function; all the others are temporaries, most of them
small! If GCC applied a stack-like allocation rule to all variables
with limited scope, it would get away with 10 to 20 times less space. And
the register allocation algorithm would probably have a much easier time
figuring out which registers to choose. As it is, we have to assign the most
important ones by hand.
|
The strange thing is that it isn't very hard to reuse a graph-colouring
register allocator to colour (allocate) local variables efficiently (been
there, done that, etc)... (though, IIRC, it wouldn't really help register
allocation, as the obvious approach is to apply it afterwards, to spilled
variables)
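To make the idea concrete, here is a small sketch of colouring stack slots by live range. It is not GCC's allocator or the one referred to above. It assumes equal-sized slots and precomputed live ranges, and `assign_slots` and the variable names are made up for illustration.

```python
def assign_slots(live_ranges):
    """Give each variable a stack slot; variables whose live ranges
    overlap (interfere) never share a slot.

    live_ranges maps a name to a half-open interval (start, end).
    """
    slots = {}
    # Visiting intervals by start point makes greedy colouring optimal
    # for interval graphs; general interference graphs need heuristics.
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1]):
        taken = {
            slots[other]
            for other, (s, e) in live_ranges.items()
            if other in slots and s < end and start < e
        }
        slot = 0
        while slot in taken:
            slot += 1
        slots[name] = slot
    return slots

# Two long-lived variables plus four short-lived temporaries.
ranges = {
    "ip": (0, 100), "sp": (0, 100),
    "t1": (10, 20), "t2": (20, 30), "t3": (25, 40), "t4": (40, 50),
}
slots = assign_slots(ranges)
print(max(slots.values()) + 1)  # 4 slots for 6 variables
```

The temporaries `t1`, `t2`, and `t4` end up sharing one slot because their live ranges never overlap.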
--
David Gay
dgay@acm.org |
 |
Dave Hansen
Guest
Posted: Tue Nov 29, 2005 5:15 pm
Post subject: Re: Thread Stacks
On Tue, 29 Nov 2005 10:40:24 -0500 in comp.arch, "Eric P."
<eric_pattison@sympaticoREMOVE.ca> wrote:
[...]
| Quote: | I conclude therefore that the correct solution is to specify the
reserved linear stack size for each individual thread at creation,
which means passing it as an argument to the ThreadCreate function.
Note that this is NOT how Win32 works. As an optimization one could
|
What, then, is the purpose of the dwStackSize parameter to the Win32
function CreateThread?
| Quote: | maintain a pool of previously used stack ranges to save continuously
reserving and releasing the same sized memory sections.
As the stack grows down, each thread should track its low water mark,
and bounds check and commit new space efficiently, as a single system
request, only when below the mark. This is also not how Win32 works.
|
Windoze commits only a page at a time (by default) to a thread's stack
as each page boundary is exceeded. How is this different from what
you describe?
I'm not saying you are wrong. I'm trying to understand how your
solution differs from Win32.
Regards,
-=Dave
--
Change is inevitable, progress is not. |
 |
Eric P.
Guest
Posted: Tue Nov 29, 2005 5:15 pm
Post subject: Re: Thread Stacks
George Neuner wrote:
| Quote: |
On 28 Nov 2005 22:22:15 +0200, Andy Glew <andy.glew@intel.com> wrote:
On platforms where Fortran is commonly used, the stack limit isn't
that low. Low stack limits are e-vile.
-- greg
Hey, let's make a comp.arch topic!
How would you allocate stacks if you wanted, say, 1000 threads.
A megabyte apiece? A gigabyte apiece?
On 64-bit systems you just allocate it.
On 32-bit Intel or AMD you could (ab)use MMU segments ... the logical
address space is 64TB IIRC. Of course the physical address space is
only 4GB so you would also have to maintain alternate page table sets
and swap them as necessary.
|
Segmentation just creates a 48-bit address with screwy non-linear
behavior. If you are going to have a 48-bit address then just make
it flat. It costs the same in gates and memory but behaves better.
Also, per-thread stack segments would make it impossible for one thread
to access objects allocated on another thread's stack, thereby greatly
complicating interface considerations. With the proper synchronization,
threads should be able to pass stack data between them.
Eric |
 |
Eric P.
Guest
Posted: Tue Nov 29, 2005 5:15 pm
Post subject: Re: Thread Stacks
Andy Glew wrote:
| Quote: |
On platforms where Fortran is commonly used, the stack limit isn't
that low. Low stack limits are e-vile.
-- greg
Hey, let's make a comp.arch topic!
How would you allocate stacks if you wanted, say, 1000 threads.
A megabyte apiece? A gigabyte apiece?
Contiguous stacks are a pain.
|
I disagree.
The potential amount of space a thread might use is a property
only of its specified starting routine.
For threads to operate reliably you need to ensure that their
worst case allocation is available (reserved but not allocated).
Unless you can prove that all your threads cannot use the
worst case at once, you must assume that it might happen.
Otherwise the application will randomly fail.
That means there is no advantage to dynamic stack chaining as
you must ensure that the heap contains sufficient space anyway,
and chaining becomes just a more expensive way to manage stacks.
I conclude therefore that the correct solution is to specify the
reserved linear stack size for each individual thread at creation,
which means passing it as an argument to the ThreadCreate function.
Note that this is NOT how Win32 works. As an optimization one could
maintain a pool of previously used stack ranges to save continuously
reserving and releasing the same sized memory sections.
As the stack grows down, each thread should track its low water mark,
and bounds check and commit new space efficiently, as a single system
request, only when below the mark. This is also not how Win32 works.
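A toy model of the low-water-mark idea, in Python. It makes no real OS calls, and `ReservedStack`, `touch`, and `commit_requests` are hypothetical names. The point is that the check is a single comparison on the fast path, and that one commit request covers the whole gap down to the new mark.

```python
PAGE = 4096

class ReservedStack:
    """Toy model: a reserved range that grows down and is committed lazily."""

    def __init__(self, reserved_bytes):
        self.top = reserved_bytes        # the stack starts at the high end
        self.low_water = reserved_bytes  # lowest offset committed so far
        self.commit_requests = 0

    def touch(self, sp):
        """Bounds check for a new stack pointer (an offset into the range)."""
        if sp < 0:
            raise MemoryError("stack exceeds its reservation")
        if sp >= self.low_water:
            return                       # fast path: already committed
        new_low = (sp // PAGE) * PAGE    # round down to a page boundary
        self.commit_requests += 1        # one request covers the whole gap
        self.low_water = new_low
```

For example, moving the stack pointer ten pages below the mark in one step costs one commit request in this model, not ten.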
Eric |
 |
Bernd Paysan
Guest
Posted: Tue Nov 29, 2005 5:15 pm
Post subject: Re: Thread Stacks
Andy Glew wrote:
| Quote: | On platforms where Fortran is commonly used, the stack limit isn't
that low. Low stack limits are e-vile.
-- greg
Hey, let's make a comp.arch topic!
How would you allocate stacks if you wanted, say, 1000 threads.
A megabyte apiece? A gigabyte apiece?
|
My rant about stack usage in typical C compilers (e.g. GCC): This is a waste
of memory!
Let's take Gforth as an example: GCC allocates 2904 bytes for the engine()
function on x86_64. Yes, if I count it through, I get that many local
variables to assign. However, only half a dozen variables are *live* for the
entire life of this function; all the others are temporaries, most of them
small! If GCC applied a stack-like allocation rule to all variables
with limited scope, it would get away with 10 to 20 times less space. And
the register allocation algorithm would probably have a much easier time
figuring out which registers to choose. As it is, we have to assign the most
important ones by hand.
This is an extreme example, but C functions often have locally scoped
temporary variables, just fewer of them. All this adds up to more stack use
than necessary, and so to wasted memory.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/ |
 |
Ken Hagan
Guest
Posted: Tue Nov 29, 2005 11:25 pm
Post subject: Re: Thread Stacks
Eric P. wrote:
| Quote: |
I disagree.
The potential amount of space a thread might use is a property
only of its specified starting routine.
|
....and its input, at least for some obvious implementations of (say)
a recursive descent parser. |
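A small illustration of that point, assuming a deliberately naive recursive-descent scanner for nested parentheses. `max_depth` is a hypothetical function, not taken from any real parser. Each `(` costs one level of recursion, so the stack needed is set by the input and not by the code alone.

```python
def max_depth(text, pos=0, depth=0):
    """Recursive-descent scan of nested parentheses.

    Returns (position after the scanned group, deepest nesting seen).
    """
    deepest = depth
    while pos < len(text):
        ch = text[pos]
        if ch == "(":
            # One recursive call, and so one stack frame, per open paren.
            pos, inner = max_depth(text, pos + 1, depth + 1)
            deepest = max(deepest, inner)
        elif ch == ")":
            return pos + 1, deepest
        else:
            pos += 1
    return pos, deepest

print(max_depth("(a(b)(c(d)))")[1])  # 3
```

Feeding it a few hundred nested parentheses drives the recursion a few hundred frames deep, with no change to the starting routine.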
 |
George Neuner
Guest
Posted: Tue Nov 29, 2005 11:49 pm
Post subject: Re: Thread Stacks
On Tue, 29 Nov 2005 11:13:55 GMT, "Peter Dickerson"
<first{dot}surname@ukonline.co.uk> wrote:
| Quote: | "Ian Rogers" <ian.rogers@manchester.ac.uk> wrote in message
news:dmhapg$6iu$1@wapping.cs.man.ac.uk...
George Neuner wrote:
On 32-bit Intel or AMD you could (ab)use MMU segments ... the logical
address space is 64TB IIRC. Of course the physical address space is
only 4GB so you would also have to maintain alternate page table sets
and swap them as necessary.
The logical space is only 4GB. The base address of the selector gets
added to the address and truncated to 4GB to form the logical address
(which is a shame as it'd be nice sometimes to address more than 4GB of
RAM in a single process). The logical address then gets mapped through
the page tables onto a 36bit physical memory address. Hence 64TB of
physical memory addressable but only 4GB addressable in a single process.
That's 64 GB, not TB. 64 TB needs a 46-bit address, which is more like the
physical address range of a 64-bit processor.
Peter
|
You're confusing the *logical* address space - 16000 4GB segments -
with the *physical* address space.
The extended 64GB address space is only available on P6 or later
processors - earlier ones were limited to 4GB. And the extended space
is banked and not all accessible simultaneously.
George
--
for email reply remove "/" from address |
 |
George Neuner
Guest
Posted: Tue Nov 29, 2005 11:52 pm
Post subject: Re: Thread Stacks
On Tue, 29 Nov 2005 10:33:11 +0000, Ian Rogers
<ian.rogers@manchester.ac.uk> wrote:
| Quote: | George Neuner wrote:
On 32-bit Intel or AMD you could (ab)use MMU segments ... the logical
address space is 64TB IIRC. Of course the physical address space is
only 4GB so you would also have to maintain alternate page table sets
and swap them as necessary.
The logical space is only 4GB. The base address of the selector gets
added to the address and truncated to 4GB to form the logical address
(which is a shame as it'd be nice sometimes to address more than 4GB of
RAM in a single process). The logical address then gets mapped through
the page tables onto a 36bit physical memory address. Hence 64TB of
physical memory addressable but only 4GB addressable in a single process.
Ian Rogers
|
No. The *physical* space is 4GB - the P6 and later added support for
banked access to up to 64GB.
The *logical* address space is ~16000 4GB segments [I don't remember
the exact number - some of the selector values are illegal].
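For reference, the "~16000 segments" figure can be reproduced from the layout of a 16-bit x86 segment selector: a 13-bit descriptor index, one table-indicator bit (GDT or LDT), and two privilege bits. The sketch below is only that arithmetic. It assumes every descriptor describes a full 4 GB segment and ignores unusable values such as the null selector.

```python
INDEX_BITS = 13    # descriptor index within a table
TABLE_BITS = 1     # table indicator: GDT or LDT
OFFSET_BITS = 32   # offset within a segment

segments = 2 ** (INDEX_BITS + TABLE_BITS)
logical_bytes = segments * 2 ** OFFSET_BITS

print(segments)                  # 16384
print(logical_bytes // 2 ** 40)  # 64 (TB)
```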
George
--
for email reply remove "/" from address |