| Author |
Message |
Raven8712
Guest
|
Posted:
Fri Sep 02, 2005 11:03 pm Post subject:
Multicores |
|
|
Does anyone have any thoughts on whether dual processors are worth
building into a home computer or not?
- Raven |
|
|
 |
Randy
Guest
|
Posted:
Sat Sep 03, 2005 12:15 am Post subject:
Re: Multicores |
|
|
Raven8712 wrote:
| Quote: | Does anyone have any thoughts on whether dual processors are worth
building into a home computer or not?
- Raven
|
Yes. No.
Randy |
|
|
 |
DonQuixote
Joined: 23 Aug 2005
Posts: 3
|
Posted:
Sat Sep 03, 2005 3:40 am Post subject:
|
|
|
It depends.
Seriously. If you are looking for more speed out of an individual application, you probably are not going to get it with a multicore processor, because software has to be specifically written to use multiple threads. We may see that coming in some games soon, but I don't think we will see a huge rush to multithreaded software in the home PC market just yet. The development costs are just not worth the gains in performance.
The real benefits of multithreading come at the system level, when you have a lot of programs running at the same time. Even so, most performance issues have to do with the speed of the hard drive, not the CPU. I think the most cost-effective way to get good performance is to make sure you have plenty of RAM. When I got my computer HP was giving away a free upgrade from 256 to 512 MB RAM, and I'm glad I got it. |
|
|
 |
Skybuck Flying
Guest
|
Posted:
Sat Sep 03, 2005 6:22 am Post subject:
Re: Multicores |
|
|
"Raven8712" <raven8712@yahoo.com> wrote in message
news:1125684201.841810.126440@g43g2000cwa.googlegroups.com...
| Quote: | Does anyone have any thoughts on whether dual processors are worth
building into a home computer or not?
|
Dual core is pretty expensive at the moment.
If you want a cheap computer the answer (at this point in time) is no.
If you want an interesting, highly responsive computer and have money to burn
the answer is yes =D
An interesting question is the following:
Suppose there are two processors, one single core and one dual core. Both are
rated at the same speed, for example:
Single Core Processor 5000+
Dual Core Processor 5000+
The question is how the rating is done.
The disadvantage of a single core processor is that the software/operating
system/applications will have to do many "context switches". Only one
thread/process can run on a single core processor at a time, so multiple
concurrent threads/processes will have to be swapped on and off the
processor so as to create the illusion of parallel processing. This switching
is called a "context switch" and can be considered "overhead" <- which is
loss of performance/speed.
Software/operating systems/applications which use many threads will cause
lots of overhead on a single core processor.
Here is an example:
Two threads need to run concurrently/parallel. The software/operating system
will use context switches to create the illusion of concurrency/parallelism.
For example let's look at 10 time slices. In principle it will work
something like this:
Single Core Processor:
Thread1, Time slice1,
Context Switch1
Thread2, Time slice1,
Context Switch2
Thread1, Time slice2,
Context Switch3
Thread2, Time slice2,
Context Switch4
Thread1, Time slice3,
Context Switch5
Thread2, Time slice3,
Context Switch6
Thread1, Time slice4,
Context Switch7
Thread2, Time slice4,
Context Switch8
Thread1, Time slice5,
Context Switch9
Thread2, Time slice5,
Context Switch10
Thread1, Time slice6,
Context Switch11
Thread2, Time slice6,
Context Switch12
Thread1, Time slice7,
Context Switch13
Thread2, Time slice7,
Context Switch14
Thread1, Time slice8,
Context Switch15
Thread2, Time slice8,
Context Switch16
Thread1, Time slice9,
Context Switch17
Thread2, Time slice9,
Context Switch18
Thread1, Time slice10,
Context Switch19
Thread2, Time slice10,
So as you can see from this example there are 19 context switches needed.
(Which is quite a lot of overhead, at least that's what I think as a
programmer ;) and this is only for two threads!)
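The count can be checked with a small simulation. This is only a sketch in Python under the same simplifying assumptions as the example (strict round-robin, every thread uses its full time slice, nothing ever blocks); the function name `round_robin_switches` is made up for illustration and does not model a real scheduler.

```python
def round_robin_switches(num_threads, slices_per_thread):
    """Count context switches on one core under strict round-robin.

    Assumes every thread always uses its full time slice and never blocks,
    which is the simplified model from the example, not a real scheduler.
    """
    schedule = []
    for slice_no in range(1, slices_per_thread + 1):
        for thread_no in range(1, num_threads + 1):
            schedule.append((thread_no, slice_no))

    # A switch happens between two consecutive slices run by different threads.
    switches = sum(
        1 for prev, cur in zip(schedule, schedule[1:]) if prev[0] != cur[0]
    )
    return switches


print(round_robin_switches(2, 10))  # 19
```

With a single thread the function returns 0, since there is nothing to switch to.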
Now compare the above example to the dual core example which in principle
could work like this:
Thread 1 is dedicated to processor/core 1
Thread 2 is dedicated to processor/core 2
Thread1, Time slice1, Thread2, Time slice1,
Thread1, Time slice2, Thread2, Time slice2,
Thread1, Time slice3, Thread2, Time slice3,
Thread1, Time slice4, Thread2, Time slice4,
Thread1, Time slice5, Thread2, Time slice5,
Thread1, Time slice6, Thread2, Time slice6,
Thread1, Time slice7, Thread2, Time slice7,
Thread1, Time slice8, Thread2, Time slice8,
Thread1, Time slice9, Thread2, Time slice9,
Thread1, Time slice10, Thread2, Time slice10,
In principle no context switches would be needed which would be a HUGE
advantage for performance.
(Assuming that both threads run at full speed and do a full workload etc.)
Dedicating threads to processors might also make the prediction logic of the
processor work much better. Though I am not completely sure about
that... maybe the processor has some tricks to store the prediction
logic/data in a cache when it detects a context switch or something... and
is able to load other prediction logic related to another thread somehow
etc... I don't know if processors can do this ;)
Of course this could be an overly simplified look at how it works. The
dual/multi core processor has to share a single source of memory and a
single bus etc (?) So those might cause problems like concurrency problems
(?) or bottlenecks (?)
So the more cores a processor has, the fewer context switches are needed in
case the threads are all dedicated to a certain processor/core, which would
be really cool/good =D
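To put rough numbers on that, here is a sketch in Python of the same idealized model extended to several cores. It assumes threads are dealt out to cores once and never migrate, every thread uses its full slice, and memory or bus contention is ignored; `switches_with_pinning` is a hypothetical helper, not a real scheduler API.

```python
def switches_with_pinning(num_threads, num_cores, slices_per_thread):
    """Count time-slice context switches when threads are pinned to cores.

    Threads are dealt out to cores round-robin and never migrate. Each core
    then runs its own threads in strict rotation. Blocking, shared memory
    and bus contention are ignored (simplified model, not a real scheduler).
    """
    total = 0
    for core in range(num_cores):
        # Threads core, core + num_cores, core + 2*num_cores, ... live here.
        threads_here = len(range(core, num_threads, num_cores))
        if threads_here > 1:
            # Every slice boundary on a shared core is a switch.
            total += threads_here * slices_per_thread - 1
    return total


for cores in (1, 2, 4):
    print(cores, "core(s):", switches_with_pinning(4, cores, 10))
```

In this model the switches only disappear once there is a core per thread. With four threads on two cores, each core still switches at every slice boundary, so the total hardly drops; the two cores just do their switching side by side.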
So for me as a (Delphi) programmer and probably all other (Delphi)
programmers multi core is definitely interesting. Especially since the Delphi
language/development environment/libraries allow easy programming and
debugging of multi threaded applications =D <- child's play really =D
For me personally the multi threaded challenge is not programming the stuff,
the challenge is designing the stuff. It creates a whole range of other
(common) problems. For example thread1 might be receiving data from a
network, thread 2 might be processing the data from thread1, thread 3 might
be sending replies from thread 2.
Thread 3 might not be able to keep up with thread2. Thread 2 might not be
able to keep up with Thread1, etc. I think this is what is commonly known as
a "producer/consumer" problem etc ;)
Analyzing this design shows an immediate obvious flaw. The threads
depend on each other's output and are therefore bottlenecked etc.
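A minimal sketch of that three-stage design, in Python rather than Delphi, with bounded queues between the stages. The "network" is just a list, the processing step is a placeholder (`upper()`), and all names are invented for illustration. The bounded queues make the dependency visible: a stage blocks as soon as the next one falls behind.

```python
import queue
import threading

DONE = object()  # sentinel that tells the next stage to stop


def receiver(packets, out_q):
    # Stand-in for thread 1: "receives" data (here, from a plain list).
    for packet in packets:
        out_q.put(packet)
    out_q.put(DONE)


def processor(in_q, out_q):
    # Stand-in for thread 2: blocks until thread 1 has produced something.
    while (item := in_q.get()) is not DONE:
        out_q.put(item.upper())
    out_q.put(DONE)


def sender(in_q, sent):
    # Stand-in for thread 3: blocks until thread 2 has produced something.
    while (item := in_q.get()) is not DONE:
        sent.append(item)


def run_pipeline(packets, queue_size=2):
    # Bounded queues: a fast stage blocks when the next stage falls behind.
    q12 = queue.Queue(maxsize=queue_size)
    q23 = queue.Queue(maxsize=queue_size)
    sent = []
    threads = [
        threading.Thread(target=receiver, args=(packets, q12)),
        threading.Thread(target=processor, args=(q12, q23)),
        threading.Thread(target=sender, args=(q23, sent)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent


print(run_pipeline(["a", "b", "c"]))  # ['A', 'B', 'C']
```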
A common different design is to have a "pool" of threads which all do the
same work but do it simply in parallel. For example:
Thread1 receives data, processes data, sends data.
Thread2 receives data, processes data, sends data.
Thread3 receives data, processes data, sends data.
This design would probably create a sort of pipeline. The network card can
only receive one packet at a time usually. Though this design would probably
still be faster than a single thread.
It would probably go something like this at least in principle:
Thread1 receives data.
Thread2 waits for data.
Thread3 waits for data.
Thread1 processes data
Thread2 receives data
Thread3 waits for data.
Thread1 sends data
Thread2 processes data
Thread3 receives data.
Etc:
Thread1 receives data.
Thread2 sends data.
Thread3 processes data.
Thread1 processes data
Thread2 receives data
Thread3 sends data.
Thread1 sends data
Thread2 processes data
Thread3 receives data.
Etc, etc, etc ;)
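A rough sketch of the pool design in Python, using the standard `concurrent.futures` thread pool. Each worker handles one packet from start to finish; the receive, process and send steps are placeholders and the function names are made up for illustration. Note that in standard CPython the global interpreter lock means pure-Python processing does not actually run on two cores at once, so this shows the structure of the design rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(packet):
    # One worker does all three steps for one packet.
    data = packet            # "receive" (already in memory in this sketch)
    result = data.upper()    # "process" (placeholder work)
    return result            # "send" (here: hand the reply back to the caller)


def serve(packets, workers=3):
    # Each packet is handed to whichever worker thread is free.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, even if workers finish out of order.
        return list(pool.map(handle, packets))


print(serve(["a", "b", "c", "d"]))  # ['A', 'B', 'C', 'D']
```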
So a multi core processor might be able to utilize the network card better
simply because it's more responsive. It can immediately receive a network
packet, at least at the start and probably also later on.
The single core processor or single threaded software might not be fast
enough to receive, process and send the data and then be ready for the next
receive of data. "Blocking software" (which waits for a certain action to
complete) could increase this problem (I like to think waiting is bad ;))
Though this is a pretty simple example and therefore I wonder if this
example would make any significant performance difference in reality ;)
However the first example still stands ;) so for me as a programmer there is
no doubt. A well written software
program/application could run faster on a dual/multi core processor simply
because it prevents "expensive" context switches =D (by dedicating each
thread to a core/processor)
Bye,
Skybuck. |
|
|
 |
Kelly Hall
Guest
|
Posted:
Sat Sep 03, 2005 7:35 am Post subject:
Re: Multicores |
|
|
Raven8712 wrote:
| Quote: | Does anyone have any thoughts on whether dual processors are worth
building into a home computer or not?
|
It depends. A couple of years ago, my home PC was a dual Celeron 400
box (Abit BP6 mobo) running BeOS. The hardware was cheap, and the
price/performance was great. I've since replaced that box with a single
Pentium M CPU machine running WinXP.
I think there are still dual CPU mobos available, but I don't know if
they are cheap enough for home use - it's not clear to me if the
price/performance is worthwhile.
I would expect that the price of Mac dual G4 systems is getting
reasonable, but I haven't priced anything recently.
Kelly |
|
|
 |
robertwessel2@yahoo.com
Guest
|
Posted:
Sat Sep 03, 2005 8:15 am Post subject:
Re: Multicores |
|
|
Skybuck Flying wrote:
| Quote: | An interesting question is the following question:
Suppose there are two processors one single core and one dual core. Both are
rated at the same speed for example:
Single Core Processor 5000+
Dual Core Processor 5000+
The question is how the rating is done.
The disadvantage of a single core processor is that the software/operating
system/applications will have to do many "context switches". Only one
thread/process can run on a single core processor at a time, so multiple
concurrent threads/processes will have to be swapped on and off the
processor so as to create the illusion of parallel processing. This switching
is called a "context switch" and can be considered "overhead" <- which is
loss of performance/speed.
Software/operating systems/applications which use many threads will cause
lots of overhead on a single core processor.
Here is an example:
|
(much silliness snipped)
Except for a few special cases, nobody would want two 1x processors
instead of a single 2x processor. In short, the single faster
processor will run almost everything fast, the dual processor will run
fast only if there is consistently more than one thread ready to
execute, and at best will match the performance of the single
processor.
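The comparison can be written down as a toy model. The sketch below is in Python and assumes each thread's work cannot be split across cores and that scheduling overhead, caches and memory bandwidth do not matter; `finish_time` and the work figures are invented for illustration.

```python
def finish_time(jobs, cores, speed):
    """Time to finish all jobs under a very simple model.

    jobs  -- list of work amounts, one per thread; a job cannot be split
    cores -- number of identical cores
    speed -- work units per second for each core
    Jobs are placed greedily (largest first) on the least-loaded core;
    scheduling overhead, caches and memory bandwidth are ignored.
    """
    load = [0.0] * cores
    for work in sorted(jobs, reverse=True):
        load[load.index(min(load))] += work
    return max(load) / speed


one_thread = [100]
two_threads = [50, 50]

print(finish_time(one_thread, cores=1, speed=2))   # 50.0
print(finish_time(one_thread, cores=2, speed=1))   # 100.0
print(finish_time(two_threads, cores=1, speed=2))  # 50.0
print(finish_time(two_threads, cores=2, speed=1))  # 50.0
```

With one runnable thread the two 1x cores take twice as long as the single 2x core; with two evenly sized threads they only draw level.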
The number of context switches is not typically reduced in any
significant way by going to a dual (or multi-) CPU system. The vast
majority of context switches occur because a thread blocks or calls on
another thread or context for a service (really the same thing), and
those are *not* impacted by the number of CPUs in the system. Context
switches due to time slice expirations *can* be reduced by a multiple
CPU configuration, but only if the number of threads tends to be close
to the number of CPUs. If there are many runnable threads, and the time
slice interval does not change, even that can be slower on the dual
processor system, since slices will occur after only half the number of
instructions. In any event, time slice triggered context switches are
*very* rare. At best a few tens to a few hundred per second.
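On Unix-like systems the two kinds of switches can be told apart with `getrusage`, which Python exposes through the `resource` module. The following is a rough sketch, not a benchmark: the counters cover the whole process, the exact numbers depend on the OS and on whatever else is running, and some platforms may simply report zero. The helper names are made up for illustration.

```python
import resource  # Unix only
import time


def context_switch_counts(work):
    """Run work() and return (voluntary, involuntary) context switch deltas.

    ru_nvcsw counts switches where the process gave up the CPU itself
    (for example by blocking); ru_nivcsw counts preemptions such as an
    expired time slice. Counts cover the whole process and may be
    reported as zero on platforms that do not track them.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    work()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def blocking_work():
    # Blocks 50 times, like a thread waiting on I/O or another thread.
    for _ in range(50):
        time.sleep(0.001)


def cpu_bound_work():
    # Never blocks; only the scheduler can take the CPU away.
    total = 0
    for i in range(2_000_000):
        total += i
    return total


print("blocking :", context_switch_counts(blocking_work))
print("cpu bound:", context_switch_counts(cpu_bound_work))
```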
Now two processors of speed 1.5x instead of a single 2x CPU is a lot
more interesting. |
|
|
 |
Nick Maclaren
Guest
|
Posted:
Sat Sep 03, 2005 4:15 pm Post subject:
Re: Multicores |
|
|
In article <7oydnZ2dnZ0UikTrnZ2dnRkThN6dnZ2dRVn-052dnZ0@comcast.com>,
Joe Seigh <jseigh_01@xemaps.com> wrote:
| Quote: | robertwessel2@yahoo.com wrote:
The number of context switches is not typically reduced in any
significant way by going to a dual (or multi-) CPU system. The vast
majority of context switches occur because a thread blocks or calls on
another thread or context for a service (really the same thing), and
those are *not* impacted by the number of CPUs in the system. Context
switches due to time slice expirations *can* be reduced by a multiple
CPU configuration, but only if the number of threads tends to be close
to the number of CPUs. If there are many runnable threads, and the time
slice interval does not change, even that can be slower on the dual
processor system, since slices will occur after only half the number of
instructions. In any event, time slice triggered context switches are
*very* rare. At best a few tens to a few hundred per second.
Except for systems like Linux which preempt signaling threads (at least
on single processor systems). You can significantly speed up a Linux
multi-threaded application by adding a sched_yield() after a blocked
wait. One of the rare instances where two wrongs make a right.
|
And seriously parallel applications that do not spin wait. It is
very common (and reasonable) to test a memory-based lock for the other
end of a channel having responded, and to enter a wait if it doesn't
after a short period. If you have enough cores to keep every thread
going, the number of such waits can be much smaller than if you do
not.
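The "test for a short period, then enter a wait" pattern can be sketched as follows. This is a Python illustration only: a `threading.Event` stands in for the memory-based lock, the spin budget is an arbitrary iteration count, and `spin_then_wait` is a hypothetical name. Real implementations poll a word in shared memory and are tuned per platform.

```python
import threading


def spin_then_wait(ready, spin_iterations=1000, timeout=1.0):
    """Poll `ready` briefly, then fall back to a blocking wait.

    Returns "spun" if the flag was seen while polling (the caller never
    blocked), "waited" if the blocking wait succeeded, or "timed out".
    """
    for _ in range(spin_iterations):
        if ready.is_set():
            return "spun"
    # The other end did not respond in time: give up the CPU and block.
    return "waited" if ready.wait(timeout) else "timed out"


ready = threading.Event()
threading.Timer(0.05, ready.set).start()  # the "other end" responds later
print(spin_then_wait(ready, spin_iterations=10))
```

If the other end is running on its own core it can respond during the polling loop, and the blocking wait (with its context switch) is never entered.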
Regards,
Nick Maclaren. |
|
|
 |
Joe Seigh
Guest
|
Posted:
Sat Sep 03, 2005 4:15 pm Post subject:
Re: Multicores |
|
|
robertwessel2@yahoo.com wrote:
| Quote: |
The number of context switches is not typically reduced in any
significant way by going to a dual (or multi-) CPU system. The vast
majority of context switches occur because a thread blocks or calls on
another thread or context for a service (really the same thing), and
those are *not* impacted by the number of CPUs in the system. Context
switches due to time slice expirations *can* be reduced by a multiple
CPU configuration, but only if the number of threads tends to be close
to the number of CPUs. If there are many runnable threads, and the time
slice interval does not change, even that can be slower on the dual
processor system, since slices will occur after only half the number of
instructions. In any event, time slice triggered context switches are
*very* rare. At best a few tens to a few hundred per second.
|
Except for systems like Linux which preempt signaling threads (at least
on single processor systems). You can significantly speed up a Linux
multi-threaded application by adding a sched_yield() after a blocked
wait. One of the rare instances where two wrongs make a right.
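For readers wondering where such a call would go, here is a sketch of the placement only, written in Python (`os.sched_yield()` wraps the POSIX call on systems that have it). The condition-variable setup and all names are assumptions made for illustration, the original code is not shown in this thread, and whether the yield helps at all depends on the kernel and scheduler in use. Under CPython's global interpreter lock no speedup should be expected from this sketch.

```python
import os
import threading

# os.sched_yield() exists on Linux and most Unix systems; fall back to a
# no-op elsewhere so the sketch still runs.
yield_cpu = getattr(os, "sched_yield", lambda: None)


def consumer(cond, items, results, expected):
    for _ in range(expected):
        with cond:
            while not items:
                cond.wait()          # the blocking wait
            results.append(items.pop(0))
        yield_cpu()                  # the extra yield after the wait


def producer(cond, items, values):
    for value in values:
        with cond:
            items.append(value)
            cond.notify()            # signal the waiting thread


def run(values):
    cond = threading.Condition()
    items, results = [], []
    c = threading.Thread(target=consumer,
                         args=(cond, items, results, len(values)))
    p = threading.Thread(target=producer, args=(cond, items, values))
    c.start()
    p.start()
    p.join()
    c.join()
    return results


print(run([1, 2, 3]))  # [1, 2, 3]
```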
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software. |
|
|
 |
Benny Amorsen
Guest
|
Posted:
Sat Sep 03, 2005 4:15 pm Post subject:
Re: Multicores |
|
|
| Quote: | "rv" == robertwessel2@yahoo com <robertwessel2@yahoo.com> writes:
|
rv> In any event, time slice triggered context switches are *very*
rv> rare. At best a few tens to a few hundred per second.
IIRC, in the original AmigaOS -- I think up to 2.0 -- time slice
triggered context switching was actually broken -- but no one noticed,
since enough events were usually generated to keep contexts switching
back and forth. (These days a scheduler would probably give control
back to a low-priority process interrupted by a high-priority process
as quickly as possible, and so the bug would not be hidden.)
/Benny |
|
|
 |
Skybuck Flying
Guest
|
Posted:
Sun Sep 04, 2005 12:15 am Post subject:
Re: Multicores |
|
|
<robertwessel2@yahoo.com> wrote in message
news:1125731066.086811.73210@z14g2000cwz.googlegroups.com...
| Quote: |
Skybuck Flying wrote:
An interesting question is the following question:
Suppose there are two processors one single core and one dual core. Both
are
rated at the same speed for example:
Single Core Processor 5000+
Dual Core Processor 5000+
The question is how the rating is done.
The disadvantage of a single core processor is that the software/operating
system/applications will have to do many "context switches". Only one
thread/process can run on a single core processor at a time, so multiple
concurrent threads/processes will have to be swapped on and off the
processor so as to create the illusion of parallel processing. This switching
is called a "context switch" and can be considered "overhead" <- which is
loss of performance/speed.
Software/operating systems/applications which use many threads will cause
lots of overhead on a single core processor.
Here is an example:
(much silliness snipped)
Except for a few special cases, nobody would want two 1x processors
instead of a single 2x processor. In short, the single faster
processor will run almost everything fast, the dual processor will run
fast only if there is consistently more than one thread ready to
execute, and at best will match the performance of the single
processor.
|
I would want a dual core processor, I would even want a 1000 core processor.
The reason is simply that then I as a programmer can start writing
*serious* multi threaded software which can take advantage of multi core
processors.
Assuming that single core processors hit a certain limit, then a simple way
to get more speed is to do things in parallel.
The hardware folks think that their CPUs might not be able to go any faster
by increasing the clock speeds, so they will have to find other
ways to speed things up, and one fine way of doing that is multiple cores and
everything else in parallel too ;)
So as more and more software becomes multi threaded these dual/multi core
CPUs will work much better because they will be much more responsive and the
software can take advantage of the multiple cores thus increasing in speed
too, best of both worlds really ;)
| Quote: | The number of context switches is not typically reduced in any
significant way by going to a dual (or multi-) CPU system. The vast
majority of context switches occur because a thread blocks or calls on
another thread or context for a service (really the same thing), and
those are *not* impacted by the number of CPUs in the system. Context
switches due to time slice expirations *can* be reduced by a multiple
CPU configuration, but only if the number of threads tends to be close
to the number of CPUs. If there are many runnable threads, and the time
slice interval does not change, even that can be slower on the dual
processor system, since slices will occur after only half the number of
instructions. In any event, time slice triggered context switches are
*very* rare. At best a few tens to a few hundred per second.
|
As far as I can tell Windows XP doesn't care how many threads are running,
it will simply give each thread the same amount of time slice, thereby
lagging the whole system. That's why the number of context switches doesn't
increase as the number of threads increases. Windows XP makes absolutely no
attempt to keep the system responsive. To keep the system responsive Windows
XP would need to shorten each time slice, thereby creating more context
switches and thus leaving less processing power to do anything useful.
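The trade-off described here can be put into numbers with a toy calculation. This Python sketch assumes strict round-robin among CPU-bound threads and a fixed cost per context switch; the 0.01 ms switch cost and the slice lengths are made-up figures for illustration, not measurements of Windows XP.

```python
def useful_fraction(slice_ms, switch_cost_ms):
    """Share of CPU time left for real work if every slice ends in a switch."""
    return slice_ms / (slice_ms + switch_cost_ms)


def worst_case_wait_ms(runnable_threads, slice_ms, switch_cost_ms):
    """Longest a thread waits for its next turn under strict round-robin."""
    return (runnable_threads - 1) * (slice_ms + switch_cost_ms)


SWITCH_COST_MS = 0.01  # assumed figure, for illustration only

for slice_ms in (20, 5, 1):
    print(
        f"slice {slice_ms:>2} ms: "
        f"useful {useful_fraction(slice_ms, SWITCH_COST_MS):.4f}, "
        f"wait with 10 threads "
        f"{worst_case_wait_ms(10, slice_ms, SWITCH_COST_MS):.2f} ms"
    )
```

Shorter slices cut the waiting time roughly in proportion, while the useful fraction only drops noticeably once the slice length gets close to the switch cost.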
Bye,
Skybuck. |
|
|
 |
Ken Hagan
Guest
|
Posted:
Tue Sep 06, 2005 2:31 pm Post subject:
Re: Multicores |
|
|
Skybuck Flying wrote:
| Quote: |
I would want a dual core processor, I would even want a 1000 core processor.
The reason is simply because then I as a programmer can start writing
*serious* multi threaded software which can take advantage of multi core
processors.
|
A nit pick...
You can write such software now. Your customers will need to
wait until they have such a processor before they will benefit
from it.
Until then, there is no advantage to parallel software, only
overheads and cost. That's why we aren't all doing it already. |
|
|
 |
John Savard
Guest
|
Posted:
Tue Sep 06, 2005 3:46 pm Post subject:
Re: Multicores |
|
|
On Sun, 4 Sep 2005 01:15:40 +0200, "Skybuck Flying" <nospam@hotmail.com>
wrote, in part:
| Quote: | To keep the system responsive windows
xp would need to shorten each time slice and thereby creating more context
switches thus leaving less processing power to do anything useful.
|
To avoid context switching, you don't need to have another whole core.
Just use register renaming, and have a multi-threaded CPU.
First, make the fastest possible single core you can.
Then, make it multithreaded, so that the only context switches you have
are the unavoidable ones due to procedure calls.
Then, if you still have die area, add cores.
John Savard
http://home.ecn.ab.ca/~jsavard/index.html
http://www.quadibloc.com/index.html |
|
|
 |
Anne & Lynn Wheeler
Guest
|
Posted:
Tue Sep 06, 2005 9:18 pm Post subject:
Re: Multicores |
|
|
jsavard@excxn.aNOSPAMb.cdn.invalid (John Savard) writes:
| Quote: | To avoid context switching, you don't need to have another whole
core. Just use register renaming, and have a multi-threaded CPU.
First, make the fastest possible single core you can.
Then, make it multithreaded, so that the only context switches you
have are the unavoidable ones due to procedure calls.
Then, if you still have die area, add cores.
|
that was sort of the 370/195 dual i-stream proposal from the early
70s. the issue was that most codes kept the pipeline only about
half-full. having dual instruction counters and dual registers
.... with pipeline tagging registers and i-stream had a chance of
maintaining close to aggregate, peak thruput of the pipeline.
amdahl in the early 80s ... had another variation on that.
running straight virtual machine hypervisor ... resulted in context
switch on privilege instructions and i/o interrupts (between the
virtual machine and the virtual machine hypervisor), including saving
registers, other state, etc (and then restoring).
starting with virtual machine assist on the 370/158 and continuing
with ECPS on the 138/148 to the LPAR support on modern machines ...
more and more of the virtual machine hypervisor was being implemented
in the microcode of the real machine ... aka the real machine
instruction implementation (for some instructions) would recognize
whether it was in real machine state or virtual machine state and
modify the instruction decode and execution appropriately. one of the
issues was that microcode tended to be a totally different beast and
with little in the way of software support tools.
http://www.garlic.com/~lynn/subtopic.html#mcode
amdahl 370 clones implemented an intermediate layer called macrocode
that effectively looked and tasted almost exactly like 370 ... but had
its own independent machine state. this basically allowed almost
exactly moving some virtual machine hypervisor code to macrocode level
.... w/o the sometimes difficult translation issues ... while
eliminating standard context switching overhead (register and other
state save/restore).
it was also an opportunity to do some quick speed up. standard 370
architecture allows for self-modifying instructions ... before stuff
like speculative execution (with rollback) support ... the checking
for catching whether the previous instruction had modified the current
(following) instruction being decoded & scheduled for execution
.... frequently doubled 370 instruction elapsed time processing.
macrocode architecture was specified as not supporting self-modifying
370 instruction streams.
i've frequently claimed that the 801/risc formulation in the mid-70s,
http://www.garlic.com/~lynn/subtopic.html#801
was opportunity to do the exact opposite of other stuff in the period:
combination of the failure of future system project (extremely complex
hardware architecture)
http://www.garlic.com/~lynn/subtopic.html#futuresys
and heavy overhead performance paid by 370 architecture supporting
self-modifying instruction stream and very strong memory coherency
(and overhead) in smp implementations.
separating instruction and data caches and providing for no coherency
support between stores to the data cache and what was in the
instruction cache ... precluded even worrying about self-modifying
instruction operation. no support for any kind of cache coherency
.... down to the very lowest of levels ... also precluded such support
for multiprocessing operations.
i've sometimes observed that ibm/motorola/apple/etc somerset was sort
of taking the rios risc core and marrying it with 88k cache coherency.
recent, slightly related posting
http://www.garlic.com/~lynn/2005o.html#37 What ever happened to Tandem and NonStop OS?
--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/ |
|
|
 |
robertwessel2@yahoo.com
Guest
|
Posted:
Wed Sep 07, 2005 12:15 am Post subject:
Re: Multicores |
|
|
Skybuck Flying wrote:
| Quote: | <robertwessel2@yahoo.com> wrote in message
news:1125731066.086811.73210@z14g2000cwz.googlegroups.com...
The number of context switches is not typically reduced in any
significant way by going to a dual (or multi-) CPU system. The vast
majority of context switches occur because a thread blocks or calls on
another thread or context for a service (really the same thing), and
those are *not* impacted by the number of CPUs in the system. Context
switches due to time slice expirations *can* be reduced by a multiple
CPU configuration, but only if the number of threads tends to be close
to the number of CPUs. If there are many runnable threads, and the time
slice interval does not change, even that can be slower on the dual
processor system, since slices will occur after only half the number of
instructions. In any event, time slice triggered context switches are
*very* rare. At best a few tens to a few hundred per second.
As far as I can tell windows xp doesn't care how many threads are running,
it will simply give each thread the same amount of time slice thereby
lagging the whole system.
|
Windows, like pretty much any other OS, will dole out CPU time in
time-slice increments to threads that run CPU bound, but only rarely
does a thread (in most systems) ever run out a time slice before
blocking on some event (and thus threads are rarely CPU bound).
| Quote: | That's why the number of context switches doesn't
increase as the number of threads increase.
|
This is perhaps true for a collection of CPU bound threads, but the
vast majority of threads in a system are (typically) not.
| Quote: | Windows xp makes absolutely no
attempt to keep the system responsive. To keep the system responsive windows
xp would need to shorten each time slice and thereby creating more context
switches thus leaving less processing power to do anything useful.
|
Different versions of Windows use different time-slice intervals. And
there are some semi-documented ways that you can change that. Windows
takes considerable effort to keep the system responsive (CPU bound
threads get a priority reduction, foreground applications get a boost,
etc.), but in the presence of multiple CPU bound threads attached to
message queues (which is an application design problem), there's not a
whole lot you can do.
In short, time slice intervals have only a little to do with system
responsiveness. |
|
|
 |
Jan Vorbrüggen
Guest
|
Posted:
Wed Sep 07, 2005 4:15 pm Post subject:
Re: Multicores |
|
|
| Quote: | Different versions of Windows use different time-slice intervals. And
there are some semi-documented ways that you can change that. Windows
takes considerable effort to keep the system responsive (CPU bound
threads get a priority reduction, foreground applications get a boost,
etc.), but in the presence of multiple CPU bound threads attached to
message queues (which is an application design problem), there's not a
whole lot you can do.
|
I have never seen this work properly in W2K...once you have one compute-
bound thread running, the GUI becomes totally unresponsive. Nextstep, now
that is another matter.
Jan |
|
|
 |