Simple Hardware Clock question
Guest

Posted: Wed Feb 09, 2005 10:11 pm    Post subject: Simple Hardware Clock question

Ok, so I just read that the clock synchronizes the other hardware
components of a computer system; meaning that, because the processor
is faster than the RAM, for instance, or the hard disk, the next CPU
instruction is delayed until the next clock tick, to ensure
that each component completes its operation before the next phase. And
for this reason, the clocks often run at relatively slow speeds such as
333MHz - much slower than the 3GHz CPUs that we have now.

If this is so, one may say that this CPU executes no more than
333,000,000 instructions per second!

Is this so?

Thanks for helping the Noob

Olumide
Alexei A. Frounze
Guest

Posted: Wed Feb 09, 2005 10:43 pm    Post subject: Re: Simple Hardware Clock question

<50295@web.de> wrote in message
news:1107969085.171711.152460@c13g2000cwb.googlegroups.com...
Quote:
Ok, so I just read that the clock synchronizes the other hardware
components of a computer system; meaning that, because the processor
is faster than the RAM, for instance, or the hard disk, the next CPU
instruction is delayed until the next clock tick, to ensure
that each component completes its operation before the next phase. And
for this reason, the clocks often run at relatively slow speeds such as
333MHz - much slower than the 3GHz CPUs that we have now.

If this is so, one may say that this CPU executes no more than
333,000,000 instructions per second!

Is this so?

Yes, if you write lousy software.

There are a number of techniques that help to improve the performance. To
name a few:
- on-chip CPU memory cache
- off-chip/external CPU memory cache
- interrupts (as opposed to continuous polling/busy-waiting)
- DMA
- code optimization to remove any redundancy in both calculations and
memory/device accesses

Got the idea? :)

Alex
Maxim S. Shatskih
Guest

Posted: Wed Feb 09, 2005 10:56 pm    Post subject: Re: Simple Hardware Clock question

No. The external interface of the CPU may run at 333MHz, but the core runs
at 3GHz. The core stalls only when an access to the external interface is
absolutely necessary and the core has nothing else to do.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
Guest

Posted: Wed Feb 09, 2005 11:20 pm    Post subject: Re: Simple Hardware Clock question

Thanks Alexei!

I understand how numbers (1) and (5) can help, but not the others.
Putting your answer together with Maxim's, is it correct to say that all
these techniques do NOT require the external interface?

- Olumide
Del Cecchi
Guest

Posted: Thu Feb 10, 2005 2:21 am    Post subject: Re: Simple Hardware Clock question

50295@web.de wrote:
Quote:
Thanks Alexei!

I understand how numbers (1) and (5) can help, but not the others.
Putting your answer together with Maxim's, is it correct to say all
these techniques do NOT require the external interface?

- Olumide

You need to distinguish all the different usages of "clock" in a
computer system.
Nick Maclaren
Guest

Posted: Thu Feb 10, 2005 2:34 am    Post subject: Re: Simple Hardware Clock question

In article <36vd6hF58ai7hU2@individual.net>,
Del Cecchi <cecchinospam@us.ibm.com> wrote:
Quote:
50295@web.de wrote:
Thanks Alexei!

I understand how numbers (1) and (5) can help, but not the others.
Putting your answer together with Maxim's, is it correct to say all
these techniques do NOT require the external interface?

You need to distinguish all the different usages of "clock" in a
computer system.

Well, at least you aren't asking ME to - I retire in only 10 years.
The original poster MAY be young enough to complete that task.


Regards,
Nick Maclaren.
Alexei A. Frounze
Guest

Posted: Thu Feb 10, 2005 2:26 pm    Post subject: Re: Simple Hardware Clock question

<50295@web.de> wrote in message
news:1107973214.462660.84050@c13g2000cwb.googlegroups.com...
Quote:
Thanks Alexei!

I understand how numbers (1) and (5) can help, but not the others.
Putting your answer together with Maxim's, is it correct to say all
these techniques do NOT require the external interface?

2 (off-CPU memory cache) helps just like the other cache. It's basically a
hierarchy of caches, each working at its own speed, and the closer a cache
is to the CPU, the faster the data retrieval. But if no cache contains the
information the CPU needs, the dirty work has to be done anyway, i.e.
a read from memory.

3 (interrupts and multithreading in general): suppose you're waiting for a
key in your application, and all your system and application software is
single-threaded, i.e. no multiprocessing of any kind, no parallelism. The
easiest and least efficient approach is a loop like this:
while (!kbhit()) { do_something(); } // kbhit() comes from <conio.h>
This simply wastes CPU time that could have been used for something
more useful, like calculations in some background activity. This is where
interrupts help: instead of spinning in a loop and doing nothing, you
install a keyboard interrupt routine that is called once per key
press/release, as opposed to millions of calls to kbhit() in a loop. You
advance your state machine on each keyboard event, using only as much CPU
time as needed, with no excessive overhead.

4 (DMA): this tiny bit of circuitry does memory-to-device I/O transparently
to the CPU. It costs little CPU time because the CPU is interrupted only
when there is data ready for it or when data can be taken from it. Just
that, no loops like the one above. DMA also usually works with blocks of
data, which again helps to minimize the overhead (you get one interrupt per
block of bytes as opposed to one per byte).

Read some computer architecture book, like Tanenbaum's...

Alex
Guest

Posted: Thu Feb 10, 2005 7:36 pm    Post subject: Re: Simple Hardware Clock question

Thanks Alexei,

I know about all this - trust me, but I fail to see how the external
cache, or the use of interrupts, or DMA can cause the CPU to execute
more than 1 instruction in a hardware clock cycle. What I'm trying to
say is that I fail to see how external cache, or the use of interrupts,
or DMA constitute an internal interface for/of the CPU. (I really like
Maxim's answer ;-) . Are you there Maxim?)

- Olumide
Maxim S. Shatskih
Guest

Posted: Thu Feb 10, 2005 8:27 pm    Post subject: Re: Simple Hardware Clock question

Quote:
cache, or the use of interrupts, or DMA can cause the CPU to execute
more than 1 instruction in a hardware clock cycle. What I'm trying to

Several execution units can execute several instructions per cycle, if they are
not dependent on one another.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
Alexei A. Frounze
Guest

Posted: Thu Feb 10, 2005 8:57 pm    Post subject: Re: Simple Hardware Clock question

"Maxim S. Shatskih" <maxim@storagecraft.com> wrote in message
news:cufugn$2mgn$1@gavrilo.mtu.ru...
Quote:
cache, or the use of interrupts, or DMA can cause the CPU to execute
more than 1 instruction in a hardware clock cycle. What I'm trying to

Several execution units can execute several instructions per cycle, if
they are not dependent on one another.

Right, and now you may have CPUs with several cores or that hyperthreading
feature, so you can effectively have more than 1 instruction per clock due
to the parallelism. Intel x86 CPUs probably don't have a lot of useful
instructions that take just 1 clock :)

What I was trying to say in my previous posts is that even though the
circuitry connected to the CPU can be rather slow (effectively running on
slower clocks than the CPU's), that doesn't mean the CPU itself starts
running as slowly as that circuitry does.

Alex
Peter D.
Guest

Posted: Fri Feb 11, 2005 1:38 pm    Post subject: Re: Simple Hardware Clock question

50295@web.de wrote in comp.os.linux.hardware:

[snip]
Quote:
I know about all this - trust me, but I fail to see how the external
cache, or the use of interrupts, or DMA can cause the CPU to execute
more than 1 instruction in a hardware clock cycle.
[snip]


There are many clocks in a PC. For example, the CPU's clock might be
running seventeen times faster than the motherboard's main bus.

It is quite normal for the CPU to do many things between motherboard
clock ticks. Hence the need for the main cache.


--
Peter D.
Sig goes here...
Stephane Hockenhull
Guest

Posted: Sun Feb 13, 2005 8:11 pm    Post subject: Re: Simple Hardware Clock question

50295@web.de wrote:
Quote:
If this is so, one may say that this CPU executes no more than
333,000,000 instructions per second!


The internal cache runs at the same speed as the CPU, so the CPU can
execute "complex" calculations out of its cache and then send the result to
memory.

For example, interpolating between two values:

r = a[i] + (a[i+1] - a[i]) * fraction_of(i);

With i's value in a register, you read the two values a[i] and a[i+1] (the
second use of a[i] is cached, so memory isn't accessed), calculate the
difference into a CPU register (no memory access), multiply by
fraction_of(i) (also in a register: no memory access), add a[i] (register
again) and store r to memory.

So we only did 3 memory accesses, and as the memory bus is usually wider
(64, 128 or even 256 bits) than the data we're working on (32 bits), a[i]
and a[i+1] might have been read at the same time, making it just 2
memory accesses for 8 operations, some of which could take more than one
CPU cycle.

If this code is in a loop and the loop fits in the instruction cache,
then no instruction reads go to memory for most of the iterations (only
the first iteration loads the instruction cache).


So the CPU does do more work than a 333MHz clock would suggest, but it
also does MUCH less work than its 3.33GHz clock would allow if it ran on
3.33GHz memory.

Some CPUs will even overheat if you make them run too efficiently, as
they're expected to be regularly slowed down by slow memory.
Guest

Posted: Fri Feb 18, 2005 11:57 pm    Post subject: Re: Simple Hardware Clock question

Alexei A. Frounze wrote:
Quote:
"Maxim S. Shatskih" <maxim@storagecraft.com> wrote in message
news:cufugn$2mgn$1@gavrilo.mtu.ru...
cache, or the use of interrupts, or DMA can cause the CPU to execute
more than 1 instruction in a hardware clock cycle. What I'm trying to

Several execution units can execute several instructions per cycle, if
they are not dependent on one another.

Right, and now you may have CPUs with several cores or that hyperthreading
feature, so you can effectively have more than 1 instruction per clock due
to the parallelism. Intel x86 CPUs probably don't have a lot of useful
instructions that take just 1 clock :)

What I was trying to say in my previous posts is that even though the
circuitry connected to the CPU can be rather slow (effectively running on
slower clocks than the CPU's), that doesn't mean the CPU itself starts
running as slowly as that circuitry does.


All modern CPUs (since about 1980) are pipelined in some form, meaning
that the work of an individual instruction is broken up into several
stages, each taking a clock cycle.

A common analogy is doing laundry: there is a washer and a dryer. When
the first load A finishes washing, we can put it in the dryer, and
while A is drying, we can start the next load B in the washer. Then A
finishes drying and B finishes washing. A is now done, and B moves to
drying while the next load C starts washing. If washing and drying each
take time T, then we achieve a throughput of one load per T, while each
load actually takes 2T to complete.

In modern CPUs like the Athlon or Pentium 4, the pipeline can be as
long as 10 or 20 stages. Therefore even though each instruction takes
10 or 20 cycles, they are pipelined so that we can achieve 1
instr/cycle throughput.

For more information, see:

Computer Architecture: A Quantitative Approach, John Hennessy, David
Patterson
Alexei A. Frounze
Guest

Posted: Sat Feb 19, 2005 12:46 am    Post subject: Re: Simple Hardware Clock question

<diablovision@yahoo.com> wrote in message
news:1108753064.884015.36050@l41g2000cwc.googlegroups.com...
....
Quote:
All modern CPUs (since about 1980) are pipelined in some form, meaning
that the work of an individual instruction is broken up into several
stages, each taking a clock cycle.

A common analogy is doing laundry: there is a washer and a dryer. When
the first load A finishes washing, we can put it in the dryer, and
while A is drying, we can start the next load B in the washer. Then A
finishes drying and B finishes washing. A is now done, and B moves to
drying while the next load C starts washing. If washing and drying each
take time T, then we achieve a throughput of one load per T, while each
load actually takes 2T to complete.

In modern CPUs like the Athlon or Pentium 4, the pipeline can be as
long as 10 or 20 stages. Therefore even though each instruction takes
10 or 20 cycles, they are pipelined so that we can achieve 1
instr/cycle throughput.

Right, yet clear enough for a housewife to understand.

Alex
Maxim S. Shatskih
Guest

Posted: Wed Feb 23, 2005 7:34 am    Post subject: Re: Simple Hardware Clock question

Quote:
In modern CPUs like the Athlon or Pentium 4, the pipeline can be as
long as 10 or 20 stages. Therefore even though each instruction takes
10 or 20 cycles, they are pipelined so that we can achieve 1
instr/cycle throughput.

More so.

Even the P5 Pentium was capable of running 2 instructions from the
instruction stream in parallel, provided they did not depend on one another
(the operands of the second are not altered by the first).

This feature is called "superscalar". SPARC CPUs are even more capable in
this respect.

The weak point of superscalar is that the decision about what to run in
parallel is made at run time by the CPU hardware, which cannot keep a large
context.

A Very Long Instruction Word (VLIW) style CPU like IA-64 shifts this burden
to the compiler. The compiler (which can keep a huge context) decides how to
spread the operations across the CPU's execution units.

The downsides are the huge complexity of the compiler and of the assembly
language (it is nearly impossible to write assembly by hand, too much
context to keep in your head).

Yet another approach to fast CPUs: throw away any complexity, use the saved
silicon for cache and raise the frequency as far as possible.
Pentium 4 and Alpha go this way (Alpha even stripped complexity out of its
instruction set: the original Alpha had no byte-sized loads and stores, so
if you wanted byte access, you wrote a subroutine).

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com