Software Pipelining
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 FAQFAQ   MemberlistMemberlist     RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Google
 
Web castalk.com
Software Pipelining

 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture
Author Message
Guest






Posted: Thu Nov 24, 2005 12:22 am    Post subject: Software Pipelining Reply with quote

How can the following loop be pipelined?

int d = get_d();
int dmask = d >> 31;

for(int i = 0, j = 0; i < 128; )
{

d = d + C0 + (C1 & dmask);
i = i + C2;
j = j + (C3 & (~dmask));
dmask = d >> 31;

A[i + j] = C4;

}

C0, C1, C2, C3, C4 are constants. A is a unidimensional array.

All operations in this example excepting the shift have
latency/throughput of 1/1.

The shift has latency/throughput of 4/1.

You can choose any-issue architecture(please specify your
assumptions).
Back to top
Greg Lindahl
Guest





Posted: Thu Nov 24, 2005 12:49 am    Post subject: Re: Software Pipelining Reply with quote

Quote:
How can the following loop be pipelined?

Via this easy method:

1) Receive homework question from professor
2) Post homework question to Usenet
3) Recieve answer
4) Turn in answer to professor
5) Profit!

-- greg
Back to top
Natarajan Krishnaswami
Guest





Posted: Thu Nov 24, 2005 1:15 am    Post subject: Re: Software Pipelining Reply with quote

On 2005-11-23, maytersft@yahoo.com <maytersft@yahoo.com> wrote:
Quote:
How can the following loop be pipelined?

The easiest way to see this in general is via gcc. Compile and
generate listings both ways with
gcc -g -o foo.o -Wa,-ahls foo.c
and
gcc -g -o foo.o -Wa,-ahls -pipe foo.c
to see if any opportunities to exploit SW pipelining were present and
used.

By using a cross-compiler, you can see how this varies from
architecture to architecture. Note that in some cases it can be a
pessimization to perform SW pipelining, as dynamic microarchitectural
features like register renaming, OoO and SMT may do a better job of
extracting parallelism. Gcc is aware of this, and emits identical
code (refuses to pipeline) if it thinks that will be the case.

HTH,
N.
Back to top
Terje Mathisen
Guest





Posted: Thu Nov 24, 2005 3:36 pm    Post subject: Re: Software Pipelining Reply with quote

maytersft@yahoo.com wrote:

Quote:
How can the following loop be pipelined?

int d = get_d();
int dmask = d >> 31;

for(int i = 0, j = 0; i < 128; )
{

d = d + C0 + (C1 & dmask);
i = i + C2;
j = j + (C3 & (~dmask));
dmask = d >> 31;

A[i + j] = C4;

}

C0, C1, C2, C3, C4 are constants. A is a unidimensional array.

All operations in this example excepting the shift have
latency/throughput of 1/1.

The shift has latency/throughput of 4/1.

You can choose any-issue architecture(please specify your
assumptions).

This is a trick assignment!

It quite obviously cannot be pipelined, due to the sequential dependencies.

Knowing the actual constant values, as well as the result of get_d()
_might_ make a small difference, but nothing you should worry about when
you turn in your homework!

Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Back to top
kavi



Joined: 27 Jan 2005
Posts: 14
Location: Planet Earth

Posted: Thu Nov 24, 2005 4:47 pm    Post subject: Reply with quote

hi there

why is the shift operation so computationally intensive? For a typical RISC it doesn't make much sense.

Some VLIW compilers support software pipelining optimization, for example Trimaran (needs significant effort it get going with) and VEX (probably less effort).

-kavi
_________________
-kavi a.k.a. "Uncle Noah"
Back to top
View user's profile Send private message
Guest






Posted: Thu Nov 24, 2005 5:15 pm    Post subject: Re: Software Pipelining Reply with quote

$ gcc --version
gcc (GCC) 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)
Copyright (C) 2004 Free Software Foundation, Inc.

'-ahls' is not supported as an argument.
Back to top
Guest






Posted: Thu Nov 24, 2005 5:15 pm    Post subject: Re: Software Pipelining Reply with quote

Neat!

I forgot to mention that the word length is 32 bits so,
although not standard C, d >> 31 actually has the semantics
we tend to assume.
Back to top
Ken Hagan
Guest





Posted: Thu Nov 24, 2005 5:15 pm    Post subject: Re: Software Pipelining Reply with quote

maytersft@yahoo.com wrote:
Quote:
Neat!

I forgot to mention that the word length is 32 bits so,
although not standard C, d >> 31 actually has the semantics
we tend to assume.

OK, I'll have a more serious go. You say that all operations
have latency 1 except for the shift which is latency 4, but
you have nearly a dozen operations in each iteration with
only a single shift, the results of which are not needed until
the following iteration.

Quote:
for(int i = 0, j = 0; i < 128; )
{
d = d + C0 + (C1 & dmask);
i = i + C2;
j = j + (C3 & (~dmask));
dmask = d >> 31;
A[i + j] = C4;
}

Why bother to pipeline anything?
Back to top
Terje Mathisen
Guest





Posted: Thu Nov 24, 2005 5:15 pm    Post subject: Re: Software Pipelining Reply with quote

maytersft@yahoo.com wrote:

Quote:
How can the following loop be pipelined?

int d = get_d();
int dmask = d >> 31;

for(int i = 0, j = 0; i < 128; )
{

d = d + C0 + (C1 & dmask);
i = i + C2;
j = j + (C3 & (~dmask));
dmask = d >> 31;

A[i + j] = C4;

}

C0, C1, C2, C3, C4 are constants. A is a unidimensional array.


Since you've been getting lots of serious replies, I'll take another
look as well:

The most interesting optimization would seem to take most of the logic
out of the loop:

The dmask calculations simply propagate the sign of d into all bit
positions, effectively using it as a branchless replacement for the
following code:

int d = get_d();
int dm;

for(int i = C2, j = 0; i < 128+C2; i += C2 ) {
dm = d; // Remember previous value
d += (dm < 0)? C0 : C0 + C1;
j += (dm >= 0)? C3 : 0;
A[i + j] = C4;
}

We can split this into two parts, one for positive and one for negative
'd' values:

int i = 0, j = 0, ij = 0;
while (i < 128 ) {
while (d < 0) {
ij += C2;
i += C2;
d += C0+C1;
A[ij] = C4;
if (i >= 128) goto done;
}
while (d >= 0) {
ij += C2+C3;
i += C2;
d += C0;
j += C3;
A[ij] = C4;
if (i >= 128) goto done;
}
}
done:

The next step would be to determine on entry how many loop iterations to
run, and how many steps before the sign might/will change:

iterations = 128 / C2; // Static compiler calculation!

while (iterations) {
if (d < 0) {
while (iterations >= 4 && (-d >= (C0+C1)*4)) {
// Assume (C0+C1)*4 won't overflow!
A[ij+C2*1] = C4;
a[ij+C2*2] = C4;
a[ij+C2*3] = C4;
a[ij+C2*4] = C4;
ij += C2*4;
d += (C0+C1)*4;
iterations -= 4;
}
while (iterations && (d < 0)) {
A[ij+C2] = C4;
ij += C2;
d += C0+C1;
iterations--;
}
}
if (d >= 0) {
.... similar code for the positive case...


It should be quite obvious that the speed of shift instructions are
rather unimportant!

Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Back to top
Ken Hagan
Guest





Posted: Thu Nov 24, 2005 5:15 pm    Post subject: Re: Software Pipelining Reply with quote

maytersft@yahoo.com wrote:
Quote:
How can the following loop be pipelined?

int d = get_d();
int dmask = d >> 31;

for(int i = 0, j = 0; i < 128; )
{

d = d + C0 + (C1 & dmask);
i = i + C2;
j = j + (C3 & (~dmask));
dmask = d >> 31;

A[i + j] = C4;

}

C0, C1, C2, C3, C4 are constants. A is a unidimensional array.

All operations in this example excepting the shift have
latency/throughput of 1/1.

The shift has latency/throughput of 4/1.

You can choose any-issue architecture(please specify your
assumptions).

Since you haven't constrained the architecture, I'll assume the C
code is supposed to be portable. That being the case, "d>>31" is
undefined behaviour. (In practice, the shift count may be truncated
to 15, so we get the top bit of d propogated through the dmask, or
it may be taken seriously, in which case dmask=0.) That being the
case, I think we can replace the whole loop with...

memset(A, 0, sizeof(A));
Back to top
John Whorfin
Guest





Posted: Fri Nov 25, 2005 9:15 am    Post subject: Re: Software Pipelining Reply with quote

Natarajan Krishnaswami:
Quote:
...deletia

Bravo. Well played sir.
Back to top
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture All times are GMT
Page 1 of 1

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




VoIP Electronics Powered by phpBB