hyperthreading gains with overlapping memory-loads
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
Forum: Computer Architecture

Oliver S.
Posted: Thu Oct 13, 2005 6:38 am    Post subject: hyperthreading gains with overlapping memory-loads

In <434b25a3$0$64081$892e7fe2@authen.white.readfreenews.net> I suspected
that hyperthreading would give a higher-than-average improvement over
single-threaded performance when both threads issue a lot of unpredictable
memory loads. To test this, I've written a little Win32 benchmark that runs
first a single thread and then two threads, measuring the throughput of
threads doing heavy pointer chasing (a common way to measure memory latency)
on pointer chains of increasing size (from 1 kB to 64 MB). I found a number
of forum posts on the web stating that hyperthreading gives a significant
gain in this scenario.
As I don't own a P4 with hyperthreading, I'm asking for volunteers to run
this benchmark and to post the results in a follow-up here. I put this little
program into a .zip attachment of this posting. Run it from an open console
window; otherwise the console window closes as soon as the program
terminates. It gives the spawned threads the highest Windows thread priority
(31: real-time priority class plus THREAD_PRIORITY_TIME_CRITICAL) so that
they run undisturbed by other applications, so don't be surprised that you
can't do anything, not even move the mouse, while it runs.
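
The measurement idea described above can be sketched in portable C++. This is not the benchmark from this post: it uses std::chrono instead of QueryPerformanceCounter/rdtsc and an index array instead of the shuffled doubly linked list, and the names make_random_cycle, chase and ns_per_load are hypothetical. It shows why pointer chasing exposes memory latency: each load's address is the result of the previous load, so a single thread cannot overlap its cache misses, which is the gap a second hyperthread could fill.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Build one random cycle over n slots: next[i] is the slot visited after i.
// The visiting order is a random permutation (std::shuffle), so following
// next[] from any slot touches all n slots once before returning to the start.
std::vector<std::size_t> make_random_cycle(std::size_t n, unsigned seed)
{
    assert(n > 0);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), std::mt19937(seed));
    std::vector<std::size_t> next(n);
    for (std::size_t i = 0; i < n; ++i)
        next[order[i]] = order[(i + 1) % n];   // link each slot to its successor
    return next;
}

// Pointer chasing: every load's address is the result of the previous load,
// so the loads of one thread cannot overlap.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps)
{
    std::size_t i = 0;
    for (std::size_t s = 0; s < steps; ++s)
        i = next[i];
    return i;                                  // returned so the loop is not optimized away
}

// Average wall-clock nanoseconds per dependent load (steps must be > 0).
double ns_per_load(std::size_t n, std::size_t steps, unsigned seed)
{
    assert(steps > 0);
    const std::vector<std::size_t> next = make_random_cycle(n, seed);
    const auto t0 = std::chrono::steady_clock::now();
    volatile std::size_t sink = chase(next, steps);
    const auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / steps;
}
```

For example, ns_per_load(std::size_t{1} << 23, 10'000'000, 1) chases a chain of 8M indices (64 MB with an 8-byte std::size_t), comparable to the largest block size in the benchmark.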

For readers who want to compile the app on their own, here's the source
(I think it's quite readable because of my fantastic programming style(r),
even though it isn't documented).

BTW: Strip the non-German newsgroups if you post a follow-up in German.



#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

struct DoubleLink
{
DoubleLink *pdlPrev,
*pdlNext;
};

DoubleLink *InitializeDoubleLinkEdList( size_t nDoubleLinks );
void FreeDoubleLinkEdList( DoubleLink *pdlBlock );
DWORD WINAPI PointerChasingThread( LPVOID lpvThreadParam );

struct ThreadInfo
{
unsigned iterations;
DoubleLink *pdlFirst;
LONGLONG llPCTicks;
LONGLONG llClockCycles;
};

#if !defined(NDEBUG)
#define SetPriorityClass __noop
#define SetThreadPriority __noop
#endif

int __cdecl main()
{
SYSTEM_INFO systemInfo;
unsigned processors,
maxThreads;
LONGLONG llPCFrequency;
unsigned threads;

SetPriorityClass( GetCurrentProcess(), REALTIME_PRIORITY_CLASS );
GetSystemInfo( &systemInfo );
processors = systemInfo.dwNumberOfProcessors;
maxThreads = (processors >= 2) ? 2 : 1;
QueryPerformanceFrequency( (LARGE_INTEGER *)&llPCFrequency );

for( threads = 1; threads <= maxThreads; threads *= 2 )
{
printf( "%s%d threads:\n", (threads == 1) ? "" : "\n", (int)threads );

for( size_t blockSize = 1 * 1024;
blockSize <= (size_t)64 * 1024 * 1024;
blockSize *= 2 )
{
size_t links;
unsigned iterations;
unsigned thread;
DoubleLink *apdlBlocks[32];
ThreadInfo ati[32];
HANDLE ahThreads[32];

links = blockSize / sizeof(DoubleLink);
iterations = (unsigned)(((size_t)64 * 1024 * 1024) / blockSize);

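// Bug in this first version: every thread is passed &ati[0] and the list of
// apdlBlocks[0]; both indices should be [thread] (see the corrected repost below).
for( thread = 0; thread < threads; thread++ )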
apdlBlocks[thread] = InitializeDoubleLinkEdList( links ),
ati[thread].iterations = iterations,
ati[thread].pdlFirst = apdlBlocks[0]->pdlNext,
ahThreads[thread] = CreateThread( NULL, 0, PointerChasingThread, &ati[0], CREATE_SUSPENDED, NULL ),
SetThreadPriority( ahThreads[thread], THREAD_PRIORITY_TIME_CRITICAL );

for( thread = 0; thread < threads; thread++ )
ResumeThread( ahThreads[thread] );

WaitForMultipleObjects( threads, ahThreads, TRUE, INFINITE );

for( thread = 0; thread < threads; thread++ )
FreeDoubleLinkEdList( apdlBlocks[thread] );

double timeInPCTicks,
timeInSeconds;
double linksPerSecond,
megabytesPerSecond;
double timeInClockCycles;
double clockCyclesPerLink;

for( timeInPCTicks = 0.0, thread = 0; thread < threads; thread++ )
timeInPCTicks += ati[thread].llPCTicks;

timeInPCTicks /= threads;
timeInSeconds = timeInPCTicks / llPCFrequency;
linksPerSecond = ((double)threads * iterations * links) / timeInSeconds;
megabytesPerSecond = linksPerSecond * sizeof(DoubleLink *) * (1.0 / (1024.0 * 1024.0));

for( timeInClockCycles = 0.0, thread = 0; thread < threads; thread++ )
timeInClockCycles += ati[thread].llClockCycles;

timeInClockCycles /= threads;
clockCyclesPerLink = timeInClockCycles / ((double)threads * iterations * links);

printf( "\tblock-size:% 6i%s",
blockSize < (1024 * 1024) ? (int)(blockSize / 1024) : (int)(blockSize / (1024 * 1024)),
blockSize < (1024 * 1024) ? "kB" : "MB" );
printf( "% 9.2lf MB/s", (double)megabytesPerSecond );
printf( "% 9.2lf cycles/access\n", (double)clockCyclesPerLink );
}
}

return 0;
}

LONGLONG __fastcall FollowDoubleLinkEdList( unsigned iterations, DoubleLink *pdl );

DWORD WINAPI PointerChasingThread( LPVOID lpvThreadParam )
{
ThreadInfo *pti = (ThreadInfo *)lpvThreadParam;
LONGLONG llPCStartTick,
llPCEndTick;

QueryPerformanceCounter( (LARGE_INTEGER *)&llPCStartTick );
pti->llClockCycles = FollowDoubleLinkEdList( pti->iterations, pti->pdlFirst );
QueryPerformanceCounter( (LARGE_INTEGER *)&llPCEndTick );
pti->llPCTicks = llPCEndTick - llPCStartTick;

return 0;
}

DoubleLink *InitializeDoubleLinkEdList( size_t nDoubleLinks )
{
DoubleLink *pdlBlock;
DoubleLink *pdlHead,
*pdlTail,
*pdlFirst;
DoubleLink *pdl,
*pdlEnd,
*pdlPrev;
DoubleLink *pdlXChg,
*pdlNext,
*pdlXChgPrev,
*pdlXChgNext;

pdlBlock = (DoubleLink *)VirtualAlloc( NULL, 4096 + nDoubleLinks * sizeof(DoubleLink), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE );
pdlHead = pdlBlock;
pdlTail = pdlBlock + 1;
pdlFirst = (DoubleLink *)((BYTE *)pdlBlock + 4096);

pdlHead->pdlPrev = (DoubleLink *)NULL;

for( pdl = pdlFirst,
pdlEnd = pdlFirst + nDoubleLinks,
pdlPrev = pdlHead;
pdl < pdlEnd; pdlPrev = pdl++ )
pdlPrev->pdlNext = pdl,
pdl->pdlPrev = pdlPrev;

pdlPrev->pdlNext = pdlTail;
pdlTail->pdlPrev = pdlPrev;
pdlTail->pdlNext = NULL;

for( pdl = pdlFirst; pdl < pdlEnd; )
if( (pdlXChg = pdlFirst + ((((unsigned)rand() << 15) | rand()) % nDoubleLinks)) != pdl &&
(pdlPrev = pdl->pdlPrev) != pdlXChg &&
(pdlNext = pdl->pdlNext) != pdlXChg )
{
pdlXChgPrev = pdlXChg->pdlPrev;
pdlXChgNext = pdlXChg->pdlNext;

pdlPrev->pdlNext = pdlXChg;
pdlXChg->pdlPrev = pdlPrev;
pdlXChg->pdlNext = pdlNext;
pdlNext->pdlPrev = pdlXChg;

pdlXChgPrev->pdlNext = pdl;
pdl->pdlPrev = pdlXChgPrev;
pdl->pdlNext = pdlXChgNext;
pdlXChgNext->pdlPrev = pdl;

pdl++;
}

return pdlBlock;
}

void FreeDoubleLinkEdList( DoubleLink *pdlBlock )
{
VirtualFree( pdlBlock, 0, MEM_RELEASE );
}

__declspec(naked)
LONGLONG __fastcall FollowDoubleLinkEdList( unsigned iterations, DoubleLink *pdl )
{
__asm
{
push esi
push edi
push edx

rdtsc
mov edi, eax
mov esi, edx

jmp checkIterations

nextIteration:
mov edx, [esp]

doubleLinkFollow:
mov edx, [edx + 4]
or edx, edx
jnz doubleLinkFollow

checkIterations:
sub ecx, 1
jnc nextIteration

rdtsc
sub eax, edi
sbb edx, esi

mov edi, [esp + 4]
mov esi, [esp + 8]
add esp, 12
ret
}
}
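
In the listing above, every thread is started with &ati[0] and apdlBlocks[0]->pdlNext, so both threads write into ati[0] while ati[1] keeps whatever happened to be on the stack; the averaged ticks and cycles would then be garbage, which is consistent with the two-thread figures reported further down. Below is a portable sketch of the same start-together/wait-for-all pattern with std::thread, where each worker gets its own result slot. The struct and function names are hypothetical, the workload is a placeholder rather than pointer chasing, and the spinning start flag only approximates resuming suspended threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// One result slot per worker, playing the role of ThreadInfo in the listing.
struct WorkerResult {
    unsigned id = 0;              // index of the worker that filled the slot
    unsigned long long sum = 0;   // stand-in for the measured ticks/cycles
    bool written = false;
};

// Start `threads` workers that all spin on a shared flag (a portable analogue
// of CREATE_SUSPENDED + ResumeThread), then join them all (like
// WaitForMultipleObjects with bWaitAll = TRUE). Worker t receives &results[t].
std::vector<WorkerResult> run_workers(unsigned threads, unsigned long long work)
{
    assert(threads >= 1);
    std::vector<WorkerResult> results(threads);   // sized once: pointers stay valid
    std::atomic<bool> go{false};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        WorkerResult* slot = &results[t];           // not &results[0]
        pool.emplace_back([slot, t, work, &go] {
            while (!go.load(std::memory_order_acquire))
                std::this_thread::yield();          // wait for the common start
            unsigned long long acc = 0;
            for (unsigned long long i = 0; i < work; ++i)
                acc += i;                           // placeholder workload
            slot->id = t;
            slot->sum = acc;
            slot->written = true;
        });
    }
    go.store(true, std::memory_order_release);      // release all workers together
    for (std::thread& th : pool)
        th.join();
    return results;
}
```

Taking the slot address inside the loop from the loop index is the whole fix; the rest only mirrors the suspended-start structure of the original.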

Oliver S.
Posted: Thu Oct 13, 2005 7:01 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

I forgot the attachment and tried to upload it in a follow-up, but my
server refused the posting with the attachment (ridiculous, because it's
only 22 kB). Fortunately I remembered rapidshare.de and uploaded
it there. Here's the download link:

http://rapidshare.de/files/6213207/ParallelMemLatency.zip.html

Chris Thomasson
Posted: Thu Oct 13, 2005 7:22 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

[quote]I forgot the attachment and tried to upload it in a follow-up, but my
server refused the posting with the attachment (ridiculous, because it's
only 22 kB). Fortunately I remembered rapidshare.de and uploaded
it there. Here's the download link:

http://rapidshare.de/files/6213207/ParallelMemLatency.zip.html
[/quote]
It's an executable! Why should I run this program? Post some source code.

Oliver S.
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

[quote]What do you think I posted in the body of message
<434dba71$0$60284$892e7fe2@authen.white.readfreenews.net>?
[/quote]
As I have some doubts that Chris is able to apply the fix mentioned
in my recent posting, here's the whole changed source code (again):


#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

struct DoubleLink
{
DoubleLink *pdlPrev,
*pdlNext;
};

DoubleLink *InitializeDoubleLinkEdList( size_t nDoubleLinks );
void FreeDoubleLinkEdList( DoubleLink *pdlBlock );
DWORD WINAPI PointerChasingThread( LPVOID lpvThreadParam );

struct ThreadInfo
{
unsigned iterations;
DoubleLink *pdlFirst;
LONGLONG llPCTicks;
LONGLONG llClockCycles;
};

// In debug builds the priority calls are compiled out (__noop).
#if !defined(NDEBUG)
#define SetPriorityClass __noop
#define SetThreadPriority __noop
#endif

int __cdecl main()
{
SYSTEM_INFO systemInfo;
unsigned processors,
maxThreads;
LONGLONG llPCFrequency;
unsigned threads;

SetPriorityClass( GetCurrentProcess(), REALTIME_PRIORITY_CLASS );
GetSystemInfo( &systemInfo );
processors = systemInfo.dwNumberOfProcessors;
maxThreads = (processors >= 2) ? 2 : 1;
QueryPerformanceFrequency( (LARGE_INTEGER *)&llPCFrequency );

for( threads = 1; threads <= maxThreads; threads *= 2 )
{
printf( "%s%d threads:\n", (threads == 1) ? "" : "\n", (int)threads );

for( size_t blockSize = 1 * 1024;
blockSize <= (size_t)64 * 1024 * 1024;
blockSize *= 2 )
{
size_t links;
unsigned iterations;
unsigned thread;
DoubleLink *apdlBlocks[32];
ThreadInfo ati[32];
HANDLE ahThreads[32];

links = blockSize / sizeof(DoubleLink);
iterations = (unsigned)(((size_t)64 * 1024 * 1024) / blockSize);

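// One shuffled list and one ThreadInfo per thread; the threads are created
// suspended so that all of them can be resumed together below.
for( thread = 0; thread < threads; thread++ )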
apdlBlocks[thread] = InitializeDoubleLinkEdList( links ),
ati[thread].iterations = iterations,
ati[thread].pdlFirst = apdlBlocks[thread]->pdlNext,
ahThreads[thread] = CreateThread( NULL, 0, PointerChasingThread, &ati[thread], CREATE_SUSPENDED, NULL ),
SetThreadPriority( ahThreads[thread], THREAD_PRIORITY_TIME_CRITICAL );

for( thread = 0; thread < threads; thread++ )
ResumeThread( ahThreads[thread] );

WaitForMultipleObjects( threads, ahThreads, TRUE, INFINITE );

for( thread = 0; thread < threads; thread++ )
FreeDoubleLinkEdList( apdlBlocks[thread] );

double timeInPCTicks,
timeInSeconds;
double linksPerSecond,
megabytesPerSecond;
double timeInClockCycles;
double clockCyclesPerLink;

for( timeInPCTicks = 0.0, thread = 0; thread < threads; thread++ )
timeInPCTicks += ati[thread].llPCTicks;

timeInPCTicks /= threads;
timeInSeconds = timeInPCTicks / llPCFrequency;
linksPerSecond = ((double)threads * iterations * links) / timeInSeconds;
megabytesPerSecond = linksPerSecond * sizeof(DoubleLink *) * (1.0 / (1024.0 * 1024.0));

for( timeInClockCycles = 0.0, thread = 0; thread < threads; thread++ )
timeInClockCycles += ati[thread].llClockCycles;

timeInClockCycles /= threads;
clockCyclesPerLink = timeInClockCycles / ((double)threads * iterations * links);

printf( "\tblock-size:% 6i%s",
blockSize < (1024 * 1024) ? (int)(blockSize / 1024) : (int)(blockSize / (1024 * 1024)),
blockSize < (1024 * 1024) ? "kB" : "MB" );
printf( "% 9.2lf MB/s", (double)megabytesPerSecond );
printf( "% 9.2lf cycles/access\n", (double)clockCyclesPerLink );
}
}

return 0;
}

LONGLONG __fastcall FollowDoubleLinkEdList( unsigned iterations, DoubleLink *pdl );

DWORD WINAPI PointerChasingThread( LPVOID lpvThreadParam )
{
ThreadInfo *pti = (ThreadInfo *)lpvThreadParam;
LONGLONG llPCStartTick,
llPCEndTick;

QueryPerformanceCounter( (LARGE_INTEGER *)&llPCStartTick );
pti->llClockCycles = FollowDoubleLinkEdList( pti->iterations, pti->pdlFirst );
QueryPerformanceCounter( (LARGE_INTEGER *)&llPCEndTick );
pti->llPCTicks = llPCEndTick - llPCStartTick;

return 0;
}

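// Allocates one block: head and tail sentinels in the first 4 kB page and the
// nDoubleLinks nodes from offset 4096 on. The nodes are first linked in address
// order, then shuffled so that the next address is not predictable.
DoubleLink *InitializeDoubleLinkEdList( size_t nDoubleLinks )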
{
DoubleLink *pdlBlock;
DoubleLink *pdlHead,
*pdlTail,
*pdlFirst;
DoubleLink *pdl,
*pdlEnd,
*pdlPrev;
DoubleLink *pdlXChg,
*pdlNext,
*pdlXChgPrev,
*pdlXChgNext;

pdlBlock = (DoubleLink *)VirtualAlloc( NULL, 4096 + nDoubleLinks * sizeof(DoubleLink), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE );
pdlHead = pdlBlock;
pdlTail = pdlBlock + 1;
pdlFirst = (DoubleLink *)((BYTE *)pdlBlock + 4096);

pdlHead->pdlPrev = (DoubleLink *)NULL;

for( pdl = pdlFirst,
pdlEnd = pdlFirst + nDoubleLinks,
pdlPrev = pdlHead;
pdl < pdlEnd; pdlPrev = pdl++ )
pdlPrev->pdlNext = pdl,
pdl->pdlPrev = pdlPrev;

pdlPrev->pdlNext = pdlTail;
pdlTail->pdlPrev = pdlPrev;
pdlTail->pdlNext = NULL;

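// Swap each node with a randomly chosen, distinct and non-adjacent node; pdl
// only advances after a successful swap. MSVC's rand() yields 15 bits, so two
// calls are combined into a 30-bit index.
for( pdl = pdlFirst; pdl < pdlEnd; )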
if( (pdlXChg = pdlFirst + ((((unsigned)rand() << 15) | rand()) % nDoubleLinks)) != pdl &&
(pdlPrev = pdl->pdlPrev) != pdlXChg &&
(pdlNext = pdl->pdlNext) != pdlXChg )
{
pdlXChgPrev = pdlXChg->pdlPrev;
pdlXChgNext = pdlXChg->pdlNext;

pdlPrev->pdlNext = pdlXChg;
pdlXChg->pdlPrev = pdlPrev;
pdlXChg->pdlNext = pdlNext;
pdlNext->pdlPrev = pdlXChg;

pdlXChgPrev->pdlNext = pdl;
pdl->pdlPrev = pdlXChgPrev;
pdl->pdlNext = pdlXChgNext;
pdlXChgNext->pdlPrev = pdl;

pdl++;
}

return pdlBlock;
}

void FreeDoubleLinkEdList( DoubleLink *pdlBlock )
{
VirtualFree( pdlBlock, 0, MEM_RELEASE );
}

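// Walks the chain from pdl to its NULL terminator `iterations` times and
// returns the elapsed time-stamp-counter cycles. __fastcall passes
// iterations in ECX and pdl in EDX; the LONGLONG result goes back in EDX:EAX.
__declspec(naked)
LONGLONG __fastcall FollowDoubleLinkEdList( unsigned iterations, DoubleLink *pdl )
{
__asm
{
push esi              // preserve callee-saved registers
push edi
push edx              // [esp] = pdl, since rdtsc overwrites EDX

rdtsc                 // start time -> ESI:EDI
mov edi, eax
mov esi, edx

jmp checkIterations

nextIteration:
mov edx, [esp]        // restart at the first node

doubleLinkFollow:
mov edx, [edx + 4]    // edx = edx->pdlNext (offset 4 on 32-bit x86);
or edx, edx           // each load depends on the previous one
jnz doubleLinkFollow  // the tail sentinel's pdlNext is NULL

checkIterations:
sub ecx, 1            // CF is set once ECX was already 0,
jnc nextIteration     // so the chain is walked exactly `iterations` times

rdtsc                 // elapsed = end - start in EDX:EAX
sub eax, edi
sbb edx, esi

mov edi, [esp + 4]    // restore EDI and ESI
mov esi, [esp + 8]
add esp, 12           // pop the three pushed values
ret
}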
}
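
The reporting step in main() can be checked in isolation. The helper below is a hypothetical restatement of those formulas (make_report and Report are not in the original): MB/s counts one pointer (sizeof(DoubleLink *), 4 bytes in the 32-bit build) per access, and cycles/access divides the per-thread average cycle count by the accesses of all threads together. With two threads, a halved cycles/access figure therefore means doubled aggregate throughput; it does not mean that an individual load got faster.

```cpp
#include <cassert>
#include <cstddef>

// The two numbers main() prints for one block size.
struct Report {
    double megabytesPerSecond;
    double cyclesPerAccess;
};

// Same arithmetic as main(): avgTicks and avgCycles are the per-thread means
// of llPCTicks and llClockCycles; every thread makes `iterations` passes over
// a list of `links` nodes; bytesPerAccess is sizeof(DoubleLink *).
Report make_report(unsigned threads, unsigned iterations, std::size_t links,
                   double avgTicks, double ticksPerSecond, double avgCycles,
                   std::size_t bytesPerAccess)
{
    assert(threads > 0 && avgTicks > 0.0 && ticksPerSecond > 0.0);
    const double accesses = static_cast<double>(threads) * iterations * links;
    const double seconds = avgTicks / ticksPerSecond;
    const double linksPerSecond = accesses / seconds;
    // Dividing the per-thread time by all threads' accesses makes both figures
    // aggregate throughput numbers, not the latency of an individual load.
    return { linksPerSecond * bytesPerAccess / (1024.0 * 1024.0),
             avgCycles / accesses };
}
```

For instance, two threads that each spend 16384 cycles on 4 passes over 1024 links report 2.0 cycles/access, while a single thread with the same per-thread cost reports 4.0.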

Branimir Maksimovic
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

Oliver S. wrote:
[quote]ahThreads[thread] = CreateThread( NULL, 0, PointerChasingThread, &ati[0], CREATE_SUSPENDED, NULL ),

Oh, there's the bug! The code should be:
ahThreads[thread] = CreateThread( NULL, 0, PointerChasingThread, &ati[thread], CREATE_SUSPENDED, NULL ),
That's because I first designed a single-threaded version and then turned it into an MT version.

New version here:
http://rapidshare.de/files/6215634/ParallelMemLatency.zip.html
[/quote]
Now it's working:

1 threads:
    block-size:     1kB  3470.05 MB/s     3.30 cycles/access
    block-size:     2kB  3600.08 MB/s     3.18 cycles/access
    block-size:     4kB  3695.58 MB/s     3.09 cycles/access
    block-size:     8kB  3296.30 MB/s     3.47 cycles/access
    block-size:    16kB   619.60 MB/s    18.46 cycles/access
    block-size:    32kB   428.23 MB/s    26.71 cycles/access
    block-size:    64kB   385.05 MB/s    29.71 cycles/access
    block-size:   128kB   358.93 MB/s    31.87 cycles/access
    block-size:   256kB   347.18 MB/s    32.95 cycles/access
    block-size:   512kB   218.57 MB/s    52.33 cycles/access
    block-size:     1MB    60.62 MB/s   188.69 cycles/access
    block-size:     2MB    46.57 MB/s   245.64 cycles/access
    block-size:     4MB    40.16 MB/s   284.86 cycles/access
    block-size:     8MB    38.62 MB/s   296.15 cycles/access
    block-size:    16MB    37.44 MB/s   305.54 cycles/access
    block-size:    32MB    36.43 MB/s   314.00 cycles/access
    block-size:    64MB    35.30 MB/s   324.07 cycles/access

2 threads:
    block-size:     1kB  5498.13 MB/s     2.08 cycles/access
    block-size:     2kB  6140.41 MB/s     1.86 cycles/access
    block-size:     4kB  6816.34 MB/s     1.68 cycles/access
    block-size:     8kB  6862.11 MB/s     1.67 cycles/access
    block-size:    16kB  1045.43 MB/s    10.94 cycles/access
    block-size:    32kB   811.57 MB/s    14.09 cycles/access
    block-size:    64kB   678.05 MB/s    16.87 cycles/access
    block-size:   128kB   662.31 MB/s    17.27 cycles/access
    block-size:   256kB   421.85 MB/s    27.11 cycles/access
    block-size:   512kB   392.16 MB/s    29.17 cycles/access
    block-size:     1MB   100.49 MB/s   113.83 cycles/access
    block-size:     2MB    73.74 MB/s   155.12 cycles/access
    block-size:     4MB    67.23 MB/s   170.13 cycles/access
    block-size:     8MB    63.76 MB/s   179.40 cycles/access
    block-size:    16MB    62.58 MB/s   182.80 cycles/access
    block-size:    32MB    58.61 MB/s   195.16 cycles/access
    block-size:    64MB    57.74 MB/s   198.11 cycles/access


Greetings, Bane.

Oliver S.
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

[quote]... Why should I run this program?
[/quote]
There are others who don't ask themselves this question.

[quote]Post some source code.
[/quote]
What do you think I posted in the body of message <434dba71$0$60284$892e7fe2@authen.white.readfreenews.net>?

Oliver S.
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

Use the version fixed as described in <434de059$0$35652$892e7fe2@authen.white.readfreenews.net>:
http://rapidshare.de/files/6215634/ParallelMemLatency.zip.html

Oliver S.
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

[quote]ahThreads[thread] = CreateThread( NULL, 0, PointerChasingThread, &ati[0], CREATE_SUSPENDED, NULL ),
[/quote]
Oh, there's the bug! The code should be:
ahThreads[thread] = CreateThread( NULL, 0, PointerChasingThread, &ati[thread], CREATE_SUSPENDED, NULL ),
That's because I first designed a single-threaded version and then turned it into an MT version.

New version here:
http://rapidshare.de/files/6215634/ParallelMemLatency.zip.html

Branimir Maksimovic
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

Oliver S. wrote:
[quote]I forgot the attachment and tried to upload it in a follow-up, but my
server refused the posting with the attachment (ridiculous, because it's
only 22 kB). Fortunately I remembered rapidshare.de and uploaded
it there. Here's the download link:

http://rapidshare.de/files/6213207/ParallelMemLatency.zip.html
[/quote]
It does not work:
1 threads:
    block-size:     1kB  3466.24 MB/s     3.30 cycles/access
    block-size:     2kB  3628.49 MB/s     3.15 cycles/access
    block-size:     4kB  3718.49 MB/s     3.08 cycles/access
    block-size:     8kB  2783.41 MB/s     4.11 cycles/access
    block-size:    16kB   646.52 MB/s    17.69 cycles/access
    block-size:    32kB   429.76 MB/s    26.62 cycles/access
    block-size:    64kB   389.76 MB/s    29.35 cycles/access
    block-size:   128kB   366.34 MB/s    31.22 cycles/access
    block-size:   256kB   355.00 MB/s    32.22 cycles/access
    block-size:   512kB   222.92 MB/s    51.31 cycles/access
    block-size:     1MB    61.96 MB/s   184.61 cycles/access
    block-size:     2MB    46.63 MB/s   245.32 cycles/access
    block-size:     4MB    40.84 MB/s   280.05 cycles/access
    block-size:     8MB    38.65 MB/s   295.99 cycles/access
    block-size:    16MB    37.41 MB/s   305.74 cycles/access
    block-size:    32MB    36.04 MB/s   317.42 cycles/access
    block-size:    64MB    35.30 MB/s   324.02 cycles/access

2 threads:
    block-size:     1kB    -0.00 MB/s 159260740.90 cycles/access
    block-size:     2kB    -0.00 MB/s 159260740.72 cycles/access
    block-size:     4kB    -0.00 MB/s 159260740.60 cycles/access
    block-size:     8kB    -0.00 MB/s 159260740.64 cycles/access
    block-size:    16kB    -0.00 MB/s 159260745.00 cycles/access
    block-size:    32kB    -0.00 MB/s 159260746.88 cycles/access
    block-size:    64kB    -0.00 MB/s 159260748.22 cycles/access
    block-size:   128kB    -0.00 MB/s 159260748.41 cycles/access
    block-size:   256kB    -0.00 MB/s 159260753.34 cycles/access
    block-size:   512kB    -0.00 MB/s 159260754.54 cycles/access
    block-size:     1MB    -0.00 MB/s 159260798.37 cycles/access
    block-size:     2MB    -0.00 MB/s 159260816.54 cycles/access
    block-size:     4MB    -0.00 MB/s 159260824.65 cycles/access
    block-size:     8MB    -0.00 MB/s 159260829.22 cycles/access
    block-size:    16MB    -0.00 MB/s 159260833.84 cycles/access
    block-size:    32MB    -0.00 MB/s 159260835.03 cycles/access
    block-size:    64MB    -0.00 MB/s 159260838.41 cycles/access

Greetings, Bane.

Oliver S.
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

OK, I'll check my program today and re-post a fixed version this evening.

Oliver S.
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

[quote]Ummm, don't worry Oliver, I can understand your code!
[/quote]
Of course you can! That's just because it's written in such an exemplary way.

Oliver S.
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

I put the data you posted into a spreadsheet and
calculated the speedup percentage for each block size:

block size   no HT (MB/s)   with HT (MB/s)   speedup
1kB               3470.05          5498.13    58.45%
2kB               3600.08          6140.41    70.56%
4kB               3695.58          6816.34    84.45%
8kB               3296.30          6862.11   108.18%
16kB               619.60          1045.43    68.73%
32kB               428.23           811.57    89.52%
64kB               385.05           678.05    76.09%
128kB              358.93           662.31    84.52%
256kB              347.18           421.85    21.51%
512kB              218.57           392.16    79.42%
1MB                 60.62           100.49    65.77%
2MB                 46.57            73.74    58.34%
4MB                 40.16            67.23    67.41%
8MB                 38.62            63.76    65.10%
16MB                37.44            62.58    67.15%
32MB                36.43            58.61    60.88%
64MB                35.30            57.74    63.57%

average speedup: 69.98%

An average speedup of about 70% - that's really amazing!
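
Each percentage in the table is (with-HT MB/s / no-HT MB/s - 1) x 100. A short sketch of that spreadsheet step follows; the function and constant names are hypothetical. Run on the 17 pairs of MB/s figures above, the arithmetic mean of the per-size percentages comes out at about 69.98%, and the 256 kB row (21.51%) stands out from the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-block-size speedup in percent: (withHT / noHT - 1) * 100.
std::vector<double> speedup_percent(const std::vector<double>& noHT,
                                    const std::vector<double>& withHT)
{
    assert(noHT.size() == withHT.size());
    std::vector<double> pct;
    pct.reserve(noHT.size());
    for (std::size_t i = 0; i < noHT.size(); ++i)
        pct.push_back((withHT[i] / noHT[i] - 1.0) * 100.0);
    return pct;
}

// Arithmetic mean; returns 0 for an empty input.
double mean(const std::vector<double>& v)
{
    double sum = 0.0;
    for (double x : v)
        sum += x;
    return v.empty() ? 0.0 : sum / static_cast<double>(v.size());
}

// MB/s columns from the table (1 kB ... 64 MB).
const std::vector<double> kNoHT = {
    3470.05, 3600.08, 3695.58, 3296.30, 619.60, 428.23, 385.05, 358.93, 347.18,
    218.57, 60.62, 46.57, 40.16, 38.62, 37.44, 36.43, 35.30};
const std::vector<double> kWithHT = {
    5498.13, 6140.41, 6816.34, 6862.11, 1045.43, 811.57, 678.05, 662.31, 421.85,
    392.16, 100.49, 73.74, 67.23, 63.76, 62.58, 58.61, 57.74};
```

An arithmetic mean of percentages weights every block size equally; averaging the ratios geometrically, or comparing summed throughput, would give somewhat different single numbers from the same data.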

Chris Thomasson
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

[quote]... Why should I run this program?

There are others who don't ask themselves this question.
[/quote]
:/

[quote]Post some source code.

What do you think I posted in the body of message
<434dba71$0$60284$892e7fe2@authen.white.readfreenews.net>?
[/quote]
DOH! I need coffee!!! Sorry.

Oliver S.
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

[quote]What do you think I posted in the body of message
<434dba71$0$60284$892e7fe2@authen.white.readfreenews.net>?

DOH! I need coffee!!! Sorry.
[/quote]
I think you need sleep (and so do I).

Chris Thomasson
Posted: Thu Oct 13, 2005 8:15 am    Post subject: Re: hyperthreading gains with overlapping memory-loads

[quote]As I have some doubts that Chris is able to apply the fix mentioned
in my recent posting:
[/quote]
Ummm, don't worry Oliver, I can understand your code!

Humm... Did you get beat up a lot as a kid?

http://groups.google.com/group/comp.programming.threads/msg/a046c2e77cc5b7ed?hl=en
(Oliver flames me for providing some source code to a person interested in
lock-free programming)

BTW... The thread has some good information in it:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/96f280d49a63bb9f/5ea2bf37cc1c73f1?tvc=1#5ea2bf37cc1c73f1

http://groups.google.com/group/comp.programming.threads/msg/12b03abe2eb0d496?hl=en
(Oliver flames me for noting that application design has a lot to do with
SMT performance)

http://groups.google.com/group/comp.arch/msg/2f40a2aaeb8cacb6?hl=en
(Oliver flames another person, whom he calls an "idiot")

You seem to need an attitude adjustment. Now I think I know why you
frequently use X-No-Archive: Yes.

:)

All times are GMT.
Page 1 of 2