| Author |
Message |
Anne & Lynn Wheeler
Guest
|
Posted:
Wed Jul 27, 2005 12:15 am Post subject:
Re: Cluster computing drawbacks |
|
|
"Stephen Fuld" <s.fuld@PleaseRemove.att.net> writes:
| Quote: | Is loosely coupled essentially a cluster? I thought that there was a
distinction in that loosely coupled meant shared DASD (disk to non-IBMers)
whereas a cluster (in today's parlance) typically meant totally independent
systems but with some I/O type interconnect. But not direct access to a
common disk pool without going through another CPU. But I may be wrong in
my terminology.
|
in the 60s they were mostly in the same data center with connectivity
to common i/o pool ... especially in availability configurations (and
because dasd/disk price/bit was quite expensive) ... later availability
configurations over geographic distances became replicated/mirrored
data.
in any case, some amount of the driving factors for common i/o pool
was significant dasd/disk costs. at various points the disk business
unit pulled in more revenue than the processor/memory business unit.
some of the 60s scenarios may have made common i/o pool for
availability easier ... since 360 I/O channels had 200ft runs ... you
could place processor clusters in the center and then have 200ft
radius connectivity. this was increased with data streaming in the 70s
to 400ft runs (allowing 400ft radius) .... although some larger
installations found even this a limitation ... so there were some
datacenters that spread in 3d over multiple floors/stories.
possibly the first SAN was at ncar. disk/dasd pool managed by ibm
mainframe ... but also hyperchannel A515 adapters providing ibm
channel emulation access to other processors (having connection in the
hyperchannel environment). various processors in the complex (crays,
other processors) would communicate to ibm mainframe (control
channel). ibm mainframe would setup i/o transfer commands in the A515
.... and return a handle for the (a515) i/o commands to the requesting
processor. The requesting client (cray supercomputer) would then
invoke the A515 i/o commands for direct disk/dasd data transfer (using
the same i/o interconnect layer for separate control with ibm
mainframe and direct disk data transfer).
One of the reasons for the third-party transfer specification in the HiPPI
switch specification ... was to be able to emulate the ncar hyperchannel
environment.
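The split between control path and data path in the ncar arrangement can be sketched as a toy model in Python. All names here (`ChannelAdapter`, `ControlHost`, `Client`, the file name `run42`) are hypothetical stand-ins, not HYPERchannel or HiPPI interfaces, and the one-shot handle is an assumption made only for illustration.

```python
"""Toy model of a third-party transfer: control via one party, data direct."""
import secrets


class ChannelAdapter:
    """Stands in for the A515: holds prepared I/O programs keyed by handle."""

    def __init__(self, disk):
        self._disk = disk          # shared disk pool: block number -> bytes
        self._programs = {}        # handle -> list of block numbers to read

    def load_program(self, blocks):
        handle = secrets.token_hex(4)
        self._programs[handle] = list(blocks)
        return handle

    def execute(self, handle):
        # Assumption: a prepared program is consumed when the client runs it.
        blocks = self._programs.pop(handle)
        return b"".join(self._disk[b] for b in blocks)


class ControlHost:
    """Stands in for the ibm mainframe that manages the disk pool."""

    def __init__(self, adapter, catalog):
        self._adapter = adapter
        self._catalog = catalog    # file name -> block numbers

    def request_read(self, name):
        # Control path: look up the file, set up the transfer, return a handle.
        return self._adapter.load_program(self._catalog[name])


class Client:
    """Stands in for a requesting processor such as a cray."""

    def __init__(self, host, adapter):
        self._host = host
        self._adapter = adapter

    def read(self, name):
        handle = self._host.request_read(name)   # control: via the host
        return self._adapter.execute(handle)     # data: direct from the adapter


disk = {0: b"clim", 1: b"ate-", 2: b"data"}
adapter = ChannelAdapter(disk)
host = ControlHost(adapter, {"run42": [0, 1, 2]})
client = Client(host, adapter)
result = client.read("run42")
print(result)
```

The point of the sketch is that the data bytes never pass through `ControlHost`; it only hands out the handle.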
--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/ |
|
|
 |
Anne & Lynn Wheeler
Guest
|
Posted:
Wed Jul 27, 2005 12:15 am Post subject:
Re: Cluster computing drawbacks |
|
|
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
| Quote: | For programs where the runtime is measured in days it is a pretty good
bet that most of the time there isn't someone there waiting for the
next prompt.
It is nice to be notified when it ends, though.
|
wasn't exactly what i had in mind, ... there are a lot of (mega)
"on-line" applications run in batch environments ... because the batch
environments have evolved a lot of "automated" conventions for
handling numerous types of events (rather than pushing them to the end
user, common in interactive systems). these on-line environments tend
to make use of these automated facilities to help provide 7x24,
continuous operation.
several years ago, we were talking to one of the major financial
transaction systems ... which commented that they attributed their one
hundred percent availability over the previous several years primarily
to
1) ims hot-standby
2) automated operator
when my wife did her stint in pok (batch mainframe land) responsible
for loosely-coupled (i.e. cluster by any other name) architecture ...
she came up with peer-coupled shared data architecture
http://www.garlic.com/~lynn/subtopic.html#shareddata
the first organization that really used it was ims group for ims
hot-standby.
batch systems tended to have some residual direct human involvement,
in the early days for tending printers, card readers, tape drives, etc
(i.e. called operators).
during the early 70s, i started developing automated processes for
performing many of the tasks that the operating system nominally
required of operators.
starting in the early 80s ... you started to see the shift from
hardware being the primary source of failures to software and people
being the primary source of failures. automated operator went a long
way to reducing many of the human mistake related failures.
--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/ |
|
|
 |
Kai Harrekilde-Petersen
Guest
|
Posted:
Wed Jul 27, 2005 12:15 am Post subject:
Re: Cluster computing drawbacks |
|
|
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
(snips)
| Quote: | One application that people wish would run better on SMP machines is
routing for FPGA or ASIC designs. As far as I understand it, it is
very difficult to do SMP, though maybe some can do two processors.
|
The Magma P&R tools can definitely use more than one CPU. Likewise,
the LVS tools can. I don't recall if Synopsys DC can utilize multiple
CPUs. However, that is less interesting (at least to us), since we
cut synthesis jobs into manageable chunks so we just run them in
parallel with make (and a bunch of licenses).
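Running independent synthesis chunks side by side, capped by the number of licenses, can be sketched in Python as below. This is only an illustration of the `make -j` idea, not the actual flow: the chunk names, the license count, and the placeholder command (which just prints a line instead of invoking a synthesis tool) are all assumptions.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

LICENSES = 3  # assumed number of tool licenses available
CHUNKS = ["alu", "decoder", "fifo", "uart"]  # hypothetical block names


def synthesize(chunk):
    # Placeholder command: a real flow would invoke the synthesis tool here.
    cmd = [sys.executable, "-c", f"print('synthesized {chunk}')"]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def run_all(chunks, licenses):
    # At most `licenses` jobs run at once, much like `make -j<licenses>`.
    with ThreadPoolExecutor(max_workers=licenses) as pool:
        return list(pool.map(synthesize, chunks))


results = run_all(CHUNKS, LICENSES)
print(results)
```

Threads are enough here because each job is a separate process; the pool only limits how many run concurrently.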
Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk> |
|
|
 |
Stephen Fuld
Guest
|
Posted:
Wed Jul 27, 2005 12:15 am Post subject:
Re: Cluster computing drawbacks |
|
|
"Anne & Lynn Wheeler" <lynn@garlic.com> wrote in message
news:m364uxcoy3.fsf@lhwlinux.garlic.com...
snip
| Quote: | when my wife did her stint in pok (batch mainframe land) responsible
for loosely-coupled (i.e. cluster by any other name) architecture ...
|
Is loosely coupled essentially a cluster? I thought that there was a
distinction in that loosely coupled meant shared DASD (disk to non-IBMers)
whereas a cluster (in today's parlance) typically meant totally independent
systems but with some I/O type interconnect. But not direct access to a
common disk pool without going through another CPU. But I may be wrong in
my terminology.
snip
| Quote: | batch systems tended to have some residual direct human involvement,
in the early days for tending printers, card readers, tape drives, etc
(i.e. called operators).
|
We called them PEOs (Peripheral Equipment Operators) (also, but not to their
faces, called peons). This was to distinguish them from operators who did
things at the console and got paid more. These were actually different
jobs, with different job descriptions, etc. But this was the Federal
government, who cared about such things, and a site with lots of tapes (the
"library" was about 600,000 reels in the late 1970s), which justified a
different job, as they were kept quite busy with lots of tape mounts.
Sometimes, the people who mounted tapes were called Tape Hangers.
--
- Stephen Fuld
e-mail address disguised to prevent spam |
|
|
 |
Tom Linden
Guest
|
Posted:
Wed Jul 27, 2005 6:40 am Post subject:
Re: Cluster computing drawbacks |
|
|
On Tue, 26 Jul 2005 21:50:38 GMT, Stephen Fuld
<s.fuld@PleaseRemove.att.net> wrote:
| Quote: | Is loosely coupled essentially a cluster? I thought that there was a
distinction in that loosely coupled meant shared DASD (disk to non-IBMers)
whereas a cluster (in today's parlance) typically meant totally independent
systems but with some I/O type interconnect. But not direct access to a
common disk pool without going through another CPU. But I may be wrong in
my terminology.
|
Well, nomenclature does undergo changes, but in VMS clusters there is really
no difference from a locally attached disk to one on another node. Moreover,
in my cluster we have mirrored disks on a common scsi channel attached to
three nodes, which are mounted on the cluster, meaning that as the nodes are
booted each node successively mounts them. If you issue the equivalent of the
Unix df command from each node, that node thinks it owns the drives |
|
|
 |
Anne & Lynn Wheeler
Guest
|
Posted:
Wed Jul 27, 2005 8:15 am Post subject:
Re: Cluster computing drawbacks |
|
|
"Stephen Fuld" <s.fuld@PleaseRemove.att.net> writes:
| Quote: | OK, that sounds similar to the old IBM loosely coupled scheme where
multiple computers each had a channel to a disk controller. That
is, there is a direct path from each CPU to a disk without going
through another CPU. Contrast that with say a Beowulf cluster, or
for that matter, any cluster of commodity PCs using an interconnect
fabric of some sort. That is the distinction I was thinking about.
|
we had come up with geographic survivability when we were doing
(non-mainframe) ha/cmp
http://www.garlic.com/~lynn/subtopic.html#hacmp
however the mainframe culmination of my wife's peer-coupled shared
data architecture (when she did her stint in POK in charge of
mainframe loosely-coupled architecture) is current mainframe parallel
sysplex
http://www-1.ibm.com/servers/eserver/zseries/pso/
and this is geographic dispersed parallel sysplex:
http://www-1.ibm.com/servers/eserver/zseries/gdps/
it mentions continuous availability ... when we were doing ha/cmp, we
got asked to author part of the corporate continuous availability
strategy document ... however, both pok and rochester complained
.... that our geographic survivability statements couldn't be met by
them (at the time).
http://www.garlic.com/~lynn/subtopic.html#available
note that FCS would provide both interprocessor and device
connectivity using the same fabric. some of the upcoming disk
assemblies will be able to run disk data transfers over ethernet
(again both interprocessor and device connectivity using the same
fabric).
--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/ |
|
|
 |
glen herrmannsfeldt
Guest
|
Posted:
Wed Jul 27, 2005 8:15 am Post subject:
Re: Cluster computing drawbacks |
|
|
Anne & Lynn Wheeler wrote:
(snip)
| Quote: | wasn't exactly what i had in mind, ... there are a lot of (mega)
"on-line" applications run in batch environments ... because the batch
environments have evolved a lot of "automated" conventions for
handling numerous types of events (rather than pushing them to the end
user, common in interactive systems). these on-line environments tend
to make use of these automated facilities to help provide 7x24,
continuous operation.
|
IBM at least used to be interested in scientific computing.
They did build the 360/91 and the vector instructions for S/370.
But yes, the commercial side needs the high uptime.
-- glen |
|
|
 |
Stephen Fuld
Guest
|
Posted:
Wed Jul 27, 2005 8:15 am Post subject:
Re: Cluster computing drawbacks |
|
|
"Tom Linden" <tom@kednos.com> wrote in message
news:opsujtlhwezgicya@hyrrokkin...
| Quote: | On Tue, 26 Jul 2005 21:50:38 GMT, Stephen Fuld
<s.fuld@PleaseRemove.att.net> wrote:
Is loosely coupled essentially a cluster? I thought that there was a
distinction in that loosely coupled meant shared DASD (disk to non-IBMers)
whereas a cluster (in today's parlance) typically meant totally independent
systems but with some I/O type interconnect. But not direct access to a
common disk pool without going through another CPU. But I may be wrong in
my terminology.
Well, nomenclature does undergo changes, but in VMS clusters there is really
no difference from a locally attached disk to one on another node. Moreover,
in my cluster we have mirrored disks on a common scsi channel attached to
three nodes, which are mounted on the cluster, meaning that as the nodes are
booted each node successively mounts them. If you issue the equivalent of the
Unix df command from each node, that node thinks it owns the drives
|
OK, that sounds similar to the old IBM loosely coupled scheme where multiple
computers each had a channel to a disk controller. That is, there is a
direct path from each CPU to a disk without going through another CPU.
Contrast that with say a Beowulf cluster, or for that matter, any cluster of
commodity PCs using an interconnect fabric of some sort. That is the
distinction I was thinking about.
--
- Stephen Fuld
e-mail address disguised to prevent spam |
|
|
 |
Ketil Malde
Guest
|
Posted:
Wed Jul 27, 2005 1:16 pm Post subject:
Re: Cluster computing drawbacks |
|
|
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
| Quote: | Nice example. It needs pretty much no communication
|
Yes, perhaps that is the problem: it isn't "interesting".
Parallelizing it doesn't pose any challenges, it doesn't (at least not
in most cases) require million-dollar top-500 machines, in fact, it
doesn't even require a fast interconnect. It's just an application
that some users would like to run with a minimum of hassle (and who
gives a damn about *them*?)
It's fairly typical, though.
-k
--
If I haven't seen further, it is by standing in the footprints of giants |
|
|
 |
Javier Fernández
Guest
|
Posted:
Wed Jul 27, 2005 3:11 pm Post subject:
Re: Cluster computing drawbacks |
|
|
Nick Maclaren wrote:
| Quote: | scheduler or both. Just Do It. Converting to use MPI communication
is harder, but still easier than converting to use SMP communication.
Most people's experience is that it is EASIER than converting
a serial program to use SMP communication. Seriously. Converting
to use SMP is one of the foulest tasks that you can imagine, and is
|
"This software is not thread-safe" :-)
I made the same claim on my Ph.D. defense. You can find books and books
with chapters and chapters devoted to _explain_ the possible deadlocks
on SMPs and then chapters and chapters devoted to explain The Right Way
to build several paradigms (consumer-producer, etc)
Change or delete (or add) one line to such schemes and you'll face
a deadlock, for reasons so complex to explain that you'll need to
study again all those previous chapters :-)
Message-passing handbooks usually include a remark on the man page
for _send (or _recv), clarifying that if you put the wrong tag or
receiver (sender), your message will get lost, sent to a non-listening
receiver or received from a non-expected sender. Three text lines,
instead of chapters and chapters.
Ok, this is not a proof, but I look for "thread safe" in google:
681.000 results, and then for "message-passing" blocked (or block)
and I get 48.600 (or 198.000) results. I would say that writing a
thread safe application requires more instruction than a non-blocking
cluster parallel application.
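The "wrong tag" failure mode mentioned above can be shown with a small Python sketch. `Mailbox` is a hypothetical class written for this illustration, not an MPI or PVM binding, and the timeout stands in for what would be an indefinite wait in a real blocking receive.

```python
import queue
import threading


class Mailbox:
    """Per-receiver mailbox; messages are matched on (sender, tag)."""

    def __init__(self):
        self._q = queue.Queue()
        self._unmatched = []

    def send(self, sender, tag, payload):
        self._q.put((sender, tag, payload))

    def recv(self, sender, tag, timeout=0.2):
        # First look through messages set aside by earlier receives.
        for i, (s, t, p) in enumerate(self._unmatched):
            if (s, t) == (sender, tag):
                del self._unmatched[i]
                return p
        while True:
            try:
                s, t, p = self._q.get(timeout=timeout)
            except queue.Empty:
                return None  # a real blocking receive would hang here
            if (s, t) == (sender, tag):
                return p
            self._unmatched.append((s, t, p))


box = Mailbox()
worker = threading.Thread(target=box.send, args=("rank0", "result", 42))
worker.start()
worker.join()
wrong = box.recv("rank0", "resutl")   # typo in the tag: nothing ever matches
right = box.recv("rank0", "result")   # correct tag: message is delivered
print(wrong, right)
```

The mismatch is visible at the call site: the only thing to check is the pair of arguments passed to `send` and `recv`.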
Nice to learn somebody else thinks the same :-)
-javier |
|
|
 |
Nick Maclaren
Guest
|
Posted:
Wed Jul 27, 2005 4:15 pm Post subject:
Re: Cluster computing drawbacks |
|
|
In article <dc7mjh$qqc$1@mercurio.cica.es>,
Javier Fernández <javier@atc.ugr.es> writes:
|> Nick Maclaren wrote:
|> > scheduler or both. Just Do It. Converting to use MPI communication
|> > is harder, but still easier than converting to use SMP communication.
|> >
|> > Most people's experience is that it is EASIER than converting
|> > a serial program to use SMP communication. Seriously. Converting
|> > to use SMP is one of the foulest tasks that you can imagine, and is
|>
|> "This software is not thread-safe" :-)
|>
|> I made the same claim on my Ph.D. defense. You can find books and books
|> with chapters and chapters devoted to _explain_ the possible deadlocks
|> on SMPs and then chapters and chapters devoted to explain The Right Way
|> to build several paradigms (consumer-producer, etc)
To be fair, about half of those also apply to message passing.
What you don't get with message passing is IMPLICIT interaction;
if you don't pass a message, the threads are independent. With
shared memory, it is usually unclear when threads are interacting.
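A small Python sketch of the implicit versus explicit distinction, using threads for both halves purely for illustration (the names `add_shared` and `add_message` are made up for this example). In the shared version nothing in the function signature shows that the two workers touch the same object; in the message version the `put` is the only interaction.

```python
import queue
import threading

DATA = list(range(1000))
HALVES = [DATA[:500], DATA[500:]]

# Shared memory: both threads touch `state`; nothing in the signature of
# `add_shared` reveals that, and the lock is what keeps the update correct.
state = {"total": 0}
lock = threading.Lock()


def add_shared(chunk):
    partial = sum(chunk)
    with lock:
        state["total"] += partial


# Message passing: the only interaction is the explicit `put`.
results = queue.Queue()


def add_message(chunk):
    results.put(sum(chunk))


def run(target):
    threads = [threading.Thread(target=target, args=(h,)) for h in HALVES]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run(add_shared)
run(add_message)
message_total = results.get() + results.get()
print(state["total"], message_total)
```

Dropping the `with lock:` line would leave `add_shared` looking just as plausible, which is the hazard being described.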
|> Change or delete (or add) one line to such schemes and you'll face
|> a deadlock, for reasons so complex to explain that you'll need to
|> study again all those previous chapters :-)
Actually, my experience is that people often don't get that far.
The issues to do with when objects are distinct and when they are
not (i.e. when they may be used independently in separate threads)
and the total lack of tools for investigating even the simpler
issues of wrong answers, deadlock etc. are what catch them.
|> Message-passing handbooks usually include a remark on the man page
|> for _send (or _recv), clarifying that if you put the wrong tag or
|> receiver (sender), your message will get lost, sent to a non-listening
|> receiver or received from a non-expected sender. Three text lines,
|> instead of chapters and chapters.
There are some tools to help check for that. There are also a
few situations where you can get deadlock that are not obvious,
but not all that many, and it is pretty easy to diagnose the
erroneous operations when you have done it.
|> Nice to learn somebody else thinks the same :-)
For me too :-)
Regards,
Nick Maclaren. |
|
|
 |
Greg Lindahl
Guest
|
Posted:
Wed Jul 27, 2005 4:15 pm Post subject:
Re: Cluster computing drawbacks |
|
|
In article <87oe8ovf3r.fsf@sefirot.ii.uib.no>,
Ketil Malde <ketil+news@ii.uib.no> wrote:
| Quote: | Parallelizing it doesn't pose any challenges, it doesn't (at least not
in most cases) require million-dollar top-500 machines, in fact, it
doesn't even require a fast interconnect.
|
There are plenty of million-dollar loosely-coupled clusters on the
top-500 list that run embarrassingly parallel apps.
-- greg |
|
|
 |
Randy
Guest
|
Posted:
Wed Jul 27, 2005 11:34 pm Post subject:
Re: Cluster computing drawbacks |
|
|
Greg Lindahl wrote:
| Quote: | In article <87oe8ovf3r.fsf@sefirot.ii.uib.no>,
Ketil Malde <ketil+news@ii.uib.no> wrote:
Parallelizing it doesn't pose any challenges, it doesn't (at least not
in most cases) require million-dollar top-500 machines, in fact, it
doesn't even require a fast interconnect.
There are plenty of million-dollar loosely-coupled clusters on the
top-500 list that run embarrassingly parallel apps.
-- greg
|
Sure. It's the *other* apps that they run embarrassingly badly. ;-}
Randy
--
Randy Crawford http://www.ruf.rice.edu/~rand rand AT rice DOT edu |
|
|
 |
Nick Maclaren
Guest
|
Posted:
Thu Jul 28, 2005 12:15 am Post subject:
Re: Cluster computing drawbacks |
|
|
In article <dc8k4g$bic$4@joe.rice.edu>, Randy <joe@burgershack.com> wrote:
| Quote: | Greg Lindahl wrote:
In article <87oe8ovf3r.fsf@sefirot.ii.uib.no>,
Ketil Malde <ketil+news@ii.uib.no> wrote:
Parallelizing it doesn't pose any challenges, it doesn't (at least not
in most cases) require million-dollar top-500 machines, in fact, it
doesn't even require a fast interconnect.
There are plenty of million-dollar loosely-coupled clusters on the
top-500 list that run embarrassingly parallel apps.
Sure. It's the *other* apps that they run embarrassingly badly. ;-}
|
Yes. A long time back, I got flamed for pointing out that postcards
were a perfectly good form of communication for the MOST embarrassingly
parallel applications, and had actually been used for the purpose!
'Tis true, sir ....
There is a gradation of requirements from there right up to the ones
that scale only if the interconnect latency is comparable to the
local memory latency.
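The gradation can be made concrete with a deliberately crude model, sketched here in Python. The formula is an assumption chosen for illustration (each processor does an equal share of the work and then waits out a fixed number of message latencies, with no overlap); the numbers are made up and are not measurements.

```python
def efficiency(work_s, procs, messages, latency_s):
    """Parallel efficiency when each processor waits `messages * latency_s`."""
    parallel_time = work_s / procs + messages * latency_s
    return work_s / (procs * parallel_time)


DAY = 86400.0

# A year of computing split ten ways, two postcards per worker (out and back),
# three days per postcard.
postcards = efficiency(365 * DAY, 10, 2, 3 * DAY)

# One second of computing split ten ways, a million dependent exchanges each.
fine_slow = efficiency(1.0, 10, 1_000_000, 1e-6)   # 1 microsecond per exchange
fine_fast = efficiency(1.0, 10, 1_000_000, 1e-8)   # 10 nanoseconds per exchange

print(round(postcards, 2), round(fine_slow, 2), round(fine_fast, 2))
```

Under these assumptions the postcard case stays efficient despite a latency of days, while the fine-grained case only becomes efficient once the per-exchange latency drops to the range of a local memory access.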
Regards,
Nick Maclaren. |
|