| Author |
Message |
JB Orca
Guest
|
Posted:
Mon Nov 22, 2004 11:03 am Post subject:
Storage across multiple servers? |
|
|
I am trying to find out how to do something that does not make much
sense to me. I have spoken to a few people who say they have taken
multiple servers (1u in this case) and striped the drives on those 1u
servers so that all servers in the group were seeing all data on all
servers.
Does this make sense to anyone?
To put it in context, this came up during a conversation about the
merits of NAS (NFS) vs SAN.
The above idea was given to me as a 'cheaper' solution.
I'm not really sure the 'cheaper' solution is what I want, but I was
intrigued by how this is possible.
Does anyone know how to do this or what this is called? It would seem
it might be a 'cluster file system' but I see nothing like that when
performing the usual Google searches.
And...for the record, this was on Linux and possibly a *bsd.
Thanks!
-- JB |
|
|
 |
Yura Pismerov
Guest
|
Posted:
Mon Nov 22, 2004 7:57 pm Post subject:
Re: Storage across multiple servers? |
|
|
Look for Google File System.
|
|
|
 |
JB Orca
Guest
|
Posted:
Mon Nov 22, 2004 9:11 pm Post subject:
Re: Storage across multiple servers? |
|
|
Yes, I have seen that. However, the Google File System is not something
anyone other than Google can use, correct?
I am looking for something that could be used by anyone.
Many thanks.
|
|
|
 |
Faeandar
Guest
|
Posted:
Tue Nov 23, 2004 3:54 am Post subject:
Re: Storage across multiple servers? |
|
|
Look at PolyServe, Ibrix, GFS, GPFS, etc. These are all software
solutions for a high-performance file system. How the data is
accessed is up to you, either directly from the node servers or
re-shared as NFS.
A couple of hardware-based solutions in this space are Panasas and
Acopia Networks.
~F |
|
|
 |
JB Orca
Guest
|
Posted:
Tue Nov 23, 2004 5:54 am Post subject:
Re: Storage across multiple servers? |
|
|
Excellent. Thanks much.
Can I assume that AFS would fit in here as well?
Thanks! |
|
|
 |
Faeandar
Guest
|
Posted:
Tue Nov 23, 2004 6:43 am Post subject:
Re: Storage across multiple servers? |
|
|
No. AFS is a read-many replica file system but still only has a single
writeable volume, and that resides on a single host. It's a great
file system for traditional file IO and program reads in that it
allows client side caching, multi-location replicas (for network
performance), and transparent read failover to another replica in the
event the current volume is unavailable. But it is in no way a
performance file system like the ones I mentioned.
Also, AFS is non-trivial and generally requires an almost-dedicated, if
not completely dedicated, admin.
~F |
|
|
 |
JB Orca
Guest
|
Posted:
Tue Nov 23, 2004 10:09 pm Post subject:
Re: Storage across multiple servers? |
|
|
Ok. Excellent. I appreciate the info very much.
What I need to accomplish is this:
I have a system that will need to start with roughly 5 terabytes of
storage space. It will very quickly grow to needing anywhere from
50-100 terabytes.
The problem we are attempting to solve is this: what is the best option
for the storage in this system? The original thought, before we
realized how big it was going to get, was just a large direct-attached
RAID system. Then we thought about NAS or SAN. However, when I heard
the talk of spanning storage space across multiple servers, this seemed
as though it might also be a good option.
We are much more interested in the data being safe than we are in the
raw speed of the devices. We can't have something _slow_ per se,
however, if I have to sacrifice some transfer speed in order to have
more safety for the files, that is acceptable.
I am still reading about the systems mentioned and trying to figure out
what would suit our needs.
The idea of a 'RAID' of servers seems fantastic. If I can use the
storage on 5 servers and stripe the data across them, that would be
great. However, I have noticed that with some of the options, adding
a new server means the entire system needs to be taken down,
reconfigured, and brought back up.
That would not be possible for us as we really need to have as little
downtime as humanly possible. (Don't we all!)
The conversation about this is great and I really appreciate any input
that can be given.
Thanks much.
JB |
|
|
 |
Faeandar
Guest
|
Posted:
Tue Nov 23, 2004 10:30 pm Post subject:
Re: Storage across multiple servers? |
|
|
First thing, then: forget AFS. There is a backup limitation of 8 GB max
per volume; this is across all backup software platforms that I am
aware of. It's not an AFS limit, just FYI.
You never mention what you're going to be doing with this data. Is it
for a single server? Multiple servers? Multi-host access? Multi-host
write access?
The technology you use really depends on the requirements of your data
and users. Post a little more info on what you are trying to
accomplish and we might be able to help. Storage is simply a means to
an end, not the end itself.
~F |
|
|
 |
Arne Joris
Guest
|
Posted:
Tue Nov 23, 2004 10:34 pm Post subject:
Re: Storage across multiple servers? |
|
|
JB Orca <jborca@gmail.com> wrote:
....
| Quote: | I have a system that will need to start with roughly 5 terabytes of
storage space. It will very quickly grow to needing anywhere from
50-100 terabytes.
|
With these kinds of numbers, you'll have a lot of drives and thus drive
failures will become quite common. Are you looking at using some RAID
configuration to overcome this?
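To put rough numbers on that, here is a back-of-the-envelope sketch: expected drive replacements per year is approximately the drive count times the annualized failure rate (AFR). The drive sizes and the 3% AFR below are illustrative assumptions, not figures from this thread or from any vendor, and the function names are made up for the example.

```python
import math

def drives_needed(capacity_tb, drive_gb):
    """Raw drive count for a target capacity (decimal units, no RAID overhead)."""
    return math.ceil(capacity_tb * 1000 / drive_gb)

def expected_failures_per_year(drive_count, afr):
    """Expected number of failed drives per year for a given annualized failure rate."""
    return drive_count * afr

# Assumption for illustration only: 3% of drives fail per year.
ASSUMED_AFR = 0.03

# Assumed drive sizes in GB; substitute whatever you actually plan to buy.
for drive_gb in (73, 146, 400):
    count = drives_needed(100, drive_gb)
    failures = expected_failures_per_year(count, ASSUMED_AFR)
    print(f"{drive_gb} GB drives: {count} drives, ~{failures:.1f} failures/year")
```

Whatever the real AFR turns out to be, the count scales linearly with the number of spindles, so smaller drives mean proportionally more swaps.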
| Quote: | The problem we are attempting to solve is this: what is the best option
for the storage in this system? The original thought, before we
realized how big it was going to get, was just a large RAID direct
attach system. Then we thought about NAS or SAN, however, when I heard
the talk of spanning storage space across multiple servers this seemed
as though it might also be a good option.
|
This would be using your LAN to move data unless the server doing the
I/O happens to have the target disk locally available, right ? I guess
with gigabit ethernet this might not be such a problem anymore, except
for processor overhead.
A SAN will allow every server to use Fibre Channel to move the data,
your LAN and server cpus won't be loaded nearly as much. Depending on
your application load, you could save a lot on LAN switches and servers
by spending more on a SAN.
....
| Quote: | The idea of a 'RAID' of servers seems fantastic. If I can use the
storage on 5 servers and stripe the data across them that would be
great, however, I have noticed with some of the options that in order
to add a new server the entire system needs to be taken down and
re-configured and brought back up.
|
The only reason to go this way instead of a regular SAN would be cost I
guess; by using plain old scsi drives you'll cut down the cost
significantly. But again my first question, do you plan on using some
form of RAID (software, raid controller, raid enclosure,...) ?
If you just plug a bunch of SCSI drives into a bunch of servers and
start storing data on them, at a hundred terabytes worth of disks you'll
be running around shutting down hosts in order to swap out disks all
day, in my opinion.
Arne Joris |
|
|
 |
JB Orca
Guest
|
Posted:
Tue Nov 23, 2004 10:46 pm Post subject:
Re: Storage across multiple servers? |
|
|
Good point...here's some additional info:
The plan is to have multiple 'user-facing' servers that the users will
interact with. Placing files, pulling files, etc.
All of these front-end servers should use a shared storage system. The
idea being this: if we have n front-end servers, we can
balance any load across them. As long as our shared storage system is
robust enough, we should be in decent shape.
The majority of files being stored will be in the 5-40 meg range. We do
not expect to have many over 40 megs.
There will be a lot of files.
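For a rough sense of what "a lot of files" means at these sizes, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000,000 MB) and that every file sits at one end of the 5-40 MB range; the function name is made up for illustration.

```python
def file_count_range(capacity_tb, min_file_mb, max_file_mb):
    """Return (fewest, most) files that fit if all files are max-size or min-size."""
    capacity_mb = capacity_tb * 1_000_000
    return capacity_mb // max_file_mb, capacity_mb // min_file_mb

# Starting size (5 TB) and the upper growth estimate (100 TB) from this post.
for capacity_tb in (5, 100):
    fewest, most = file_count_range(capacity_tb, 5, 40)
    print(f"{capacity_tb} TB: between {fewest:,} and {most:,} files")
```

Counts in the millions matter when comparing file systems, since metadata handling and directory layout can become as important as raw capacity.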
So the idea is multi-host access, both read and write.
"Storage is simply a means to an end, not the end itself."
That is really well said. Perhaps I'm thinking about this too much....
I'm basically trying to be as thorough as possible and make sure I try
to think of any possible solution before committing the time it will
take to learn some new stuff and get a dev system up and running.
The idea of the multi-server file system seemed good to me for this one
reason: it seemed like creating a RAID, but using cheaper hardware with
an easier path to adding storage.
The idea of needing this much storage is a bit new to me, so I'm trying
to learn as I go here.
If I did a direct-attached RAID device, say a 2 terabyte RAID, that
would be great. But when it comes time to expand the storage, it seems
like it would be a mess to add additional storage to that type of
system, no?
As I mentioned above, I'm kind of learning some of this as I go, so I
appreciate the help and input.
Many thanks.
JB |
|
|
 |
JB Orca
Guest
|
Posted:
Tue Nov 23, 2004 10:49 pm Post subject:
Re: Storage across multiple servers? |
|
|
On 2004-11-23 12:34:03 -0500, Arne Joris <nospam@org.org> said:
| Quote: | JB Orca <jborca@gmail.com> wrote:
...
I have a system that will need to start with roughly 5 terabytes of
storage space. It will very quickly grow to needing anywhere from
50-100 terabytes.
With these kinds of numbers, you'll have a lot of drives and thus drive
failures will become quite common. Are you looking at using some RAID
configuration to overcome this?
|
Yes, I think that would be needed. The idea was perhaps to do 6-drive
boxes with RAID 5 for each server.
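As a sanity check on that plan, here is a small sketch of the capacity math. RAID 5 gives up one drive's worth of space to parity, so a 6-drive box yields 5 drives of usable capacity. The 400 GB drive size is an assumption for illustration, and the function names are hypothetical.

```python
import math

def raid5_usable_gb(drives, drive_gb):
    """Usable capacity of one RAID 5 set: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * drive_gb

def servers_needed(target_tb, drives_per_server, drive_gb):
    """Number of identical RAID 5 boxes needed to reach a usable capacity target."""
    usable_gb = raid5_usable_gb(drives_per_server, drive_gb)
    return math.ceil(target_tb * 1000 / usable_gb)

# Assumption for illustration only: 400 GB drives, decimal units.
ASSUMED_DRIVE_GB = 400

for target_tb in (5, 100):
    count = servers_needed(target_tb, 6, ASSUMED_DRIVE_GB)
    print(f"{target_tb} TB usable -> {count} six-drive RAID 5 servers")
```

This only covers drive failures inside a box. It says nothing about what happens when a whole server goes down, which is a separate question for whatever file system spans the servers.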
| Quote: |
The problem we are attempting to solve is this: what is the best option
for the storage in this system? The original thought, before we
realized how big it was going to get, was just a large RAID direct
attach system. Then we thought about NAS or SAN, however, when I heard
the talk of spanning storage space across multiple servers this seemed
as though it might also be a good option.
This would be using your LAN to move data unless the server doing the
I/O happens to have the target disk locally available, right ? I guess
with gigabit ethernet this might not be such a problem anymore, except
for processor overhead.
A SAN will allow every server to use Fibre Channel to move the data,
your LAN and server cpus won't be loaded nearly as much. Depending on
your application load, you could save a lot on LAN switches and servers
by spending more on a SAN.
|
It seems that SANs are more geared towards allowing multiple servers to
use a 'shared storage' of sorts, but that the 'shared storage' is
partitioned for each individual server accessing it. Is that the case?
I'm in need of more of a NAS type of solution, where there is one large
pooled storage area that all servers can access. They will all need
access to the same files.
| Quote: | ...
The idea of a 'RAID' of servers seems fantastic. If I can use the
storage on 5 servers and stripe the data across them that would be
great, however, I have noticed with some of the options that in order
to add a new server the entire system needs to be taken down and
re-configured and brought back up.
The only reason to go this way instead of a regular SAN would be cost I
guess; by using plain old scsi drives you'll cut down the cost
significantly. But again my first question, do you plan on using some
form of RAID (software, raid controller, raid enclosure,...) ?
If you just plug a bunch of SCSI drives into a bunch of servers and
start storing data on them, at a hundred terabytes worth of disks you'll
be running around shutting down hosts in order to swap out disks all
day, in my opinion.
Arne Joris
|
Ah. Good point. But, no matter which route I go I'm going to end up
with numerous drives, so I think this problem will exist no matter
what, right?
Thanks!
JB |
|
|
 |
Faeandar
Guest
|
Posted:
Tue Nov 23, 2004 11:37 pm Post subject:
Re: Storage across multiple servers? |
|
|
I'm not sure what your budget is, but my approach would be to get the best
array I could first, then if any is left over look at the file system.
There are only a handful of arrays that expand to the storage you're
talking about, and none are cheap.
I would look at some mid-range arrays from HDS, LSI Logic, Zzyzxx (or
whatever the hell they are called), etc. Since you said data
integrity and safety is key you want good storage on the backend.
Having a single array is not bad since any enterprise array will have
no single point of failure. And honestly I've never lost data on any
storage array to date, at least no loss due to the hardware.
You probably want to go with some sort of clustering to allow one node
server to take over for another in case of host-based failure. This is
where the options come in.
You can use a host-based cluster like Veritas or Legato, but
personally I like the file system cluster like PolyServe. Again,
budget may be an issue so you have to figure out what you need and
then determine the costs.
As far as I can tell from your description, my approach would be the
following:
mid-range array
fc switch
cluster file system with built-in failover
My personal choices for vendors would be:
HDS 9500
Brocade
PolyServe
But that's just me. I would expand the SAN (since that is what we're
talking about; you probably don't want NAS for this) by adding switches
and arrays as you need them. It's easy enough to expand a single LUN
across arrays, though you didn't mention how large a single file
system might get.
~F |
|
|
 |
Nik Simpson
Guest
|
Posted:
Tue Nov 23, 2004 11:41 pm Post subject:
Re: Storage across multiple servers? |
|
|
For what it's worth, my recommendation would be:
1. One or more storage arrays using large ATA or SATA drives; these are up
to 400GB/spindle now, so even 100TB doesn't require that many drives.
2. Use a controller that can do RAID in hardware on the drives. A good
chunk of cache on the controller will probably help as well.
3. Attach the storage to a FC switch with enough free ports to handle some
expansion in storage as well as all the host connections.
There are numerous examples of large ATA arrays, companies like NexSAN
spring to mind.
Give each of your front-end hosts a single 2Gig FC link to the switch and
purchase a shared SAN-FS so that all the hosts can read/write to a single
data repository (the storage from (1).)
Variations on this include:
1. Dirt cheap, replace FC with Ethernet & iSCSI
2. High-performance, use FC drives and a more expensive controller
etc.
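To compare the link options for the file sizes mentioned earlier in the thread, here is an idealized sketch of the time to move one file over a single link. It uses nominal line rates only and ignores protocol overhead, encoding, disk speed, and contention, so real numbers will be worse; the function name is made up for illustration.

```python
def transfer_seconds(file_mb, link_gbps):
    """Ideal time to move one file over one link at its nominal bit rate."""
    bits = file_mb * 1_000_000 * 8
    return bits / (link_gbps * 1_000_000_000)

# Nominal rates: 1 Gb/s Ethernet (for iSCSI) and a 2 Gb/s FC link.
for label, gbps in (("1 Gb Ethernet", 1), ("2 Gb FC", 2)):
    for file_mb in (5, 40):
        secs = transfer_seconds(file_mb, gbps)
        print(f"{label}: {file_mb} MB file in about {secs:.2f} s")
```

For files in the 5-40 MB range, either link moves a single file in well under a second on paper, so the choice likely comes down more to aggregate load, CPU overhead, and budget than to per-file latency.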
--
Nik Simpson |
|
|
 |
JB Orca
Guest
|
Posted:
Wed Nov 24, 2004 2:19 am Post subject:
Re: Storage across multiple servers? |
|
|
On 2004-11-23 13:37:16 -0500, Faeandar <mr_castalot@yahoo.com> said:
| Quote: | On Tue, 23 Nov 2004 12:46:38 -0500, JB Orca <jborca@gmail.com> wrote:
On 2004-11-23 12:30:35 -0500, Faeandar <mr_castalot@yahoo.com> said:
On Tue, 23 Nov 2004 12:09:09 -0500, JB Orca <jborca@gmail.com> wrote:
On 2004-11-22 20:43:52 -0500, Faeandar <mr_castalot@yahoo.com> said:
.....snip....
No. AFS is a read-many replica file system but still only has a single
writeable volume, and that resides on a single host. It's a great
file system for traditional file IO and program reads in that it
allows client side caching, multi-location replicas (for network
performance), and transparent read failover to another replica in the
event the current volume is unavailable. But it is in no way a
performance file system like the ones I mentioned.
Also, AFS is non-trivial and generally requires an almost-dedicated, if
not completely dedicated, admin.
~F
Ok. Excellent. I appreciate the info very much.
What I need to accomplish is this:
I have a system that will need to start with roughly 5 terabytes of
storage space. It will very quickly grow to needing anywhere from
50-100 terabytes.
The problem we are attempting to solve is this: what is the best option
for the storage in this system? The original thought, before we
realized how big it was going to get, was just a large RAID direct
attach system. Then we thought about NAS or SAN, however, when I heard
the talk of spanning storage space across multiple servers this seemed
as though it might also be a good option.
We are much more interested in the data being safe than we are in the
raw speed of the devices. We can't have something _slow_ per se,
however, if I have to sacrifice some transfer speed in order to have
more safety for the files, that is acceptable.
I am still reading about the systems mentioned and trying to figure out
what would suit our needs.
The idea of a 'RAID' of servers seems fantastic. If I can use the
storage on 5 servers and stripe the data across them that would be
great, however, I have noticed with some of the options that in order
to add a new server the entire system needs to be taken down and
re-configured and brought back up.
That would not be possible for us as we really need to have as little
downtime as humanly possible. (Don't we all!)
The conversation about this is great and I really appreciate any input
that can be given.
Thanks much.
JB
First thing, then: forget AFS. There is a backup limitation of 8 GB max
per volume, this is across all backup software platforms that I am
aware of. It's not an AFS limit, just fyi.
You never mention what you're going to be doing with this data. Is it
for a single server? Multiple servers? Multi-host access? Multi-host
write access?
The technology you use really depends on the requirements of your data
and users. Post a little more info on what you are trying to
accomplish and we might be able to help. Storage is simply a means to
an end, not the end itself.
~F
Good point...here's some additional info:
The plan is to have multiple 'user-facing' servers that the users will
interact with. Placing files, pulling files, etc.
All of these front-end servers should use a shared storage system. The
idea being this: if we have n number of front-end servers we can
balance any load across them, as long as our shared storage system is
robust enough we should be in decent shape.
The majority of files being stored will be in the 5-40 meg range. We do
not expect to have many over 40 megs.
There will be a lot of files.
So the idea is multi-host access, both read and write.
"Storage is simply a means to an end, not the end itself."
That is really well said. Perhaps I'm thinking about this too much....
I'm basically trying to be as thorough as possible and make sure I try
to think of any possible solution before committing the time it will
take to learn some new stuff and get a dev system up and running.
The idea of the multi-server file system seemed good to me for this one
reason: it seemed like creating a RAID but using cheaper hardware with
an easier path to adding storage.
The idea of needing this much storage is a bit new to me, so I'm trying
to learn as I go here.
If I did a direct attached raid device, say a 2 terabyte raid, that
would be great. But, when it comes time to expand the storage it seems
like it would be a mess to add additional storage to that type of
system, no?
As I mentioned above, I'm kind of learning some of this as I go, so I
appreciate the help and input.
Many thanks.
JB
I'm not sure what your budget is but my approach would be get the best
array I could first, then if any is left over look at the file system.
There are only a handful of arrays that expand to the storage you're
talking about, and none are cheap.
I would look at some mid-range arrays from HDS, LSI Logic, Zzyzxx (or
whatever the hell they are called), etc. Since you said data
integrity and safety is key you want good storage on the backend.
Having a single array is not bad since any enterprise array will have
no single point of failure. And honestly I've never lost data on any
storage array to date, at least no loss due to the hardware.
You probably want to go with some sort of clustering to allow one node
to take over for another in case of host-based failure. This is
where the options come in.
You can use a host-based cluster like Veritas or Legato, but
personally I like the file system cluster like PolyServe. Again,
budget may be an issue so you have to figure out what you need and
then determine the costs.
As far as I can tell from your description my approach would be the
following:
mid-range array
fc switch
cluster file system with built-in failover
My personal choices for vendors would be:
HDS 9500
Brocade
PolyServe
But that's just me. I would expand the SAN (since that is what we're
talking about, probably don't want NAS for this) by adding switches
and arrays as you need them. It's easy enough to expand a single LUN
across arrays, though you didn't mention how large a single file
system might get.
~F
|
Ok...you make a lot of sense here. I guess part of my problem could be
lack of 100% knowledge of SAN. It seems like one of those things you
never _really_ understand until you have worked with it. Unfortunately,
in my case, I have no way to work on one until I make a decision. Not
the best scenario, I know.
One of the things I am unclear on is the expansion of the array.
For example:
Let's assume I have the following:
One array at 10 TB.
One FC switch
2 front end servers.
The 2 front end servers can not access the same data on the SAN without
the clustered file system, is that correct?
Without PolyServe (or something similar) the 2 machines can both use
the SAN, however, they can not use the same data on the SAN, they will
be able to use 2 separate LUNs. Is that correct?
And...assuming I need to expand my 10 TB up to 50 TB, how does that
happen? I would need to purchase an additional array, that I
understand, but, how does that get tied in to the first array and
become included in the LUN? That sounds like a simple question as it
is stated, so I'm not sure I phrased that correctly.
But maybe you see where I'm going with this?
OR...does anyone have any suggestions for sites and/or books that could
be helpful for this type of thing? In my searching so far I have not
found a wealth of books with the information.
Again...thanks for all the help.
JB |
|
Nik Simpson
Guest
|
Posted:
Wed Nov 24, 2004 2:37 am Post subject:
Re: Storage across multiple servers? |
|
|
JB Orca wrote:
| Quote: |
One of the things I am unclear on is the expansion of the array.
For example:
Let's assume I have the following:
One array at 10 TB.
One FC switch
2 front end servers.
The 2 front end servers can not access the same data on the SAN
without the clustered file system, is that correct?
|
Correct, if you want two servers to have concurrent read/write access, then
some sort of filesystem that understands multi-node access is required,
otherwise they'll trash the filesystem pretty quickly.
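To make the "trash the filesystem" point concrete, here is a toy Python sketch of two hosts writing to one LUN through a filesystem that is not cluster-aware. Everything in it (`SharedLun`, `NaiveHost`, the bitmap layout) is a made-up illustration, not how any real filesystem is implemented; the only assumption it models is that each host caches allocation metadata privately and never checks with the other.

```python
# Toy model: two hosts mount the same LUN with a non-cluster-aware filesystem.
# Each host caches the free-block bitmap and allocates without coordinating.
# All class and variable names here are hypothetical.

class SharedLun:
    """Stand-in for a block device that several hosts can see."""

    def __init__(self, n_blocks):
        self.bitmap = [False] * n_blocks   # on-disk allocation bitmap
        self.blocks = [None] * n_blocks    # on-disk data blocks


class NaiveHost:
    """A host that trusts its private, cached copy of the bitmap."""

    def __init__(self, name, lun):
        self.name = name
        self.lun = lun
        self.cached_bitmap = list(lun.bitmap)  # read once, at mount time

    def write_file(self, data):
        # Pick the first block that *this host* believes is free.
        block = self.cached_bitmap.index(False)
        self.cached_bitmap[block] = True
        self.lun.bitmap[block] = True
        self.lun.blocks[block] = (self.name, data)
        return block


lun = SharedLun(8)
host_a = NaiveHost("A", lun)
host_b = NaiveHost("B", lun)

block_a = host_a.write_file("report.doc")
block_b = host_b.write_file("photo.jpg")   # B never saw A's allocation

print(block_a, block_b)       # both hosts picked the same block
print(lun.blocks[block_a])    # A's data has been overwritten by B
```

A cluster file system avoids this by having the nodes agree on who owns which metadata (for example through a lock manager) before any of them allocates or writes.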
| Quote: |
Without PolyServe (or something similar) the 2 machines can both use
the SAN, however, they can not use the same data on the SAN, they will
be able to use 2 separate LUNs. Is that correct?
|
Correct. Serve up two LUNs one to each server, it's no different from local
storage inside the server in that case.
| Quote: |
And...assuming I need to expand my 10 TB up to 50 TB, how does that
happen? I would need to purchase an additional array, that I
understand, but, how does that get tied in to the first array and
become included in the LUN? That sounds like a simple question as it
is stated, so I'm not sure I phrased that correctly.
|
Typically, it doesn't unless you have a logical volume manager on the server
that can stripe across multiple LUNs (e.g. Veritas Volume Manager). In that case, you
would serve the LUN up to the host and then use the LVM to incorporate it
into the existing filesystem.
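As a rough illustration of what the volume manager is doing when it takes in a new LUN, the Python sketch below models a logical volume as a simple concatenation of LUNs. Note that it shows concatenation rather than striping, because that is the simpler way to grow a volume without moving existing data. The class `ConcatVolume`, the LUN names and the sizes are all hypothetical, and a real volume manager works in extents with on-disk metadata rather than a Python list.

```python
# Toy model of a logical volume built by concatenating LUNs.
# Names and sizes are invented for illustration only.

class ConcatVolume:
    def __init__(self):
        self.luns = []  # (name, size_gb) in the order they were added

    def add_lun(self, name, size_gb):
        """Append a LUN; existing logical offsets keep their mapping."""
        self.luns.append((name, size_gb))

    @property
    def size_gb(self):
        return sum(size for _, size in self.luns)

    def locate(self, offset_gb):
        """Map a logical offset to (LUN name, offset inside that LUN)."""
        if offset_gb < 0:
            raise ValueError("offset must be non-negative")
        for name, size in self.luns:
            if offset_gb < size:
                return name, offset_gb
            offset_gb -= size
        raise ValueError("offset beyond end of volume")


vol = ConcatVolume()
vol.add_lun("array1-lun0", 10_000)      # the original 10 TB array
before = vol.locate(2_500)

vol.add_lun("array2-lun0", 40_000)      # second array brings it to 50 TB
after = vol.locate(2_500)

print(vol.size_gb)                      # total logical capacity in GB
print(before == after)                  # old data did not move
print(vol.locate(10_000))               # first offset on the new LUN
```

The volume getting bigger is only half the job: the filesystem sitting on top still has to be grown to use the new space, and whether that can be done online depends on the filesystem.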
| Quote: |
But maybe you see where I'm going with this?
|
Not really, it would be very helpful if you started with a high-level
description of the application that you are "designing" this storage for.
There may be other ways to approach the problem, but since we don't know the
problem, just your idea of the solutions, it's hard to tell :-)
--
Nik Simpson |
|