Is the RAID-5 write penalty really necessary?
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
Adam Megacz
Posted: Sat Nov 13, 2004 9:41 am    Post subject: Is the RAID-5 write penalty really necessary?

No really, I checked the FAQ on this.

I understand the reason for the RAID-5 write penalty. What I don't
understand is this: why not just set the block size to (N-1)*blocksize
(where N is the number of drives in the array and blocksize is the
hardware-level block size).

Since you can only write full blocks, there's never any need to read
back the underlying parity -- because you know that if you're writing
to a given slice on any particular drive, you're also certain that
you'll be overwriting the corresponding slice on all the other drives.

So, I guess my question is: why not just make the block size bigger
and never read before writing?

I think I saw a paper on something similar dubbed "RAID 3.5", but
haven't seen an implementation of it yet.

- a

--
I wrote my own mail server and it still has a few bugs.
If you send me a message and it bounces, please forward the
bounce message to megacz@gmail.com. Thanks!

Malcolm Weir
Posted: Sun Nov 14, 2004 10:41 am    Post subject: Re: Is the RAID-5 write penalty really necessary?

On Fri, 12 Nov 2004 20:41:05 -0800, Adam Megacz <adam@megacz.com>
wrote:

Quote:
No really, I checked the FAQ on this.

I understand the reason for the RAID-5 write penalty. What I don't
understand is this: why not just set the block size to (N-1)*blocksize
(where N is the number of drives in the array and blocksize is the
hardware-level block size).

Since you can only write full blocks, there's never any need to read
back the underlying parity -- because you know that if you're writing
to a given slice on any particular drive, you're also certain that
you'll be overwriting the corresponding slice on all the other drives.

So, I guess my question is: why not just make the block size bigger
and never read before writing?

Why mess with the block size? You can simply hold off updating a
block to see if the adjacent block is coming along from the host, and
if it is, repeat until the entire stripe needs to be written. Then
generate parity and splat away. It's usually referred to as "full
stripe optimization" or something similar.
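
A minimal sketch of the difference, assuming plain XOR parity and tiny 4-byte blocks for illustration (the function names are hypothetical, not taken from any RAID implementation). A full-stripe write computes parity from the new data alone, while a single-block update has to read the old data and old parity back first:

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_blocks):
    """Full-stripe write: parity is the XOR of the new data blocks, no reads needed."""
    return reduce(xor_blocks, data_blocks)


def rmw_parity(old_parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """Small write: old parity and old data must be read before parity can be updated."""
    return xor_blocks(xor_blocks(old_parity, old_block), new_block)


# Four data blocks on a hypothetical 5-disk array (4 data + 1 parity).
old = [bytes([i] * 4) for i in (1, 2, 3, 4)]
old_parity = full_stripe_parity(old)

# Overwrite block 1 only; both routes must arrive at the same parity.
new_block = bytes([9] * 4)
updated = [old[0], new_block, old[2], old[3]]
assert rmw_parity(old_parity, old[1], new_block) == full_stripe_parity(updated)
```

Both routes produce the same parity; the only difference is whether the array has to go to disk for the old contents first.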

One marginally awkward consequence is that it tends to work best if
you have 5, 9 or 17 disks.
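
The post doesn't spell out why, but presumably it is because 5, 9 and 17 disks leave 4, 8 and 16 data disks, so a power-of-two host write can be an exact number of full stripes. A rough sketch, assuming a 64 KiB chunk per disk and an aligned 1 MiB host write (both figures are made up for illustration):

```python
def covers_whole_stripes(n_disks: int, chunk_kib: int, write_kib: int) -> bool:
    """True if an aligned write of write_kib is an exact number of full stripes."""
    stripe_kib = (n_disks - 1) * chunk_kib  # one chunk per data disk; parity excluded
    return write_kib % stripe_kib == 0


for n in (5, 6, 9, 17):
    print(n, covers_whole_stripes(n, chunk_kib=64, write_kib=1024))
```

With these assumed sizes, only the six-disk array (a 320 KiB stripe) leaves a partial stripe at the end of the write.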

Quote:
I think I saw a paper on something similar dubbed "RAID 3.5", but
haven't seen an implementation of it yet.

That would be a silly name. RAID 3 stripes below the block level,
RAID 4 & 5 above it.

Quote:
- a

Malc.

Adam Megacz
Posted: Mon Nov 15, 2004 2:55 pm    Post subject: Re: Is the RAID-5 write penalty really necessary?

Malcolm Weir <malc@gelt.org> writes:
Quote:
Why mess with the block size? You can simply hold off updating a
block to see if the adjacent block is coming along from the host, and
if it is, repeat until the entire stripe needs to be written. Then
generate parity and splat away. It's usually referred to as "full
stripe optimization" or something similar.

Ah, good point!

Hrm, in light of this, it seems that the "raid 5 write penalty" is
just an artifact of poor implementations!

Do you happen to know offhand if the Linux md driver does this (i.e.,
maintains a kernel-space buffer and treats userspace as the "host")?

- a

--
I wrote my own mail server and it still has a few bugs.
If you send me a message and it bounces, please forward the
bounce message to megacz@gmail.com. Thanks!

Scott Howard
Posted: Mon Nov 15, 2004 5:15 pm    Post subject: Re: Is the RAID-5 write penalty really necessary?

Adam Megacz <adam@megacz.com> wrote:
Quote:
Hrm, in light of this, it seems that the "raid 5 write penalty" is
just an artifact of poor implementations!

So every vendor out there has a poor implementation? Wow, you're going
to make yourself rich with this discovery - time to go talk to EMC, HDS,
etc, etc :)

Hint: what happens when someone writes 512 bytes to a disk and then
doesn't write anything else around it? How do you de-stage it?

Hardware RAID-5 arrays get around this by doing exactly what you're
suggesting - holding things in cache in an attempt to turn them into larger
writes. Software RAID-5 (or non battery-backed hardware RAID-5) can't do
this without risking data loss in the event of an outage.
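
A toy model of that caching behaviour, as a sketch only: `StripeCache` and its counters are hypothetical names, the buffer is assumed to be safe against power loss (the battery-backed case above), and the de-stage cost uses the simple read-modify-write count of one read and one write per dirty block plus one of each for parity.

```python
from collections import defaultdict


class StripeCache:
    """Buffers block writes; full stripes go out with no reads, leftovers pay for RMW."""

    def __init__(self, data_disks: int):
        self.data_disks = data_disks
        self.pending = defaultdict(dict)  # stripe number -> {slot in stripe: data}
        self.reads = 0
        self.writes = 0

    def write(self, block_no: int, data: bytes) -> None:
        stripe, slot = divmod(block_no, self.data_disks)
        self.pending[stripe][slot] = data
        if len(self.pending[stripe]) == self.data_disks:
            # Whole stripe is in cache: write data plus parity, read nothing.
            del self.pending[stripe]
            self.writes += self.data_disks + 1

    def destage(self) -> None:
        """Flush partial stripes: read old data and parity, then write new data and parity."""
        for blocks in self.pending.values():
            self.reads += len(blocks) + 1
            self.writes += len(blocks) + 1
        self.pending.clear()


cache = StripeCache(data_disks=4)
for block in range(4):        # four adjacent blocks fill stripe 0
    cache.write(block, b"x")
cache.write(9, b"y")          # a lone 512-byte-style write into stripe 2
cache.destage()
print(cache.reads, cache.writes)
```

The lone write is the case from the hint: nothing else arrives to complete its stripe, so de-staging it still costs two reads and two writes.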

Scott.

Thor Lancelot Simon
Posted: Mon Nov 15, 2004 6:08 pm    Post subject: Re: Is the RAID-5 write penalty really necessary?

In article <x1mzxj1loh.fsf@nowhere.com>, Adam Megacz <adam@megacz.com> wrote:
Quote:

Malcolm Weir <malc@gelt.org> writes:
Why mess with the block size? You can simply hold off updating a
block to see if the adjacent block is coming along from the host, and
if it is, repeat until the entire stripe needs to be written. Then
generate parity and splat away. It's usually referred to as "full
stripe optimization" or something similar.

Ah, good point!

Hrm, in light of this, it seems that the "raid 5 write penalty" is
just an artifact of poor implementations!

No. If you have more than one write stream at a time, consecutive writes
may not be into the same stripe. To some extent, this problem too can be
defeated by caching; but, obviously, such caching needs to take place in
nonvolatile RAM or you have to leave _all_ the writes pending on the bus
until you commit whichever full stripes you're going to commit.

It's easy to "solve" this problem for a single write stream. Unfortunately,
single-stream performance is not actually indicative of performance for
most real applications in the real world.

--
Thor Lancelot Simon tls@rek.tjls.com
But as he knew no bad language, he had called him all the names of common
objects that he could think of, and had screamed: "You lamp! You towel! You
plate!" and so on. --Sigmund Freud

Robert Wessel
Posted: Tue Nov 16, 2004 10:31 am    Post subject: Re: Is the RAID-5 write penalty really necessary?

Adam Megacz <adam@megacz.com> wrote in message news:<x1k6sqgy3i.fsf@nowhere.com>...
Quote:
No really, I checked the FAQ on this.

I understand the reason for the RAID-5 write penalty. What I don't
understand is this: why not just set the block size to (N-1)*blocksize
(where N is the number of drives in the array and blocksize is the
hardware-level block size).

Since you can only write full blocks, there's never any need to read
back the underlying parity -- because you know that if you're writing
to a given slice on any particular drive, you're also certain that
you'll be overwriting the corresponding slice on all the other drives.

So, I guess my question is: why not just make the block size bigger
and never read before writing?

I think I saw a paper on something similar dubbed "RAID 3.5", but
haven't seen an implementation of it yet.


There's the minor problem that for all but the smallest array your
solution is worse than the normal RAID-5 write penalty, even
considering only writes.

Consider a simple six disk array. Your solution requires that I issue
six write operations, rather than two reads and two writes. While
this might be a small win for latency, it certainly is much worse for
throughput. Remember that with more drives involved, the average
time to complete the operations will increase just because you're likely
to have a larger "longest seek" if you have to seek on six rather than
two drives, and in the RRWW/RAID-5 case, the writes don't need seeks
and tend to be reasonably quick. You also lose the ability to run
multiple writes in parallel.

Of course real workloads are not just write only, and if you consider
reads, your solution is massively worse. With your one block per
stripe configuration, you've managed to involve every (OK, all but
one) drive in every read operation, thus reducing the random I/O
throughput to something similar to that of a single drive. With
conventional RAID-5 on the above-mentioned six drive array, I can be
doing six separate random reads at once, so long as they hit different
drives.
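
The I/O counts in the last two paragraphs can be tallied with a short sketch. The layout names `"raid5"` and `"stripe_block"` are labels made up for this example, and the figures are the best-case counts used above, ignoring caching and seek times.

```python
def small_write_cost(n_disks: int, layout: str) -> dict:
    """Disk operations needed to update one small block."""
    if layout == "raid5":
        # Read old data and old parity, then write new data and new parity.
        return {"reads": 2, "writes": 2, "drives_busy": 2}
    if layout == "stripe_block":
        # One logical block spans the stripe: every drive is written, none is read.
        return {"reads": 0, "writes": n_disks, "drives_busy": n_disks}
    raise ValueError(f"unknown layout: {layout}")


def max_parallel_random_reads(n_disks: int, layout: str) -> int:
    """Best case: independent random reads that can be in flight at once."""
    if layout == "raid5":
        return n_disks  # each read touches one drive, if they hit different drives
    if layout == "stripe_block":
        return 1        # each read ties up all but one drive
    raise ValueError(f"unknown layout: {layout}")


for layout in ("raid5", "stripe_block"):
    print(layout, small_write_cost(6, layout), max_parallel_random_reads(6, layout))
```

On these counts the stripe-sized block only issues fewer operations per write on a three-disk array and breaks even at four, which fits "all but the smallest array".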

Now what you've described is quite similar to RAID-2 or RAID-3, and
RAID-3 is quite commonly supported. RAID-2/3 usually has a vastly
larger "block size" than you are proposing, however. But while RAID-3
(or the practically nonexistent RAID-2) provides excellent data rates
for largely sequential I/Os, random I/O performance is at basically
single-disk levels. RAID-3 is commonly used in HPC systems that have
to stream very large datasets at high rates, but it doesn't have much
appeal for most database/commercial server systems or desktops.

If the RAID-5 write penalty is an issue for you, the usual solution is
to go to a RAID-0+1 approach.