[illumos-Discuss] md aka svm aka lvm

Haudy Kazemi kaze0010 at umn.edu
Sun Aug 29 14:26:57 PDT 2010


Kyle and Garrett: we are not disagreeing.  Your responses agree with my 
own response.

Garrett D'Amore wrote:
> I tend to agree... anyone believing that JBOD concatenation gives a
> better sense of reliability probably misunderstands how filesystem
> metadata (and potentially even block data for the files themselves) is
> scattered around the filesystem, and setting themselves up for failure.
>   
That's why I raised the caveats about file fragmentation and filesystem 
tables/metadata.  These caveats effectively make the risk of using 
concatenation similar to the risk of RAID0.  Theoretically, there is 
still a slightly lower risk to using concatenation because there is a 
greater chance that file carving techniques will succeed.  Practically, 
the caveats will have a noticeable effect, as anyone who has attempted 
file carving (even on a single drive) will know well.

> Concatenation as a way to "reduce" points of failure is a mistake.   If
> you want reliability, then don't use RAID0 or concatenation, unless
> using mirrors underneath or somesuch.
>   
I would not characterize concatenation as being intended for reducing 
failure points; rather, it is more a means to easily grow an array, and 
to theoretically make salvaging data easier.  When speaking of 
salvaging, we are talking about minimizing damage that has occurred, 
rather than reliability intended to prevent nonrecoverable damage from 
occurring to begin with.

The marginal difference in reliability between concat and RAID0 is 
small.  It should not be considered as having much if any value when the 
stored data is otherwise valuable or irreplaceable.  In my opinion, 
concatenation/JBOD's 'safety' factor over RAID0 is overvalued because of 
the caveats pointed out before.  The effects of the listed caveats on 
JBOD recovery are under-recognized and under-appreciated.  
That JBOD/concatenation exists at all is likely a result of ease of 
implementation and of array expansion, compared with RAID0, more so than 
anything else.
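
To make the comparison concrete, here is a rough Python sketch of which 
disks a contiguous run of blocks touches under each layout.  Everything 
in it is an assumption for illustration: equal-size disks, no on-disk 
volume manager metadata, a simple round-robin stripe, and made-up 
function names.  It is not how SVM or any particular volume manager 
computes its layout.

```python
def stripe_disks(first, count, unit, ndisks):
    """Disks touched by blocks [first, first+count) on a simple RAID0 stripe.

    `unit` is the stripe unit size in blocks (an assumed parameter).
    """
    first_unit = first // unit
    last_unit = (first + count - 1) // unit
    nunits = last_unit - first_unit + 1
    if nunits >= ndisks:
        # The run wraps all the way around, so every disk holds part of it.
        return set(range(ndisks))
    return {(first_unit + i) % ndisks for i in range(nunits)}


def concat_disks(first, count, disk_size):
    """Disks touched by the same run on a concatenation of equal-size disks."""
    first_disk = first // disk_size
    last_disk = (first + count - 1) // disk_size
    return set(range(first_disk, last_disk + 1))


def survives(touched, failed_disk):
    """A run of blocks is fully readable only if it avoids the failed disk."""
    return failed_disk not in touched


# A 200-block contiguous file on 4 disks of 1000 blocks, 16-block stripe unit.
print(stripe_disks(2000, 200, unit=16, ndisks=4))   # {0, 1, 2, 3}
print(concat_disks(2000, 200, disk_size=1000))      # {2}
```

With zero fragmentation the concatenated file lives on one disk, while 
the striped file has a piece on every disk.  Once a file is fragmented, 
each fragment has to be checked separately, and the concat case starts 
to look much more like the stripe case, which is the caveat above.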

A better intermediate option is as Karl describes: one filesystem per 
non-redundant disk, which at least guarantees a compartmentalization of 
damage.

More below.


> I'll allow that there may be other reasons that concatenation is
> preferable to RAID0, but I *suspect* that most people who are doing so
> are often mistaken about filesystem optimization.  I suspect that in the
> vast majority of cases it is better to let the filesystem lay things out
> for you.  (In an ideal world the filesystem would be able to monitor
> disk activity and move things around when it finds one spindle more
> heavily used than another.)
>
> 	- Garrett
>
>
> On Sun, 2010-08-29 at 15:24 -0400, Kyle McDonald wrote:
>   
>> On 8/29/2010 2:53 PM, Haudy Kazemi wrote:
>>     
>>> RAID0 = striping
>>> JBOD = straight concatenation
>>>
>>> Neither has any redundancy, however the potential impact of a failure 
>>> is different.    JBOD failure has the potential of being less severe 
>>> than RAID0 failure.  With JBOD, most likely you will only lose the 
>>> content of the single drive that failed (the remaining content has some 
>>> chance of being recoverable).  With RAID0, you lose everything larger 
>>> than the stripe width, which means any medium or large files, because 
>>> they have been striped across multiple drives.  The smaller files fit 
>>> within a stripe, so they should still be recoverable assuming the 
>>> drive they ended up on is still working.  (Actually, with RAID0, a 
>>> failed drive just about guarantees your medium and large files have 
>>> holes in them, while with JBOD those files might have holes in them 
>>> because of fragmentation.)
>>> Some caveats that apply are the effects of file fragmentation and the 
>>> potential loss of filesystem tables/metadata.  In either case, if you 
>>> lose the filesystem tables/metadata, you will need to file carve out 
>>> anything that remains, and file carving doesn't work very well on 
>>> fragmented files.
>>>
>>>       
>> The idea that the data on one disk would still be recoverable seems a 
>> stretch to me. While it may be readable, in my experience with SVM 
>> accessing the data is not going to be simple - SVM isn't going to help 
>> you out though dd might. On top of that, while it's not striped in the 
>> regular way, there is still no guarantee that all the blocks of the file 
>> you're interested in will be on the surviving disk. UFS tries to do that 
>> somewhat, but on a long lived FS its ability to do that will be 
>> limited. Even if a file is all on one disk, you have no easy way of 
>> knowing which disk it's on.
>>     
With a concatenated array of disks (assuming zero fragmentation and no 
loss of important metadata), you will lose whatever files were on the 
failed disk.  You don't get to choose the files that survive...all have 
an approximately equal chance of being lost regardless of which ones you 
are more interested in.  If the filesystem tables/metadata are intact, 
you will know which files are affected by looking up the block addresses 
associated with the file and then seeing which disk those translate to.  
If the filesystem tables/metadata are lost, you'll get back whatever the 
file carving software can find using file type signatures and heuristics 
of where the file ends.
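
The block address lookup can be sketched roughly as follows in Python.  
This is a minimal illustration, assuming a plain concatenation where each 
disk contributes a known number of blocks in order and there are no 
label or state database offsets to account for; the function names and 
the example sizes are hypothetical.  The real component sizes would 
have to come from the volume's actual configuration.

```python
from bisect import bisect_right
from itertools import accumulate


def concat_disk_for_block(block, disk_sizes):
    """Map a logical block to (disk_index, block_offset_on_that_disk)."""
    ends = list(accumulate(disk_sizes))      # cumulative end of each disk
    if not 0 <= block < ends[-1]:
        raise ValueError("block is outside the concatenated volume")
    disk = bisect_right(ends, block)         # first disk whose end is > block
    start = ends[disk] - disk_sizes[disk]    # first logical block on that disk
    return disk, block - start


def disks_touched(extents, disk_sizes):
    """Return the set of disks holding any part of a file.

    `extents` is a list of (first_block, block_count) pairs, as would be
    read from intact filesystem metadata.
    """
    touched = set()
    for first, count in extents:
        first_disk, _ = concat_disk_for_block(first, disk_sizes)
        last_disk, _ = concat_disk_for_block(first + count - 1, disk_sizes)
        touched.update(range(first_disk, last_disk + 1))
    return touched


sizes = [100, 200, 50]                       # hypothetical sizes, in blocks
print(concat_disk_for_block(100, sizes))     # (1, 0)
print(disks_touched([(10, 5), (320, 4)], sizes))   # {0, 2}
```

A file is intact after a failure only if the failed disk is not in the 
set returned for it, so a file with even one fragment on the failed 
disk ends up with a hole.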

>> So the net effect to me isn't that great. I've always stayed away from 
>> both RAID0 and Concatenation. While it does decrease the flexibility of 
>> space usage, if I've had multiple disks and didn't want to have 
>> redundancy and didn't need the performance boost, I've always just 
>> partitioned and made a FS on each disk and mounted them on the system. 
>> That's really the only way to salvage one disk's worth of data when the 
>> other one fails. It's the only way to know what files are on which disk, 
>> and ensure that each file is completely on one disk.
>>     
I agree.  That is a strategy I myself have used for storing replaceable 
or low value data where losing one disk's worth of data has a tolerable 
time/hassle/annoyance factor, but replacing many disks' worth would have 
an unacceptable time/hassle/annoyance factor. 





