[illumos-Discuss] Block pointer rewrite?

Haudy Kazemi kaze0010 at umn.edu
Mon Feb 7 14:37:15 PST 2011


On 2/6/2011 8:16 PM, Garrett D'Amore wrote:
> On Sun, 2011-02-06 at 15:09 +0100, Roy Sigurd Karlsbakk wrote:
>> Hi all
>>
>> Is it possible, or likely, that OI/Illumos will ever get block pointer rewrite? I have a 50TB system that was almost filled up with its initial 30TB of storage before we added more drives. The problem is, those VDEVs are still full, and the system is _slow_, and I don't really want to make a backup/restore of those 35TB or so on it. What will it take to make VDEV balancing work on OI/Illumos?
> It's possible, and possibly even likely, that we will get this at some
> point.  It's mostly a matter of time/investment, and priorities.
>
> 	- Garrett

I have a scriptable idea that offers a way to re-balance data on VDEVs 
without using block pointer rewrite and without doing a full 
backup/restore.  I haven't tested this yet.  Comments welcome.

Conditions/Prerequisites/Caveats:
1.) a second (temporary) storage pool that is at least large enough to 
hold the single largest file in your collection, and preferably larger 
for group batches.
2.) no requirement to keep old filesystem snapshots (i.e. can delete all 
old snapshots)
3.) a period of time where the filesystem can effectively be unavailable 
to applications and users (because a file they need might be temporarily 
moved off)
4.) some space needs to be open on each VDEV that is part of the pool.  
This is easiest if the original VDEVs were never completely full, and 
any additional VDEVs have not been filled either.  If the original VDEVs 
were completely filled, then it is necessary to first apply this 
procedure to any files that were on the original VDEVs, and then apply 
it to any files written to the added VDEVs.
5.) a pool built from devices of different sizes cannot be fully
balanced, e.g. a pool consisting of a mixed set of 500 GB and 1 TB
drives.  The smaller drives will fill first, and then performance will
decrease.
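
Prerequisite 1 can be checked before anything is moved.  What follows is
only a minimal sketch in Python, under the assumption that both pools are
mounted as ordinary directories; the function names are made up for
illustration and are not part of any ZFS tooling.  It walks the main
pool's mount point, finds the largest regular file, and compares that
size with the free space reported for the temporary pool.

```python
import os
import shutil


def largest_file_size(root):
    """Return the size in bytes of the largest regular file under root.

    Symbolic links are skipped. Returns 0 if no regular file is found.
    """
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            largest = max(largest, os.path.getsize(path))
    return largest


def temp_pool_can_hold_largest(main_root, temp_root):
    """Return True if temp_root has enough free space for the largest file."""
    free_bytes = shutil.disk_usage(temp_root).free
    return largest_file_size(main_root) <= free_bytes
```

The size reported by os.path.getsize is the logical file size, so with
compression or different redundancy on the temporary pool the space
actually needed may differ; treat the result as a rough guide.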


Steps:
1.) move a file or group of files to the temporary storage pool.  If
you know which files were only on some of the VDEVs (i.e. the disks
that got full before additional capacity was added to the pool), move
those files first.  Hopefully the newer VDEVs have some free space on
them.  If multiple sets of VDEVs have been added (and filled) before
starting this procedure, try to make sure some space has been freed up
on all of them by moving files away from each VDEV.  File timestamps
combined with the dates on which the VDEVs were added should help
narrow down which files are most likely on which VDEV.

2.) destroy all old snapshots (to eliminate old pointers to the data
blocks for the files that were just moved; as long as a snapshot still
references those blocks, they are not freed).  (Maybe also scrub?)

3.) move the file or group of files back to the main pool.  ZFS should
stripe these rewritten files across the VDEVs that are not completely
full.

4.) repeat from beginning if there were multiple sets of VDEVs that were 
completely filled up after they were added to the pool.  (repeat for 
each set of files that were written to each set of now full VDEVs).  Or 
find another way to ensure there is some space freed up on each VDEV.
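
Steps 1 and 3 could be scripted roughly as follows.  This is an untested
sketch in Python, not a finished tool.  It assumes both pools are mounted
as ordinary directories and that the date a VDEV was added is known and
given as a Unix timestamp; the function names are invented for
illustration.  Files are picked by modification time, moved to the
temporary pool with their relative paths preserved, and later moved back.
Step 2 (destroying the snapshots) is deliberately left to the operator
between the two calls.

```python
import os
import shutil


def files_older_than(root, cutoff_epoch):
    """Yield regular files under root last modified before cutoff_epoch."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.stat(path).st_mtime < cutoff_epoch:
                yield path


def stage_out(paths, main_root, temp_root):
    """Step 1: move each file to temp_root, keeping its relative path.

    Returns a list of (staged_path, original_path) pairs for stage_back.
    """
    staged = []
    for path in paths:
        rel = os.path.relpath(path, main_root)
        dest = os.path.join(temp_root, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(path, dest)
        staged.append((dest, path))
    return staged


def stage_back(staged):
    """Step 3: move each staged file back to its original location."""
    for staged_path, original_path in staged:
        shutil.move(staged_path, original_path)
```

A run might look like this, with /tank and /scratch standing in for the
real mount points: build the batch with
list(files_older_than("/tank", cutoff)), call stage_out on it, destroy
the old snapshots, then call stage_back on the returned list.  The batch
is collected into a list before anything is moved so the directory walk
is not disturbed by the moves.  Across filesystems shutil.move copies and
then deletes; it keeps timestamps and permission bits but not necessarily
ownership or ACLs, so a real script would need to handle those.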




