[illumos-Developer] MUCH better --> zil for zvol TRUNCATE (and iSCSI UNMAP) working...

George Wilson gwilson at zfsmail.com
Wed Mar 2 09:18:02 PST 2011


On 3/1/11 12:19 PM, Dan McDonald wrote:
> On Tue, Mar 01, 2011 at 02:55:13PM -0500, Eric Schrock wrote:
>> On Tue, Mar 1, 2011 at 2:38 PM, Dan McDonald <danmcd at nexenta.com> wrote:
>>
>> The txg_wait_synced() call doesn't accomplish anything.  The truncate
>> operation is already committed to disk via zil_commit(), and the blocks are
>> freed across multiple txgs, so it doesn't even wait for space to become
>> "free" (itself a tricky concept in ZFS due to snapshots, etc).  And of
>> course, adding txg_wait_synced() in your data path will absolutely destroy
>> performance - hence the creation of the ZIL in the first place :-)
>>   Preserving this distinction in the API seems appropriate, but there's no
>> reason to call txg_wait_synced() in the zvol implementation.
> You mean preserving DF_WAIT_SYNC?  Or SL_WRITEBACK_CACHE_DISABLED?
>
> I'll take out the txg_wait_synced() call.

Sorry guys, I fell behind on this thread.

The one thing txg_wait_synced() does provide is a way for callers to wait
until all the blocks have actually been freed before continuing. I don't
know whether that is a requirement here or not.
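
To make that concrete, here's a rough sketch (NOT the code in Dan's
webrev; the function name and the "sync" flag, which stands in for the
SL_WRITEBACK_CACHE_DISABLED check, are just placeholders) of where
zil_commit() versus txg_wait_synced() would sit in a zvol free path:

/*
 * Sketch only.  The webrev also logs the truncate to the ZIL (a
 * TX_TRUNCATE itx) under a dmu tx before the free so that it is
 * replayable after a crash; that part is omitted here for brevity.
 * ZVOL_OBJ is the single DMU object that backs the zvol's data.
 */
static int
zvol_free_range(zvol_state_t *zv, uint64_t off, uint64_t len,
    boolean_t sync)
{
	int error;

	/* Free the blocks; the frees themselves land across later txgs. */
	error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, off, len);
	if (error != 0)
		return (error);

	/* Stable intent-log commit: cheap, and per operation. */
	if (sync)
		zil_commit(zv->zv_zilog, ZVOL_OBJ);

	/*
	 * Only a caller that must observe the space as actually freed
	 * would want
	 *	txg_wait_synced(dmu_objset_pool(zv->zv_objset), 0);
	 * and that stalls until the whole txg syncs, which is why it
	 * doesn't belong in the per-I/O path.
	 */
	return (0);
}

Whether anything ever needs that last txg_wait_synced() is exactly the
question above.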
>> I agree with your interpretation of SL_WRITEBACK_CACHE_DISABLED, though it's
>> inherently odd when it comes to frees.  It's not clear to me how a consumer
>> could recover from "lost" frees.  For the zvol implementation this is all
>> moot, but we can start with your interpretation and adapt it as necessary if
>> we ever have a backing store that honors async free semantics.  Or do the
>> same but make the default always synchronous.
> We'll save that debate for UNMAP on other backing stores.
>
> BTW, I just took out the set-zle-if-not-compressing code, and found that:
>
> 	- My particular client (ext4 over iSCSI) sets up a lot (> 20MB) of
>            zeroed blocks.  Not sure why.  Probably superblocks and what-not.
>            Using zle compression may be useful in practice, and it might be
>            why the original author set it during zvol setup.
>
> 	- UNMAP does do the right thing.  Consider my 4MB test:
>
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME          USED  AVAIL  REFER  MOUNTPOINT
> rpool/iscsi   528M   236G  24.7M  -
> everywhere(~)[0]% echo "Now we write 4MB worth of file..."
> Now we write 4MB worth of file...
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME          USED  AVAIL  REFER  MOUNTPOINT
> rpool/iscsi   528M   236G  28.7M  -
> everywhere(~)[0]% echo "Let's 'rm' the file and sync..."
> Let's 'rm' the file and sync...
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME          USED  AVAIL  REFER  MOUNTPOINT
> rpool/iscsi   528M   236G  24.7M  -
> everywhere(~)[0]%
>
> 	   BTW, if I turn on zle on the dataset, the "newly-formatted disk"
> 	   reference amount is measurable in 100s of kilobytes.
>
> everywhere(~)[0]% zfs list rpool/iscsi ; zfs get compression rpool/iscsi
> NAME          USED  AVAIL  REFER  MOUNTPOINT
> rpool/iscsi   528M   236G   114K  -
> NAME         PROPERTY     VALUE     SOURCE
> rpool/iscsi  compression  zle       local
> everywhere(~)[0]% echo "now with zle turned on..."
> now with zle turned on...
> everywhere(~)[0]% echo "how 'bout those 4mb?"
> how 'bout those 4mb?
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME          USED  AVAIL  REFER  MOUNTPOINT
> rpool/iscsi   528M   236G  4.12M  -
> everywhere(~)[0]% echo "now let's take it out again..."
> now let's take it out again...
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME          USED  AVAIL  REFER  MOUNTPOINT
> rpool/iscsi   528M   236G   120K  -
> everywhere(~)[0]%
>
>
> It's probably not a case for force-zle-at-a-minimum, but it may be something
> that, oh, say, a storage vendor might want to turn on at its administrative
> interface level.  :-)

Totally agree.
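
For what it's worth, the knob a vendor would expose is just the
dataset's compression property, so (using the rpool/iscsi dataset from
Dan's transcript as the example) it's a one-liner:

	zfs set compression=zle rpool/iscsi

or "-o compression=zle" on the zfs create that makes the zvol.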

- George

> Updated the webrev:
>
> 	http://www.kebe.com/~danmcd/webrevs/unmap/
>
> I'm going to keep the SL_WRITEBACK_CACHE_DISABLED check, and always use the
> data vnode pointer, eliminating the last KEBE comments.
>
> Thanks,
> Dan
>



