[illumos-Developer] MUCH better --> zil for zvol TRUNCATE (and iSCSI UNMAP) working...

Tue Mar 1 12:19:30 PST 2011

On Tue, Mar 01, 2011 at 02:55:13PM -0500, Eric Schrock wrote:
> On Tue, Mar 1, 2011 at 2:38 PM, Dan McDonald <danmcd at nexenta.com> wrote:
> 
> The txg_wait_synced() call doesn't accomplish anything.  The truncate
> operation is already committed to disk via zil_commit(), and the blocks are
> freed across multiple txgs, so it doesn't even wait for space to become
> "free" (itself a tricky concept in ZFS due to snapshots, etc).  And of
> course, adding txg_wait_synced() in your data path will absolutely destroy
> performance - hence the creation of the ZIL in the first place :-)
>  Preserving this distinction in the API seems appropriate, but there's no
> reason to call txg_wait_synced() in the zvol implementation.

You mean preserving DF_WAIT_SYNC?  Or SL_WRITEBACK_CACHE_DISABLED?

I'll take out the txg_wait_synced() call.

> I agree with your interpretation of SL_WRITEBACK_CACHE_DISABLED, though it's
> inherently odd when it comes to frees.  It's not clear to me how a consumer
> could recover from "lost" frees.  For the zvol implementation this is all
> moot, but we can start with your interpretation and adapt it as necessary if
> we ever have a backing store that honors async free semantics.  Or do the
> same but make the default always synchronous.

We'll save that debate for UNMAP on other backing stores.

BTW, I just took out the set-zle-if-not-compressing code, and found that:

	- My particular client (ext4 over iSCSI) sets up a lot (> 20MB) of
          zeroed blocks.  Not sure why.  Probably superblocks and what-not.
          Using zle compression may be useful in practice, and it might be
          why the original author set it during zvol setup.

	- UNMAP does do the right thing.  Consider my 4MB test:

everywhere(~)[0]% zfs list rpool/iscsi
NAME          USED  AVAIL  REFER  MOUNTPOINT
rpool/iscsi   528M   236G  24.7M  -
everywhere(~)[0]% echo "Now we write 4MB worth of file..."
Now we write 4MB worth of file...
everywhere(~)[0]% zfs list rpool/iscsi
NAME          USED  AVAIL  REFER  MOUNTPOINT
rpool/iscsi   528M   236G  28.7M  -
everywhere(~)[0]% echo "Let's 'rm' the file and sync..."
Let's 'rm' the file and sync...
everywhere(~)[0]% zfs list rpool/iscsi
NAME          USED  AVAIL  REFER  MOUNTPOINT
rpool/iscsi   528M   236G  24.7M  -
everywhere(~)[0]% 

	   BTW, if I turn on zle on the dataset, the "newly-formatted disk"
	   reference amount is measurable in 100s of kilobytes.

everywhere(~)[0]% zfs list rpool/iscsi ; zfs get compression rpool/iscsi
NAME          USED  AVAIL  REFER  MOUNTPOINT
rpool/iscsi   528M   236G   114K  -
NAME         PROPERTY     VALUE     SOURCE
rpool/iscsi  compression  zle       local
everywhere(~)[0]% echo "now with zle turned on..."
now with zle turned on...
everywhere(~)[0]% echo "how 'bout those 4mb?"
how 'bout those 4mb?
everywhere(~)[0]% zfs list rpool/iscsi
NAME          USED  AVAIL  REFER  MOUNTPOINT
rpool/iscsi   528M   236G  4.12M  -
everywhere(~)[0]% echo "now let's take it out again..."
now let's take it out again...
everywhere(~)[0]% zfs list rpool/iscsi
NAME          USED  AVAIL  REFER  MOUNTPOINT
rpool/iscsi   528M   236G   120K  -
everywhere(~)[0]% 

It's probably not a case for force-zle-at-a-minimum, but it may be something,
oh, say a storage vendor, might want to turn on at its administrative
interface level.  :-)

Updated the webrev:

	http://www.kebe.com/~danmcd/webrevs/unmap/

I'm going to keep the SL_WRITEBACK_CACHE_DISABLED check, and always use the
data vnode pointer, eliminating the last KEBE comments.

Thanks,
Dan