[illumos-Developer] MUCH better --> zil for zvol TRUNCATE (and iSCSI UNMAP) working...
Dan McDonald
danmcd at nexenta.com
Tue Mar 1 12:19:30 PST 2011
On Tue, Mar 01, 2011 at 02:55:13PM -0500, Eric Schrock wrote:
> On Tue, Mar 1, 2011 at 2:38 PM, Dan McDonald <danmcd at nexenta.com> wrote:
>
> The txg_wait_synced() call doesn't accomplish anything. The truncate
> operation is already committed to disk via zil_commit(), and the blocks are
> freed across multiple txgs, so it doesn't even wait for space to become
> "free" (itself a tricky concept in ZFS due to snapshots, etc). And of
> course, adding txg_wait_synced() in your data path will absolutely destroy
> performance - hence the creation of the ZIL in the first place :-)
> Preserving this distinction in the API seems appropriate, but there's no
> reason to call txg_wait_synced() in the zvol implementation.
You mean preserving DF_WAIT_SYNC? Or SL_WRITEBACK_CACHE_DISABLED?
I'll take out the txg_wait_synced() call.
> I agree with your interpretation of SL_WRITEBACK_CACHE_DISABLED, though it's
> inherently odd when it comes to frees. It's not clear to me how a consumer
> could recover from "lost" frees. For the zvol implementation this is all
> moot, but we can start with your interpretation and adapt it as necessary if
> we ever have a backing store that honors async free semantics. Or do the
> same but make the default always synchronous.
We'll save that debate for UNMAP on other backing stores.
BTW, I just took out the set-zle-if-not-compressing code, and found that:
- My particular client (ext4 over iSCSI) sets up a lot (> 20MB) of
zeroed blocks. Not sure why. Probably superblocks and what-not.
Using zle compression may be useful in practice, and it might be
why the original author set it during zvol setup.
- UNMAP does do the right thing. Consider my 4MB test:
everywhere(~)[0]% zfs list rpool/iscsi
NAME USED AVAIL REFER MOUNTPOINT
rpool/iscsi 528M 236G 24.7M -
everywhere(~)[0]% echo "Now we write 4MB worth of file..."
Now we write 4MB worth of file...
everywhere(~)[0]% zfs list rpool/iscsi
NAME USED AVAIL REFER MOUNTPOINT
rpool/iscsi 528M 236G 28.7M -
everywhere(~)[0]% echo "Let's 'rm' the file and sync..."
Let's 'rm' the file and sync...
everywhere(~)[0]% zfs list rpool/iscsi
NAME USED AVAIL REFER MOUNTPOINT
rpool/iscsi 528M 236G 24.7M -
everywhere(~)[0]%
BTW, if I turn on zle on the dataset, the "newly-formatted disk"
reference amount is measurable in 100s of kilobytes.
everywhere(~)[0]% zfs list rpool/iscsi ; zfs get compression rpool/iscsi
NAME USED AVAIL REFER MOUNTPOINT
rpool/iscsi 528M 236G 114K -
NAME PROPERTY VALUE SOURCE
rpool/iscsi compression zle local
everywhere(~)[0]% echo "now with zle turned on..."
now with zle turned on...
everywhere(~)[0]% echo "how 'bout those 4mb?"
how 'bout those 4mb?
everywhere(~)[0]% zfs list rpool/iscsi
NAME USED AVAIL REFER MOUNTPOINT
rpool/iscsi 528M 236G 4.12M -
everywhere(~)[0]% echo "now let's take it out again..."
now let's take it out again...
everywhere(~)[0]% zfs list rpool/iscsi
NAME USED AVAIL REFER MOUNTPOINT
rpool/iscsi 528M 236G 120K -
everywhere(~)[0]%
It's probably not a case for force-zle-at-a-minimum, but it may be something,
oh, say a storage vendor, might want to turn on at its administrative
interface level. :-)
Updated the webrev:
http://www.kebe.com/~danmcd/webrevs/unmap/
I'm going to keep the SL_WRITEBACK_CACHE_DISABLED check, and always use the
data vnode pointer, eliminating the last KEBE comments.
Thanks,
Dan
More information about the Developer
mailing list