[illumos-Developer] MUCH better --> zil for zvol TRUNCATE (and iSCSI UNMAP) working...
George Wilson
gwilson at zfsmail.com
Wed Mar 2 09:18:02 PST 2011
On 3/1/11 12:19 PM, Dan McDonald wrote:
> On Tue, Mar 01, 2011 at 02:55:13PM -0500, Eric Schrock wrote:
>> On Tue, Mar 1, 2011 at 2:38 PM, Dan McDonald<danmcd at nexenta.com> wrote:
>>
>> The txg_wait_synced() call doesn't accomplish anything. The truncate
>> operation is already committed to disk via zil_commit(), and the blocks are
>> freed across multiple txgs, so it doesn't even wait for space to become
>> "free" (itself a tricky concept in ZFS due to snapshots, etc). And of
>> course, adding txg_wait_synced() in your data path will absolutely destroy
>> performance - hence the creation of the ZIL in the first place :-)
>> Preserving this distinction in the API seems appropriate, but there's no
>> reason to call txg_wait_synced() in the zvol implementation.
> You mean preserving DF_WAIT_SYNC? Or SL_WRITEBACK_CACHE_DISABLED?
>
> I'll take out the txg_wait_synced() call.
Sorry guys, I feel behind on this thread.
The one thing that the txg_wait_synced() provides is for callers to wait
for all the blocks to be freed before continuing. I don't know if this
is a requirement or not.
>> I agree with your interpretation of SL_WRITEBACK_CACHE_DISABLED, though it's
>> inherently odd when it comes to frees. It's not clear to me how a consumer
>> could recover from "lost" frees. For the zvol implementation this is all
>> moot, but we can start with your interpretation and adapt it as necessary if
>> we ever have a backing store that honors async free semantics. Or do the
>> same but make the default always synchronous.
> We'll save that debate for UNMAP on other backing stores.
>
> BTW, I just took out the set-zle-if-not-compressing code, and found that:
>
> - My particular client (ext4 over iSCSI) sets up a lot (> 20MB) of
> zeroed blocks. Not sure why. Probably superblocks and what-not.
> Using zle compression may be useful in practice, and it might be
> why the original author set it during zvol setup.
>
> - UNMAP does do the right thing. Consider my 4MB test:
>
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME USED AVAIL REFER MOUNTPOINT
> rpool/iscsi 528M 236G 24.7M -
> everywhere(~)[0]% echo "Now we write 4MB worth of file..."
> Now we write 4MB worth of file...
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME USED AVAIL REFER MOUNTPOINT
> rpool/iscsi 528M 236G 28.7M -
> everywhere(~)[0]% echo "Let's 'rm' the file and sync..."
> Let's 'rm' the file and sync...
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME USED AVAIL REFER MOUNTPOINT
> rpool/iscsi 528M 236G 24.7M -
> everywhere(~)[0]%
>
> BTW, if I turn on zle on the dataset, the "newly-formatted disk"
> reference amount is measurable in 100s of kilobytes.
>
> everywhere(~)[0]% zfs list rpool/iscsi ; zfs get compression rpool/iscsi
> NAME USED AVAIL REFER MOUNTPOINT
> rpool/iscsi 528M 236G 114K -
> NAME PROPERTY VALUE SOURCE
> rpool/iscsi compression zle local
> everywhere(~)[0]% echo "now with zle turned on..."
> now with zle turned on...
> everywhere(~)[0]% echo "how 'bout those 4mb?"
> how 'bout those 4mb?
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME USED AVAIL REFER MOUNTPOINT
> rpool/iscsi 528M 236G 4.12M -
> everywhere(~)[0]% echo "now let's take it out again..."
> now let's take it out again...
> everywhere(~)[0]% zfs list rpool/iscsi
> NAME USED AVAIL REFER MOUNTPOINT
> rpool/iscsi 528M 236G 120K -
> everywhere(~)[0]%
>
>
> It's probably not a case for force-zle-at-a-minimum, but it may be something,
> oh, say a storage vendor, might want to turn on at its administrative
> interface level. :-)
Totally agree.
- George
> Updated the webrev:
>
> http://www.kebe.com/~danmcd/webrevs/unmap/
>
> I'm going to keep the SL_WRITEBACK_CACHE_DISABLED check, and always use the
> data vnode pointer, eliminating the last KEBE comments.
>
> Thanks,
> Dan
>
> _______________________________________________
> Developer mailing list
> Developer at lists.illumos.org
> http://lists.illumos.org/m/listinfo/developer
More information about the Developer
mailing list