[illumos-Developer] MUCH better --> zil for zvol TRUNCATE (and iSCSI UNMAP) working...

George Wilson gwilson at zfsmail.com
Wed Mar 2 14:40:21 PST 2011


On 3/2/11 12:36 PM, Dan McDonald wrote:
> On Wed, Mar 02, 2011 at 10:34:25AM -0800, George Wilson wrote:
>>> I dunno... zvol_replay_write() has the dmu_tx_* stuff.  If I rip it out of
>>> replay_truncate, should I also rip it out of replay_write() as well?
>> No, in the case of zvol_replay_write() you're updating the contents
>> of the zvol in one transaction so it's necessary to create the
>> transaction. In the case of zvol_replay_truncate() the
>> dmu_free_long_range() is going to create the transactions for you.
>> That's why I don't believe you need the dmu_tx_* stuff for that
>> particular op.
> In the replay, this makes sense.  (I just now noticed that dmu_write() takes
> tx as an argument.)  Naturally the DKIOCFREE code still needs all of the tx
> bits, though, so it can log to the ZIL.
>
> And combining two threads...
>
>
> On Wed, Mar 02, 2011 at 10:29:23AM -0800, George Wilson wrote:
>>>> The one thing that the txg_wait_synced() provides is for callers to
>>>> wait for all the blocks to be freed before continuing. I don't know
>>>> if this is a requirement or not.
>>> Your description of txg_wait_synced() behavior does match up with the
>>> WRITEBACK_CACHE_DISABLED, but Eric points out:
>> I see 3 slightly related controls for DKIOCFREE:
>>
>> 1). is write-cache enabled (ZVOL_WCE)?
>> 2). is sync-always enabled (ZVOL_SYNC_ALWAYS)?
> These two flags/values (it's ZFS_SYNC_ALWAYS...) appear to be already
> potentially present in zvol state.  I therefore do not need to *add* these,
> so much as check for them.
>
>> 3). do you want to wait for the frees to complete (DF_WAIT_SYNC)?
> This comes via the ioctl.
>
>> I see 1 and 2 above as ways to control whether or not a zil_commit()
>> is required as part of the free. I see #3 as way to control do you
>> want the operation to not return until the free is complete and all
>> metadata has been updated.
>>
>> My initial comments (long time ago) mentioned that this operation
>> should always be log to the zil. If we want to honor
>> SL_WRITEBACK_CACHE_DISABLED then we should make the zil_commit()
>> optional based on #1 and 2.
> Okayyyy..
>
>>> I wonder if the principal of least surprise might apply here?  That would
>>> argue for implementing DF_WAIT_SYNC by using txg_wait_synced().
>> I think there is a need for a "wait" flag and maybe there's a need to
>> have all three flags for completeness.
> Right now I have:
>
> 		error = dmu_tx_assign(tx, TXG_WAIT);
> 		if (error != 0) {
> 			dmu_tx_abort(tx);
> 		} else {
> 			zvol_log_truncate(zv, tx, df.df_start,
> 			    df.df_length, B_TRUE);
> 			error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ,
> 			    df.df_start, df.df_length);
> 			dmu_tx_commit(tx);
> 		}
>
> 		zfs_range_unlock(rl);
>
> --->		if (error == 0)
> --->			zil_commit(zv->zv_zilog, ZVOL_OBJ);
>
> Where I have arrows should probably change then?  Perhaps something like:
>
> 		if (error == 0&&  (zv->zv_flags&  ZVOL_WCE)&&
> 		zv->zv_objset->os_sync == ZFS_SYNC_ALWAYS) {
> 			zil_commit(zv->zv_zilog, ZVOL_OBJ);
> 			if (df.df_flags&  DF_WAIT_SYNC) {
> 				txg_wait_synced(dmu_objset_pool(zv->zv_objset,
> 				    0);
> 			}
> 		}

I think that it should be:

if (error == 0) {
     /*
      * If the write-cache is disabled or 'sync' property is set to 
'always' then
      * treat this as a synchronous operation (i.e. commit to zil).
      */
     if (!(zv->zv_flags & ZVOL_WCE) || (zv->zv_objset->os_sync == 
ZFS_SYNC_ALWAYS))
         zil_commit(...)

     if (df.df_flags & DF_WAIT_SYNC)
         txg_wait_synced(...)
}

- George
> Does this make sense to you?  I hope this is the last bits of tweaking I'll
> need.  I think I'm close to freezing, then I will run my tests again, and
> then final-RTI/push!
>
> Thanks,
> Dan
>
> _______________________________________________
> Developer mailing list
> Developer at lists.illumos.org
> http://lists.illumos.org/m/listinfo/developer




More information about the Developer mailing list