[illumos-Developer] initial review: bombproof taskq dispatch

Thu Jul 7 09:47:33 PDT 2011

On Thu, Jul 7, 2011 at 11:58 AM, Garrett D'Amore <garrett at damore.org> wrote:
>
> On Jul 7, 2011, at 7:21 AM, George Wilson <gwilson at zfsmail.com> wrote:
>
>> Garrett,
>>
>> A few comments:
>>
>> - With your change do we still need to use TASKQ_PREPOPULATE in spa_taskq_create()?
>
> Probably not, at least not in the kernel build.  Userland might still need or want it.
>
>> - I would really love for the #ifdef _KERNEL lines to go away before integrating this, any chance that can happen? Let me know if you need help with that.
>
> In order for that to happen, I'd have to implement the taskq_dispatch_ent() for user space in libzpool.  I guess I can make those changes.  I despise the _KERNEL ifdefs too.
>
>> - In taskq_dispatch(), lines 1136 - 1162 is it possible to make this a common routine so that both taskq_dispatch() and taskq_dispatch_ent() can both use?
>
> I thought about this, but ultimately I decided it was pointless, and would lengthen code paths.  For example, random dispatch failures should never happen for a taskq_dispatch_ent(), since that cannot (by definition) ever fail.  I decided it was ultimately cleanest and simplest to just add the dozen or so new lines of code.  Really the only code that could be shared is the code from 1299 to 1303.  The rest is unique to the specific version.  Not worth adding extra branches to share 4 lines of code, IMO.
>
>> - In taskq_thread(), line# 1582, this comment is confusing since we don't really talk about prealloc'd tasks and may get confused with prealloced taskqs which are mentioned in this file.
>
> Agreed the terminology is confusing.  Any naming suggestions?
>

"static tasks"? Although that wouldn't necessarily be correct... more
"user allocated/provided/managed".

>> I have to admit that this interface is a little funky and I'm wondering if we shouldn't try to better define this. Maybe something like taskq_ent_create() and taskq_ent_destroy() that can be called by external consumers?
>
> As indiciated earlier, I prefer to have the option to declare a taskq_ent inline in a structure.  That way we avoid having to go to the kmem allocator, ever.   Its a big performance win on hot code paths.
>

If we have cache for them it won't be as bad... but I do sort of like
the simplicity of having taskq_ent exposed.

>> - In taskq_thread(), lines 1589 - 1592, you manipulate tqe pointers without the tq_lock held, is that intentional?
>
> Yes, the entry is no longer on the global list, and so the code has exclusive ownership of the taskq_ent in question.  I could have moved the mutex_exit lower to cover this, but I'm a big fan of holding locks only as long as absolutely necessary. :-)
>
>> - does mdb's ::taskq still show the correct information with this change?
>
> *That* is an excellent question.  I think it should, but I've not looked to see if there is a custom macro.  I will investigate and get back to you.
>
>        - Garrett
>>
>> Thanks,
>> George
>>
>>
>> On 6/30/11 10:10 AM, Garrett D'Amore wrote:
>>> So, I'm getting ready to submit an RTI on this... but I'd like to give
>>> folks another, more current, chance to read the webrev.
>>>
>>> We believe this change resolves a key performance problem with highly
>>> loaded systems by reducing overall memory pressure and latency of some
>>> operations.
>>>
>>> There is a similar change to update a taskq_dispatch in the COMSTAR
>>> stack which I'll be submitting soon, which depends on this code.
>>>
>>> Please let me know if you have any feedback on this.  Thanks.
>>>
>>>    - Garrett
>>>
>>>
>>> On Wed, 2011-05-04 at 15:59 -0700, Garrett D'Amore wrote:
>>>> So one of the problems I have seen with taskq_dispatch is that it is
>>>> either prone to failure (ENOMEM), or you have to be willing to sleep.
>>>>
>>>> In fact, ZFS gets this wrong, as it uses TQ_SLEEP in a case that is
>>>> potentially called by an interrupt, so there's a latent bug there.  (And
>>>> I suspect its not the only place.)  Dealing with dispatch failures can
>>>> be a major PITA for anyone trying to dispatch from interrupt context.
>>>> This makes life miserable for certain of us who write device drivers
>>>> that have to do things like this. :-)
>>>>
>>>> But you could imagine (I did!  But this idea is used in *BSD as well) a
>>>> version that used preallocated taskq structures to have a very light
>>>> weight and totally bomb-proof implementation.  Here is an initial draft
>>>> with zio's taskqs converted to use it:
>>>>
>>>> http://mexico.purplecow.org/gdamore/webrev/taskq/
>>>>
>>>> I only converted zio although there are other opportunities.  The reason
>>>> for this is that
>>>>
>>>> a) zio's use of taskq_dispatch showed up as very hot on some significant
>>>> stressful runs in storage -- so this will immediately yield a lighter
>>>> load on CPUs
>>>>
>>>> b) zio is complex enough, that if it works for ZIO, then I can probably
>>>> make it work for a bunch of other cases.
>>>>
>>>> The drawback here is that zio's code is imported into userland, and I
>>>> didn't want to expose these internals into the userland code for now, so
>>>> I have introduced some ugly ifdefs.  I really don't like how this code
>>>> is shared with userland, but that's a battle for a different day.
>>>>
>>>> The other concern is that my design depends on zio's only being
>>>> outstanding on a single taskq at a time.  I *think* other aspects of
>>>> zio_execute rely on this assumption being true, but if I'm mistaken, I'd
>>>> really like to know it.  (I added some assertions to check for this
>>>> specifically, and I'll run a debug version in a full stress run to
>>>> double confirm.)
>>>>
>>>> Initial testing so far suggests it *works*, but I've not really hit this
>>>> hard yet, and I certainly have not done any performance analysis yet.
>>>> That will be forthcoming.
>>>>
>>>> What I want to do is give this code out for early review to give
>>>> interested parties a chance to give feedback as early into the process
>>>> as possible.
>>>>
>>>> Once this is all done, and integrated, I will probably go on a hunt to
>>>> find other likely candidates -- I *know* there are a bunch of places
>>>> that code could be simplified with this interface, and failure cases
>>>> (and associated handling) simply removed. Probably there will be some
>>>> very modest (maybe not measurable) performance benefits for those cases
>>>> as well. :-)
>>>>
>>>> Thanks.
>>>>
>>>>    - Garrett
>>>>
>>>>
>>>> _______________________________________________
>>>> Developer mailing list
>>>> Developer at lists.illumos.org
>>>> http://lists.illumos.org/m/listinfo/developer
>>>
>>>
>>> _______________________________________________
>>> Developer mailing list
>>> Developer at lists.illumos.org
>>> http://lists.illumos.org/m/listinfo/developer
>>
>>
>> _______________________________________________
>> Developer mailing list
>> Developer at lists.illumos.org
>> http://lists.illumos.org/m/listinfo/developer
>
> _______________________________________________
> Developer mailing list
> Developer at lists.illumos.org
> http://lists.illumos.org/m/listinfo/developer
>