[illumos-Developer] Reaping enablings on defunct providers

Adam Leventhal ahl at delphix.com
Tue Jul 12 16:38:15 PDT 2011


Hey Bryan,

This is great stuff, and -- as you say -- something that we've all
wanted for a long, long time. The code looks great. The only addition
I'd ask about is whether you considered having a case with
bufpolicy=ring and/or anonymous tracing.

I had a few questions:

Can you explain the purpose of dtrace_unregister_defunct_reap? Is the
idea to keep trying for providers that recently became defunct and
then give up after a minute?

Should the fasttrap cleanup code use a taskq rather than its
timeout-based mechanism (not that I'm asking you to do it, of course)?

Can you talk a little about the approach you took? Obviously, if
you're using speculative tracing, bufpolicy=ring, or anonymous
tracing, that limits the efficacy of what you've done. Would it have
been possible to disable the ECBs without destroying them? I assume
that's horribly facile, but I'd be interested to understand.

dtrace_buffer_t structures should be cache-line aligned since they're
per-CPU structures, yes? Would that be worth noting explicitly? And
why did you elect to have padding in two places rather than just
seven 64-bit values at the end?
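
To make the padding question concrete, here is a minimal sketch in C.
It assumes a 64-byte cache line and a made-up per-CPU structure: the
type names, field names, and sizes below are hypothetical and are not
the actual dtrace_buffer_t members. It shows the same fields laid out
two ways, once with all the padding at the end and once with padding
in two places, and checks at compile time that both fill exactly one
cache line.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; real hardware may differ. */
#define	CACHE_LINE_SIZE	64

/* Hypothetical layout A: fields reordered, all padding at the end. */
typedef struct buf_tail_pad {
	uint64_t bt_size;	/* 8 bytes */
	uint64_t bt_offset;	/* 8 bytes */
	uint64_t bt_drops;	/* 8 bytes */
	uint32_t bt_flags;	/* 4 bytes */
	uint32_t bt_pad[9];	/* 36 bytes of tail padding */
} buf_tail_pad_t;

/* Hypothetical layout B: same fields, padding in two places. */
typedef struct buf_split_pad {
	uint32_t bs_flags;	/* 4 bytes */
	uint32_t bs_pad0;	/* 4 bytes: makes the alignment hole explicit */
	uint64_t bs_size;	/* 8 bytes */
	uint64_t bs_offset;	/* 8 bytes */
	uint64_t bs_drops;	/* 8 bytes */
	uint64_t bs_pad1[4];	/* 32 bytes of tail padding */
} buf_split_pad_t;

/* Both layouts must occupy exactly one (assumed) cache line. */
_Static_assert(sizeof (buf_tail_pad_t) == CACHE_LINE_SIZE,
    "tail-padded layout is not one cache line");
_Static_assert(sizeof (buf_split_pad_t) == CACHE_LINE_SIZE,
    "split-padded layout is not one cache line");

/*
 * Byte offset of a CPU's slot in a per-CPU array. Slots only avoid
 * sharing cache lines if the array base itself is line-aligned.
 */
size_t
percpu_slot_offset(size_t cpu)
{
	return (cpu * sizeof (buf_split_pad_t));
}
```

Either layout gives each CPU its own line in an array of these, so
the choice between them is about readability and field order rather
than size. Neither guarantees alignment on its own; that depends on
how the array is allocated.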

Thanks; again, this is really great.

Adam

On Sat, Jul 2, 2011 at 11:27 AM, Bryan Cantrill <bryan at joyent.com> wrote:
> All,
>
> A longstanding problem that we have had is that enablings on defunct
> providers (e.g., USDT probes on dead processes) are not reaped:  the
> probes will exist as long as there exists an enabling for them.  When
> processes are turning over frequently (or when enablings are
> long-running), this can clog up the probe space to the point that
> DTrace probe creation will silently fail (an absolutely maddening
> failure mode).  This has been hit several times over the years (we
> were nailed by it on our build machines at Fishworks) -- so when Theo
> Schlossnagle mentioned to me that he was getting killed by this
> problem in an environment with rapidly turning over Postgres
> processes, I was embarrassed that I hadn't tackled it earlier.  As it
> turns out, it was a tad thorny for locking reasons, but a patch for
> this problem is attached.  We have integrated this into our bits at
> Joyent (internal ticket is OS-454, "enablings on defunct providers
> prevent providers from unregistering"), so you'll see this show up
> soon at http://github.com/joyent/illumos-joyent -- but I wanted to
> give everyone here a heads-up.
>
> Anyway, patch is attached, with my thanks to Adam for a helpful
> discussion on fasttrap's asynchronous provider retiring mechanics.
> Note that Adam hasn't (yet) reviewed this, and its integration
> upstream should wait until he's had a chance to look it over. Please
> let me know if you have any questions or comments!
>
>        Thanks,
>        Bryan



-- 
Adam Leventhal, Delphix
http://dtrace.org/blogs/ahl

275 Middlefield Road, Suite 50
Menlo Park, CA 94025
http://www.delphix.com


