[illumos-Developer] To sed, or not to sed...
Garrett D'Amore
garrett at damore.org
Sun Dec 12 22:34:00 PST 2010
Here's my webrev, for my version:
http://mexico.purplecow.org/gdamore/webrev/sed/
Note that this depends on a change in libc, to enable REG_STARTEND.
- Garrett
On 12/12/10 09:52 PM, Garrett D'Amore wrote:
> So one of our "closed" gaps (components we still ship only as closed-source binaries) is sed.
>
> Rich Lowe and I have each independently ported FreeBSD sed to
> illumos. There are some minor differences between the two ports,
> though, which brings me to a question on which I'd like to hear
> opinions, preferably ones backed by concrete supporting evidence.
>
> First off, a bit of background:
>
> As far as I can tell, the XPG4 sed implementation attempts to adhere to
> POSIX by fully supporting multibyte characters, whereas legacy
> /usr/bin/sed treats the file as a stream of bytes. In fact, legacy
> sed treats the file as pure ASCII. Furthermore, legacy sed uses a
> different output format for the "l" command: some characters are
> escaped oddly (backspaces and tabs become < and >), and a two-digit
> octal form is used. XPG4 sed uses backslash escapes for a few
> characters (\\, \a, \b, \f, \r, \t, \v) and a three-digit octal form
> for other non-printable characters.
>
> So, I believe Rich's work adds support for building separate XPG4
> and /usr/bin versions, with the /usr/bin version giving the
> "traditional" behavior for "l". However, his version does not address
> the CSI (code set independence) problem at all. (Neither does mine,
> since I make no attempt to provide the legacy, non-CSI-compliant
> behavior.)
>
> IMO, this is an excellent time for us to simply ditch the legacy
> behavior and move to the POSIX syntax that all other OSes use. This
> would enhance our compatibility with GNU sed and *BSD sed. (In fact,
> regardless of which port we ultimately go with, we will gain several
> other features that improve such compatibility, such as -i support.)
>
> I'm not emotionally attached to either approach. I just don't want to
> integrate new code to support legacy behavior if there is no need for
> it, or if the legacy hurts us more than it helps us.
>
> If folks really think we should retain the legacy behavior (or as much
> of it as we can), I'm willing to go that route. Personally, I *think*
> we stand to gain more here by breaking with that legacy and moving
> toward POSIX/GNU/BSD compatibility. However, I don't do much
> with sed beyond simple scripts, and indeed I've never used the "l"
> command. So I freely admit that someone else may have a more complete
> picture here, and I'd like to hear more.
>
> If there are any sed wizards out there who have some good test scripts
> that I can easily run (send me the script, input files, and expected
> output), I'll be happy to verify correct functionality before I push
> toward integration of any sed replacement.
>
> I'd like to have a decision, and ideally code reviews and integration
> done, before the end of the week. So please be timely in your feedback.
>
> Thanks!
>
> - Garrett
>
>
> _______________________________________________
> Developer mailing list
> Developer at lists.illumos.org
> http://lists.illumos.org/m/listinfo/developer