[illumos-Developer] To sed, or not to sed...
Garrett D'Amore
garrett at damore.org
Tue Dec 14 23:57:43 PST 2010
Btw, last night I updated this webrev with a fix for another interesting
bug: the FreeBSD version of regex we imported doesn't support the \< and
\> word delimiters (though it does support [[:<:]] and [[:>:]] for the
same purpose). So the fix for *that* is part of this webrev.
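
For anyone with scripts that rely on the \< and \> spellings, here is a
rough sketch of how they map onto the bracket forms. It is Python purely
for illustration; translate_word_delims is a hypothetical helper, not
code from the webrev, and it makes no attempt to handle \< or \>
appearing inside a bracket expression.

```python
def translate_word_delims(pattern):
    # Rewrite \< and \> as [[:<:]] and [[:>:]]; copy everything else as-is.
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "<":
                out.append("[[:<:]]")
            elif nxt == ">":
                out.append("[[:>:]]")
            else:
                # Keep any other escaped pair together, so an escaped
                # backslash followed by a literal < is not misread.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(translate_word_delims(r"\<foo\>"))
```

Consuming escapes two characters at a time is the only subtle part: it
keeps a pattern such as \\< (escaped backslash, then a plain <) unchanged.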
The timer on this is ticking, so I'd really like to hear any feedback
soon. (Or, if someone has any reason why I shouldn't drive forward with
this change, then I'd like to hear that too.)
- Garrett
On 12/12/10 10:34 PM, Garrett D'Amore wrote:
>
> Here's my webrev, for my version:
>
> http://mexico.purplecow.org/gdamore/webrev/sed/
>
> Note that this depends on a change in libc, to enable REG_STARTEND.
>
> - Garrett
>
> On 12/12/10 09:52 PM, Garrett D'Amore wrote:
>> So one of our "closed" gaps is "sed".
>>
>> Rich Lowe and I have each independently ported FreeBSD sed to
>> illumos. There are some minor differences though, which brings me to
>> a question where I'd like to hear opinions -- preferably those backed
>> by concrete supporting evidence.
>>
>> First off, a bit of background:
>>
>> As far as I can tell, xpg4's sed implementation attempts to adhere to
>> POSIX by fully supporting multibyte characters, whereas legacy
>> /usr/bin/sed treats the file as a stream of bytes. In fact, legacy
>> sed treats the file as pure ASCII. Furthermore, legacy sed uses a
>> different output format for the "l" command: some characters are
>> escaped oddly (backspaces and tabs become < and >) and a two-digit
>> octal form is used. xpg4 sed uses backslash escapes for a few
>> characters (\\, \a, \b, \f, \r, \t, \v) and 3-digit octal format for
>> non-printable characters.
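
A minimal sketch of the xpg4-style "l" escaping described above, in
Python for illustration only. The escape list is the one given in the
paragraph; everything else is an assumption: l_format is a hypothetical
name, the input is treated as raw bytes with "printable" meaning
printable ASCII (so multibyte handling is ignored), and line folding and
any end-of-line marker are left out.

```python
# Escapes named in the description; all other choices are assumptions.
ESCAPES = {
    0x5C: "\\\\",  # backslash
    0x07: "\\a",   # alert
    0x08: "\\b",   # backspace
    0x0C: "\\f",   # form feed
    0x0D: "\\r",   # carriage return
    0x09: "\\t",   # tab
    0x0B: "\\v",   # vertical tab
}


def l_format(data):
    # Render bytes with backslash escapes and 3-digit octal fallbacks.
    out = []
    for byte in data:
        if byte in ESCAPES:
            out.append(ESCAPES[byte])
        elif 0x20 <= byte < 0x7F:
            out.append(chr(byte))  # printable ASCII passes through
        else:
            out.append("\\%03o" % byte)  # e.g. 0x01 becomes \001
    return "".join(out)


if __name__ == "__main__":
    print(l_format(b"a\tb\x01"))
```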
>>
>> So, I believe Rich's work adds support for building separate XPG4
>> and /usr/bin versions, with the latter giving the "traditional"
>> behavior for "l". However, his version does not address the CSI
>> problem at all.
>> (Neither does mine, since I make no attempt at providing the non-CSI
>> compliant legacy behavior.)
>>
>> IMO, this is an excellent time for us to simply ditch the legacy
>> behavior, and move to the POSIX syntax that all other OSes use. This
>> would enhance our compatibility with GNU sed and *BSD sed. (In
>> fact, there are several other features that we will get to improve
>> such compatibility, such as -i support, regardless of which port we
>> ultimately go with.)
>>
>> I'm not emotional here. I just don't want to integrate new
>> code to support legacy if there is no need for the legacy or if the
>> legacy hurts us more than it helps us.
>>
>> If folks really think we should retain the legacy behavior (or as
>> much of it as we can), I'm willing to go that route. Personally, I
>> *think* we may stand more to gain here by breaking with that legacy
>> and going more towards POSIX/GNU/BSD compatibility. However, I don't
>> do much with sed beyond simple scripts, and indeed I've never used
>> the "l" command. So I freely admit that someone else may have a more
>> complete picture here, and I'd like to hear more.
>>
>> If there are any sed wizards out there who have some good test
>> scripts that I can easily test (send me the script, input files, and
>> expected output), I'll be happy to verify correct functionality
>> before I push towards integration of any sed replacement.
>>
>> I'd like to have a decision, and ideally code reviews and integration
>> done, before the end of the week. So please be timely in your feedback.
>>
>> Thanks!
>>
>> - Garrett
>>
>>
>> _______________________________________________
>> Developer mailing list
>> Developer at lists.illumos.org
>> http://lists.illumos.org/m/listinfo/developer
>
>