[illumos-Developer] To sed, or not to sed...

Garrett D'Amore garrett at damore.org
Sun Dec 12 22:34:00 PST 2010


Here's my webrev, for my version:

http://mexico.purplecow.org/gdamore/webrev/sed/

Note that this depends on a change in libc, to enable REG_STARTEND.
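
For anyone unfamiliar with it: REG_STARTEND is a BSD extension flag to regexec() that lets the caller pass the bounds of the region to search in pmatch[0], so the buffer need not be NUL-terminated. The following is only a minimal sketch of how such a call looks, not code taken from the webrev; the helper name find_in_range is hypothetical, and the #ifdef guard is there because not every libc defines the flag.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: search buf[start..end) for an extended regular
 * expression.  Returns the offset of the first match, measured from the
 * beginning of buf, or -1 if there is no match, the pattern does not
 * compile, or the libc lacks REG_STARTEND.
 */
static long
find_in_range(const char *pattern, const char *buf, size_t start, size_t end)
{
#ifdef REG_STARTEND
	regex_t re;
	regmatch_t m[1];
	long result = -1;

	assert(start <= end);

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return (-1);

	/* With REG_STARTEND, pmatch[0] carries the search bounds on input. */
	m[0].rm_so = (regoff_t)start;
	m[0].rm_eo = (regoff_t)end;

	if (regexec(&re, buf, 1, m, REG_STARTEND) == 0)
		result = (long)m[0].rm_so;	/* offset is relative to buf */

	regfree(&re);
	return (result);
#else
	/* Assumption: without the extension we simply report "no match". */
	(void) pattern; (void) buf; (void) start; (void) end;
	return (-1);
#endif
}
```

A line-oriented tool can use a call like this to match against a slice of a larger buffer without copying it or writing a terminator into it.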

     - Garrett

On 12/12/10 09:52 PM, Garrett D'Amore wrote:
> So one of our "closed" gaps is "sed".
>
> Rich Lowe and I have each independently ported FreeBSD sed to 
> illumos.  There are some minor differences though, which brings me to 
> a question where I'd like to hear opinions -- preferably those backed 
> by concrete supporting evidence.
>
> First off, a bit of background:
>
> As far as I can tell, xpg4's sed implementation attempts to adhere to 
> POSIX by fully supporting multibyte characters, whereas legacy 
> /usr/bin/sed treats the file as a stream of bytes.  In fact, legacy 
> sed treats the file as pure ASCII.  Furthermore, legacy sed uses a 
> different output format for the "l" command ... some characters are 
> escaped oddly (backspaces and tabs become < and >) and a two-digit 
> octal form is used.  xpg4 sed uses backslash escapes for a few 
> characters (\\, \a, \b, \f, \r, \t, \v) and 3-digit octal format for 
> non-printable characters.
>
> So, I believe Rich's work adds support for building separate XPG4 
> and /usr/bin versions, with the latter giving the "traditional" 
> behavior for "l".  However, his version does not address the CSI 
> problem at all.  (Neither does mine, since I make no attempt at 
> providing the non-CSI compliant legacy behavior.)
>
> IMO, this is an excellent time for us to simply ditch the legacy 
> behavior, and move to the POSIX syntax that all other OSes use.  This 
> would enhance our compatibility with GNU sed, and *BSD sed.  (In fact, 
> there are several other features that we will get to improve such 
> compatibility, such as -i support, regardless of which port we 
> ultimately go with.)
>
> I'm not emotional here.  I just don't want to integrate new 
> code to support legacy if there is no need for the legacy or if the 
> legacy hurts us more than it helps us.
>
> If folks really think we should retain the legacy behavior (or as much 
> of it as we can), I'm willing to go that route.  Personally, I *think* 
> we may stand more to gain here by breaking with that legacy and going 
> more towards POSIX/GNU/BSD compatibility.  However, I don't do much 
> with sed beyond simple scripts, and indeed I've never used the "l" 
> command.  So I freely admit that someone else may have a more complete 
> picture here, and I'd like to hear more.
>
> If there are any sed wizards out there who have some good test scripts 
> that I can easily test (send me the script, input files, and expected 
> output), I'll be happy to verify correct functionality before I push 
> towards integration of any sed replacement.
>
> I'd like to have a decision, and ideally code reviews and integration 
> done, before the end of the week.  So please be timely in your feedback.
>
> Thanks!
>
>     - Garrett
>
>
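
The xpg4-style "l" output described in the quoted message (backslash escapes for \\, \a, \b, \f, \r, \t, \v and 3-digit octal for other non-printable characters) can be sketched roughly as below. This is an illustration only, not code from either port: the function name l_escape is hypothetical, it handles single bytes only (so no multibyte support), it does no long-line folding, and it appends the "$" end-of-line marker that POSIX specifies for "l". The legacy /usr/bin format is not attempted.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the POSIX-style "l" representation of
 * in[0..len) into out, followed by '$' and a terminating NUL.
 * Returns the number of characters written (excluding the NUL),
 * or (size_t)-1 if out is too small.
 */
static size_t
l_escape(const unsigned char *in, size_t len, char *out, size_t outsz)
{
	/* named[i] is written as a backslash followed by letters[i]. */
	static const char named[] = "\\\a\b\f\r\t\v";
	static const char letters[] = "\\abfrtv";
	size_t o = 0;

	if (outsz < 2)
		return ((size_t)-1);

	for (size_t i = 0; i < len; i++) {
		char tmp[5];	/* longest piece is "\ooo" plus NUL */
		const char *p;
		int n;

		if (in[i] != '\0' && (p = strchr(named, in[i])) != NULL)
			n = snprintf(tmp, sizeof (tmp), "\\%c",
			    letters[p - named]);
		else if (isprint(in[i]))
			n = snprintf(tmp, sizeof (tmp), "%c", in[i]);
		else
			n = snprintf(tmp, sizeof (tmp), "\\%03o",
			    (unsigned int)in[i]);

		/* Keep room for the trailing '$' and the NUL. */
		if (o + (size_t)n + 2 > outsz)
			return ((size_t)-1);
		memcpy(out + o, tmp, (size_t)n);
		o += (size_t)n;
	}
	out[o++] = '$';
	out[o] = '\0';
	return (o);
}
```

Feeding a handful of lines with tabs, backspaces and control characters through a helper like this gives expected output in the xpg4 format, which is the kind of script, input and expected output the message asks for.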



