[illumos-Developer] To sed, or not to sed...

Garrett D'Amore garrett at damore.org
Sun Dec 12 21:52:07 PST 2010


So one of our "closed" gaps is "sed".

Rich Lowe and I have each independently ported FreeBSD sed to illumos.  
There are some minor differences though, which brings me to a question 
where I'd like to hear opinions -- preferably those backed by concrete 
supporting evidence.

First off, a bit of background:

As far as I can tell, xpg4's sed implementation attempts to adhere to 
POSIX by fully supporting multibyte characters, whereas legacy 
/usr/bin/sed treats the file as a stream of bytes.  In fact, legacy sed 
treats the file as pure ASCII.  Furthermore, legacy sed uses a different 
output format for the "l" command ... some characters are escaped oddly 
(backspaces and tabs become < and >) and a two-digit octal form is 
used.  xpg4 sed uses backslash escapes for a few characters (\\, \a, 
\b, \f, \r, \t, \v) and a 3-digit octal format for non-printable characters.
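
To make the xpg4-style "l" output concrete, here is a rough Python sketch 
of the escaping rules described above.  It is an illustration only, not 
code from either port: the name posix_l_escape is made up, line folding 
is left out, only printable ASCII is treated as printable (a real sed 
would consult the locale), and the trailing "$" end-of-line marker comes 
from the POSIX description of "l" rather than from this message.

```python
# Illustrative sketch of POSIX-style "l" escaping (no line folding).
# Assumption: only printable ASCII (0x20-0x7E) counts as printable.
_ESCAPES = {
    0x5C: "\\\\",  # backslash is written as two backslashes
    0x07: "\\a",   # alert
    0x08: "\\b",   # backspace
    0x0C: "\\f",   # form feed
    0x0D: "\\r",   # carriage return
    0x09: "\\t",   # tab
    0x0B: "\\v",   # vertical tab
}


def posix_l_escape(line: bytes) -> str:
    """Return one line (without its newline) in "l"-style visible form."""
    out = []
    for byte in line:
        if byte in _ESCAPES:
            out.append(_ESCAPES[byte])
        elif 0x20 <= byte <= 0x7E:
            out.append(chr(byte))
        else:
            # Everything else becomes a backslash plus three octal digits.
            out.append("\\%03o" % byte)
    out.append("$")  # end-of-line marker
    return "".join(out)


if __name__ == "__main__":
    print(posix_l_escape(b"a\tb\\\x01"))
```

With these assumptions, a tab comes out as \t, a backslash as \\, and a 
byte such as 0x01 as \001, each line ending in "$".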

So, I believe Rich's work adds support for building separate XPG4 and 
/usr/bin versions, the latter giving the "traditional" behavior for "l".  
However, his version does not address the CSI problem at all.  (Neither 
does mine, since I make no attempt at providing the non-CSI compliant 
legacy behavior.)
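
The bytes-versus-multibyte-characters distinction mentioned above can be 
shown with a small Python sketch.  This is only an illustration of the 
idea, not how either sed is implemented: substitute_any is a hypothetical 
helper that mimics what "s/./X/g" would produce on one line, and UTF-8 is 
assumed as the encoding.

```python
def substitute_any(data: bytes, multibyte: bool) -> str:
    """Mimic `s/./X/g` on one line, matching per character or per byte."""
    if multibyte:
        units = data.decode("utf-8")  # one unit per (multibyte) character
    else:
        units = data                  # one unit per byte
    return "X" * len(units)


# "na\u00efve" is 5 characters but 6 bytes in UTF-8 (the i-diaeresis takes 2).
line = "na\u00efve".encode("utf-8")

if __name__ == "__main__":
    print(substitute_any(line, multibyte=True))   # character-aware result
    print(substitute_any(line, multibyte=False))  # byte-stream result
```

A character-aware sed sees five matches on that line, while one that 
treats the input as a stream of bytes sees six.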

IMO, this is an excellent time for us to simply ditch the legacy 
behavior, and move to the POSIX syntax that all other OSes use.  This 
would enhance our compatibility with GNU sed, and *BSD sed.  (In fact, 
there are several other features that we will get to improve such 
compatibility, such as -i support, regardless of which port we 
ultimately go with.)

I'm not emotional here.  I just don't want to integrate new code 
to support legacy if there is no need for the legacy or if the legacy 
hurts us more than it helps us.

If folks really think we should retain the legacy behavior (or as much 
of it as we can), I'm willing to go that route.  Personally, I *think* 
we may stand more to gain here by breaking with that legacy and going 
more towards POSIX/GNU/BSD compatibility.  However, I don't do much with 
sed beyond simple scripts, and indeed I've never used the "l" command.  
So I freely admit that someone else may have a more complete picture 
here, and I'd like to hear more.

If there are any sed wizards out there who have some good test scripts 
that I can easily test (send me the script, input files, and expected 
output), I'll be happy to verify correct functionality before I push 
towards integration of any sed replacement.

I'd like to have a decision, and ideally code reviews and integration 
done, before the end of the week.  So please be timely in your feedback.

Thanks!

     - Garrett



