[illumos-Developer] To sed, or not to sed...
Garrett D'Amore
garrett at damore.org
Sun Dec 12 21:52:07 PST 2010
So one of our "closed" gaps is "sed".
Rich Lowe and I have each independently ported FreeBSD sed to illumos.
There are some minor differences though, which brings me to a question
where I'd like to hear opinions -- preferably those backed by concrete
supporting evidence.
First off, a bit of background:
As far as I can tell, xpg4's sed implementation attempts to adhere to
POSIX by fully supporting multibyte characters, whereas legacy
/usr/bin/sed treats the file as a stream of bytes. In fact, legacy sed
treats the file as pure ASCII. Furthermore, legacy sed uses a different
output format for the "l" command: some characters are escaped oddly
(backspaces and tabs become < and >) and a two-digit octal form is
used. xpg4 sed uses backslash escapes for a few characters (\\, \a,
\b, \f, \r, \t, \v) and 3-digit octal format for non-printable characters.
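As a rough illustration of the xpg4-style "l" output described above, here is a minimal Python sketch. It is not taken from either port. It assumes the C locale (only ASCII printables pass through unchanged), skips the folding of long lines, and appends the "$" end-of-line marker that POSIX specifies. The name posix_l is made up for this example.

```python
# Sketch of POSIX-style escaping for sed's "l" command (C locale assumed).
# Hypothetical helper; not code from FreeBSD sed or either illumos port.

# Characters that are written as backslash escapes.
ESCAPES = {
    0x5C: "\\\\",  # backslash
    0x07: "\\a",   # alert
    0x08: "\\b",   # backspace
    0x0C: "\\f",   # form feed
    0x0D: "\\r",   # carriage return
    0x09: "\\t",   # tab
    0x0B: "\\v",   # vertical tab
}


def posix_l(line: bytes) -> str:
    """Return the unambiguous form of one line, without line folding."""
    out = []
    for byte in line:
        if byte in ESCAPES:
            out.append(ESCAPES[byte])
        elif 0x20 <= byte <= 0x7E:
            # Printable ASCII is written as-is.
            out.append(chr(byte))
        else:
            # Everything else becomes a three-digit octal escape, per byte.
            out.append("\\%03o" % byte)
    # The end of each line is marked with "$".
    return "".join(out) + "$"


if __name__ == "__main__":
    print(posix_l(b"a\tb\x01\\"))
```

The legacy /usr/bin/sed format is deliberately left out, since its exact mapping is only loosely described here.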
So, I believe Rich's work adds support for building separate XPG4 and
/usr/bin versions, with the latter giving the "traditional" behavior for "l".
However, his version does not address the CSI (code set independence)
problem at all. (Neither
does mine, since I make no attempt at providing the non-CSI compliant
legacy behavior.)
IMO, this is an excellent time for us to simply ditch the legacy
behavior, and move to the POSIX syntax that all other OSes use. This
would enhance our compatibility with GNU sed, and *BSD sed. (In fact,
there are several other features that we will get to improve such
compatibility, such as -i support, regardless of which port we
ultimately go with.)
I'm not emotional here. I just don't want to integrate new code
to support legacy if there is no need for the legacy or if the legacy
hurts us more than it helps us.
If folks really think we should retain the legacy behavior (or as much
of it as we can), I'm willing to go that route. Personally, I *think*
we may stand more to gain here by breaking with that legacy and going
more towards POSIX/GNU/BSD compatibility. However, I don't do much with
sed beyond simple scripts, and indeed I've never used the "l" command.
So I freely admit that someone else may have a more complete picture
here, and I'd like to hear more.
If there are any sed wizards out there who have some good test scripts
that I can easily test (send me the script, input files, and expected
output), I'll be happy to verify correct functionality before I push
towards integration of any sed replacement.
I'd like to have a decision, and ideally code reviews and integration
done, before the end of the week. So please be timely in your feedback.
Thanks!
- Garrett