[illumos-Developer] webrev: 650 grep support for -q would be useful

Yuri Pankov yuri.pankov at gmail.com
Tue May 3 09:47:26 PDT 2011


On Mon, May 02, 2011 at 12:22:10PM -0400, Gordon Ross wrote:
> I suspect grep may be somewhat performance-sensitive.
> Also, is our xpg4 grep up-to-date with standards?
> 
> Have you looked into the performance and compliance of:
> (a) current grep?  (b) xpg4/bin/grep?
> (c) others? i.e. FreeBSD grep?

Now that you mention bsdgrep: I have it ported as well, and it has
everything we need. It has the same option set as GNU grep and is
IEEE Std 1003.1-2008 (``POSIX.1'') compliant. Thanks for reminding me
about it.

To answer Alan's question: yes, current grep uses regexpr(3GEN), while
the xpg4 and BSD greps use regex(?). The difference is that the former
supports only BREs. Current egrep has its own implementation and does
not use any standard library.
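
A minimal sketch of the BRE/ERE distinction, assuming a grep that accepts
-E (the xpg4, BSD and GNU greps do; the current /usr/bin/grep does not).
The sample file and its contents are made up for illustration:

```shell
# Hypothetical sample data, not taken from SUNWcs.mf.
tmp=$(mktemp)
printf '%s\n' 'awk' 'grep' 'sed' 'awk|grep|sed' 'perl' > "$tmp"

# BRE (default): '|' is an ordinary character, so only the line that
# contains the literal string "awk|grep|sed" matches.
bre_count=$(grep -c 'awk|grep|sed' "$tmp")

# ERE (-E): '|' is alternation, so every line containing awk, grep or
# sed matches.
ere_count=$(grep -E -c 'awk|grep|sed' "$tmp")

echo "BRE matches: $bre_count, ERE matches: $ere_count"
rm -f "$tmp"
```

With this input the BRE run counts 1 line and the ERE run counts 4.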

All greps behave the same on simple test cases (below). A common problem
with the xpg4 and BSD greps is that regcomp()/regexec() are really slow
on some EREs.

Another problem is that '|%WHOANDWHERE%|', which regex (correctly?)
treats as an invalid RE, is silently accepted by egrep and GNU's egrep.
nightly.sh has it (egrep -v '|%WHOANDWHERE%|'), but I wonder why egrep
is used here at all, so the fix would be to use plain grep.
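
A rough sketch of why plain grep would do here, assuming the intent in
nightly.sh is to drop lines containing the literal token (I haven't
checked the surrounding script logic). In a BRE '|' is an ordinary
character, so the pattern is valid as-is. The input lines are invented:

```shell
# Hypothetical input; only the pattern comes from nightly.sh.
tmp=$(mktemp)
printf '%s\n' 'first line' 'mail to |%WHOANDWHERE%| here' 'last line' > "$tmp"

# Plain grep treats the pattern as a BRE, where '|' and '%' are literal,
# so -v removes only the line containing the token.
filtered=$(grep -v '|%WHOANDWHERE%|' "$tmp")

printf '%s\n' "$filtered"
rm -f "$tmp"
```

This prints "first line" and "last line" only.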

To sum it all up, BSD grep looks like the best choice (saves us from
having "Add yet another option for GNU grep compatibility" issues in the
future), but it needs to be "fixed" performance-wise.

Comments? :-)

TIA,
Yuri

----------------------------------------------------------------
Simple:
----------------------------------------------------------------
GNU grep:
$ time for i in `seq 1 1000`; do /usr/gnu/bin/grep -q grep \
  ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf; done

real    0m7.445s
user    0m2.071s
sys     0m4.779s

XPG4 grep:
$ time for i in `seq 1 1000`; do /usr/bin/grep -q grep \
  ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf; done

real    0m5.545s
user    0m0.751s
sys     0m4.364s

BSD grep:
$ time for i in `seq 1 1000`; do ~/ws/650-grep-q/usr/src/cmd/grep/grep \
  -q grep ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf; \
  done

real    0m6.756s
user    0m1.534s
sys     0m4.709s

old grep:
$ time for i in `seq 1 1000`; do /usr/bin/oldgrep grep \
  ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf >/dev/null;
  done

real    0m6.616s
user    0m1.284s
sys     0m4.862s
----------------------------------------------------------------

----------------------------------------------------------------
Simple EREs (current grep doesn't support EREs, not tested):
----------------------------------------------------------------
GNU grep:
$ time for i in `seq 1 1000`; do /usr/gnu/bin/grep -E 'awk|grep|sed' \
  ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf >gnugrep.out; \
  done

real    0m17.778s
user    0m10.370s
sys     0m6.401s

old (current) egrep:
$ time for i in `seq 1 1000`; do /usr/bin/oldegrep 'awk|grep|sed' \
  ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf >oldegrep.out; \
  done

real    0m6.678s
user    0m1.403s
sys     0m4.935s

BSD grep (problem is in regcomp()/regexec()):
$ time for i in `seq 1 1000`; do ~/ws/650-grep-q/usr/src/cmd/grep/grep \
 -E 'awk|grep|sed' \
 ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf >bsdgrep.out; \
 done

real    0m46.824s
user    0m39.255s
sys     0m6.342s
----------------------------------------------------------------
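
The ERE loops above write gnugrep.out, oldegrep.out and bsdgrep.out. A
small helper along these lines could confirm that the implementations
agree on output before comparing timings. The function name
outputs_match is made up for this sketch:

```shell
# outputs_match REF FILE...: succeed only if every FILE is
# byte-identical to REF; report the first file that differs.
outputs_match() {
    ref=$1
    shift
    for f in "$@"; do
        if ! cmp -s "$ref" "$f"; then
            echo "differs from $ref: $f" >&2
            return 1
        fi
    done
    return 0
}

# Usage, run in the directory holding the .out files:
#   outputs_match gnugrep.out oldegrep.out bsdgrep.out && echo identical
```

cmp -s is used instead of diff because only a yes/no answer is needed.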

----------------------------------------------------------------
Bad performance when given a lot of files:
----------------------------------------------------------------
GNU grep:
$ time for i in `seq 1 100`; do /usr/gnu/bin/grep -q switch \
  ~/src/illumos/illumos-gate/usr/src/uts/common/io/*; done

real    0m2.621s
user    0m1.773s
sys     0m0.841s

BSD grep:
$ time for i in `seq 1 100`; do ~/ws/650-grep-q/usr/src/cmd/grep/grep -q \
  switch ~/src/illumos/illumos-gate/usr/src/uts/common/io/*; done

real    0m11.620s
user    0m7.491s
sys     0m4.176s
----------------------------------------------------------------

----------------------------------------------------------------
Same (actually much worse) for -R/-r:
----------------------------------------------------------------
GNU grep:
$ time for i in `seq 1 100`; do /usr/gnu/bin/grep -rq switch \
  ~/src/illumos/illumos-gate/usr/src/uts/common/io/; done

real    0m0.870s
user    0m0.209s
sys     0m0.642s

BSD grep:
$ time for i in `seq 1 100`; do ~/ws/650-grep-q/usr/src/cmd/grep/grep \
  -rq switch ~/src/illumos/illumos-gate/usr/src/uts/common/io/; done

real    2m34.742s
user    1m48.919s
sys     0m45.556s
----------------------------------------------------------------


> On Mon, May 2, 2011 at 10:25 AM, Yuri Pankov <yuri.pankov at gmail.com> wrote:
> > Hi,
> >
> > This webrev handles quite a bit more than just adding '-q' support: it
> > actually removes cmd/egrep, cmd/fgrep, cmd/grep and makes grep_xpg4 the
> > default one. The reason for this is that egrep, fgrep and grep are all
> > built from different sources (quite a weird one, in the case of egrep)
> > and grep_xpg4 provides all the functionality plus other options (such
> > as '-q'). Manpages will be handled separately if this is accepted.
> >
> > https://www.xvoid.org/illumos/webrev/650-grep-q/
> >
> >
> > TIA,
> > Yuri


