[illumos-Developer] webrev: 650 grep support for -q would be useful
Yuri Pankov
yuri.pankov at gmail.com
Tue May 3 09:47:26 PDT 2011
On Mon, May 02, 2011 at 12:22:10PM -0400, Gordon Ross wrote:
> I suspect grep may be somewhat performance-sensitive.
> Also, is our xpg4 grep up-to-date with standards?
>
> Have you looked into the performance and compliance of:
> (a) current grep? (b) xpg4/bin/grep?
> (c) others? i.e. FreeBSD grep?
Now that you mention bsdgrep: I have it ported as well, and it has
everything we need, i.e. it supports the same option set as GNU grep and
is IEEE Std 1003.1-2008 (``POSIX.1'') compliant. Thanks for reminding me
about it.
To answer Alan's question - yes, current grep uses regexpr(3GEN) and the
xpg4/bsd greps use regex(?). The difference is that the former supports
only BREs, while current egrep has its own implementation that does not
use any standard library.
All greps behave the same on the simple test cases (below). A problem
common to the xpg4 and bsd greps is that regcomp()/regexec() are really
slow on some EREs.
Another problem is that '|%WHOANDWHERE%|', which regex (correctly?)
treats as an invalid RE, is silently accepted by egrep and GNU's egrep.
nightly.sh has it (egrep -v '|%WHOANDWHERE%|'), but I wonder why egrep is
used here at all, so the fix would be to use plain grep.
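To illustrate why plain grep would be enough here: without -E the pattern
is compiled as a BRE, where '|' is an ordinary character, so the whole
string matches literally. This is only a sketch; the helper name and the
sample input are made up for illustration and are not taken from
nightly.sh.

```shell
# drop_placeholder: hypothetical helper, not taken from nightly.sh.
# Reads stdin and drops lines containing the literal text |%WHOANDWHERE%|.
# Without -E the pattern is a BRE, where '|' is an ordinary character.
drop_placeholder() {
    grep -v '|%WHOANDWHERE%|'
}

# Made-up sample input: only the first line should survive.
printf '%s\n' 'keep this line' 'built by |%WHOANDWHERE%|' | drop_placeholder
```

With -E the same pattern starts and ends with an empty alternative, which
is presumably why regcomp() rejects it.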
To sum it all up, BSD grep looks like the best choice (saves us from
having "Add yet another option for GNU grep compatibility" issues in the
future), but it needs to be "fixed" performance-wise.
Comments? :-)
TIA,
Yuri
----------------------------------------------------------------
Simple:
----------------------------------------------------------------
GNU grep:
$ time for i in `seq 1 1000`; do /usr/gnu/bin/grep -q grep \
~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf; done
real 0m7.445s
user 0m2.071s
sys 0m4.779s
XPG4 grep:
$ time for i in `seq 1 1000`; do /usr/bin/grep -q grep \
~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf; done
real 0m5.545s
user 0m0.751s
sys 0m4.364s
BSD grep:
$ time for i in `seq 1 1000`; do ~/ws/650-grep-q/usr/src/cmd/grep/grep \
-q grep ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf; \
done
real 0m6.756s
user 0m1.534s
sys 0m4.709s
old grep:
$ time for i in `seq 1 1000`; do /usr/bin/oldgrep grep \
~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf >/dev/null;
done
real 0m6.616s
user 0m1.284s
sys 0m4.862s
----------------------------------------------------------------
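The old grep run above redirects to /dev/null because it has no -q. A
minimal sketch of that emulation, assuming only that the grep in question
returns the usual exit status (0 on match, 1 on no match); quiet_grep is a
hypothetical name:

```shell
# quiet_grep GREP PATTERN FILE...: hypothetical helper that emulates -q
# for a grep lacking it by discarding stdout. The exit status of the
# underlying grep is passed through unchanged.
quiet_grep() {
    grep_bin=$1
    shift
    "$grep_bin" "$@" >/dev/null
}

# Example use (path is the one from the timing runs above):
#   quiet_grep /usr/bin/oldgrep grep SUNWcs.mf && echo found
```

Note that the two are not strictly equivalent for timing purposes: a grep
with -q is allowed to stop at the first match, while the redirected
variant always reads the whole file.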
----------------------------------------------------------------
Simple EREs (current grep doesn't support EREs, not tested):
----------------------------------------------------------------
GNU grep:
$ time for i in `seq 1 1000`; do /usr/gnu/bin/grep -E 'awk|grep|sed' \
~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf >gnugrep.out; \
done
real 0m17.778s
user 0m10.370s
sys 0m6.401s
old (current) egrep:
$ time for i in `seq 1 1000`; do /usr/bin/oldegrep 'awk|grep|sed' \
~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf >oldegrep.out; \
done
real 0m6.678s
user 0m1.403s
sys 0m4.935s
BSD grep (problem is in regcomp()/regexec()):
$ time for i in `seq 1 1000`; do ~/ws/650-grep-q/usr/src/cmd/grep/grep \
-E 'awk|grep|sed' \
~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf >bsdgrep.out; \
done
real 0m46.824s
user 0m39.255s
sys 0m6.342s
----------------------------------------------------------------
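Each of the ERE runs above leaves an output file behind (gnugrep.out,
oldegrep.out, bsdgrep.out). One way to check that the implementations
agree is to compare those files byte for byte; the sketch below assumes
they are in the current directory, and same_output is a hypothetical
helper name.

```shell
# same_output REF FILE...: hypothetical helper that reports whether each
# FILE is byte-identical to REF. Returns 0 only if all files match.
same_output() {
    ref=$1
    shift
    status=0
    for f in "$@"; do
        if cmp -s "$ref" "$f"; then
            echo "$f: identical to $ref"
        else
            echo "$f: differs from $ref"
            status=1
        fi
    done
    return "$status"
}

# Example use with the files written by the runs above:
#   same_output gnugrep.out oldegrep.out bsdgrep.out
```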
----------------------------------------------------------------
Bad performance when given a lot of files:
----------------------------------------------------------------
GNU grep:
$ time for i in `seq 1 100`; do /usr/gnu/bin/grep -q switch \
~/src/illumos/illumos-gate/usr/src/uts/common/io/*; done
real 0m2.621s
user 0m1.773s
sys 0m0.841s
BSD grep:
$ time for i in `seq 1 100`; do ~/ws/650-grep-q/usr/src/cmd/grep/grep -q \
switch ~/src/illumos/illumos-gate/usr/src/uts/common/io/*; done
real 0m11.620s
user 0m7.491s
sys 0m4.176s
----------------------------------------------------------------
----------------------------------------------------------------
Same (actually much worse) for -R/-r:
----------------------------------------------------------------
GNU grep:
$ time for i in `seq 1 100`; do /usr/gnu/bin/grep -rq switch \
~/src/illumos/illumos-gate/usr/src/uts/common/io/; done
real 0m0.870s
user 0m0.209s
sys 0m0.642s
BSD grep:
$ time for i in `seq 1 100`; do ~/ws/650-grep-q/usr/src/cmd/grep/grep \
-rq switch ~/src/illumos/illumos-gate/usr/src/uts/common/io/; done
real 2m34.742s
user 1m48.919s
sys 0m45.556s
----------------------------------------------------------------
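The timing loops above all follow the same shape, so they can be wrapped
in a small helper to make it easier to rerun them against another grep.
This is a sketch only: bench is a hypothetical name, and it discards
stdout for every run, which differs slightly from the ERE runs above that
write to a file.

```shell
# bench N CMD [ARGS...]: hypothetical helper that runs CMD N times,
# discarding its standard output each time.
bench() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null
        i=$((i + 1))
    done
}

# Example use, with the paths from the runs above (in a shell where
# `time` is a keyword, such as ksh or bash):
#   time bench 1000 /usr/gnu/bin/grep -q grep \
#       ~/src/illumos/illumos-gate/usr/src/pkg/manifests/SUNWcs.mf
```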
> On Mon, May 2, 2011 at 10:25 AM, Yuri Pankov <yuri.pankov at gmail.com> wrote:
> > Hi,
> >
> > This webrev handles quite more than just adding '-q' support - it
> > actually removes cmd/egrep, cmd/fgrep, cmd/grep and makes grep_xpg4 the
> > default one. The reason for this is that egrep, fgrep and grep are all
> > different source (and quite weird one, in case of egrep) and grep_xpg4
> > provides all the functionality plus other options (such as '-q').
> > Manpages will be handled separately if this is to be accepted.
> >
> > https://www.xvoid.org/illumos/webrev/650-grep-q/
> >
> >
> > TIA,
> > Yuri