[illumos-Developer] grep -Z etc? (decompress! was: webrev: 650 grep support for -q ...)

Kurt Lidl kurt.lidl at cello.com
Fri May 6 10:52:01 PDT 2011


On 5/6/2011 12:29 PM, Gordon Ross wrote:
> On Fri, May 6, 2011 at 9:59 AM, Kurt Lidl<kurt.lidl at cello.com>  wrote:
> [...]
> (gwr)>  Or it could just fork off a "gzip -dc" (or whatever) when it needs
>>> to decompress something.  I took a stab at that (diff attached).
> [...]
>> Forking off a copy of 'gzip -dc', and then having to copy all the compressed
>> data through a pipe to gzip, and then having to copy all the uncompressed
>> data back through the pipe to grep is slow.
> First, a clarification:  The compressed data does _not_ go through a pipe.
> The uncompress program runs with the original _file_ on FD=0 (stdin), and
> the pipe as its stdout.  The grep program reads from the other end of that
> pipe (and never writes to the pipe).  So one pipe, used only one direction.
Yes, you're correct.  I was thinking about the way that gtar does it,
and there's code in there that goes into a loop reading from stdin and
writing to stdout (which is hooked up to the fresh child tar process):

(about line 502 in src/system.c of gnutar 1.26):

   /* Check if we need a grandchild tar.  This happens only if either:
      a) we're reading stdin: to force unblocking;
      b) the file is to be accessed by rmt: compressor doesn't know how;
      c) the file is not a plain file.  */
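For completeness, the one-pipe arrangement Gordon describes comes out
roughly like this (an untested sketch, error checking omitted; the
helper name is just for illustration, not anything in the webrev):

    /* Untested sketch: child runs "gzip -dc" with the compressed
       file as its stdin and the pipe as its stdout; the parent
       (grep) reads uncompressed data from the other end. */
    #include <fcntl.h>
    #include <unistd.h>

    int
    open_decompressed(const char *path)    /* hypothetical name */
    {
        int pfd[2];
        int fd = open(path, O_RDONLY);

        (void) pipe(pfd);
        if (fork() == 0) {
            (void) dup2(fd, 0);        /* compressed file on stdin */
            (void) dup2(pfd[1], 1);    /* pipe write end on stdout */
            (void) close(pfd[0]);
            (void) close(fd);
            (void) execlp("gzip", "gzip", "-dc", (char *)0);
            _exit(127);
        }
        (void) close(pfd[1]);
        (void) close(fd);
        return (pfd[0]);    /* caller reads uncompressed data here */
    }

So the only data copied by the kernel is the uncompressed stream,
exactly once, through the single pipe.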

>> Yes, I know, it's "common" to do this, but it really stinks,
>> performance-wise.
> You sure about that?
>
>> Integrating direct usage of the libgz (or libz or libbz or liblzma when the
>> time comes) is a huge performance win.
> If that's true, then there may be something wrong with our pipes...

Well, it depends on how compressed one's data is, doesn't it?  I ran
into this issue back at UUNET, when we had gzip'd the webserver logs,
and then gave a setgid program to the end-users to access their
webserver log files.  The first version of that program just fork'd
and called gzip on the log files.  The log files compressed extremely
well.  The data explosion after the unzip was significant.  We ended
up integrating libz support directly into our program and everything
ran much, much faster.

This was back in the day of much slower machines (think Pentiums @
100 MHz), and stretching your available CPU mattered a lot.

In a much more recent vein, the libarchive stuff that FreeBSD has
produced can do all this decompression directly in the library,
without having to fork a child process.  Their "tar" program just
calls into the libarchive library, and it runs very fast.
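The read loop there is about this much code (a sketch using the
libarchive 2.x names; error handling omitted, and the function name
is mine):

    /* Sketch: stream the (possibly compressed) contents of an
       archive entirely in-process via libarchive. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <archive.h>
    #include <archive_entry.h>

    void
    dump_archive(const char *path)    /* hypothetical name */
    {
        struct archive *a = archive_read_new();
        struct archive_entry *ae;
        char buf[8192];
        ssize_t n;

        (void) archive_read_support_compression_all(a); /* gzip, bzip2, ... */
        (void) archive_read_support_format_all(a);      /* tar, cpio, ...   */
        (void) archive_read_open_filename(a, path, 8192);
        while (archive_read_next_header(a, &ae) == ARCHIVE_OK) {
            while ((n = archive_read_data(a, buf, sizeof (buf))) > 0)
                (void) fwrite(buf, 1, (size_t)n, stdout);
        }
        (void) archive_read_finish(a);
    }

Note the compression and format detection are automatic; the caller
never has to know whether the file was gzip'd at all.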

-Kurt



