[illumos-Developer] webrev: reimplementation of od
Garrett D'Amore
garrett at damore.org
Mon Oct 18 21:00:33 PDT 2010
In order to answer some feedback questions, and address some bugs that
were pointed out in review, I've changed the implementation somewhat.
The new webrev is here: http://pastebin.ca/1966514
Thea diffs from the previous (yesterday's) version of the code are here:
http://pastebin.ca/1966521
The new logic should be simpler to read, since it just uses a circular
FIFO instead of three separate buffers. It uses power of two math for
efficiency.
I've tested this as reasonably fully as I think I can. I've tested
large files (over 4GB DVD images), and I've compared results with Sun's
od. Its bit-for-bit perfectly matched by default with Sun's.
Most other formats are also bit for bit perfect. One problem I did
discover was a bug in Sun's handling of -C output. (Closed od sometimes
puts out "000" instead of "\0", which by definition is *always* wrong
for UTF8 since "000" can't never be part of a valid UTF-8 sequence other
than \0.)
Notably, this code runs in about half the time of xpg4 with the default
options. So this is a nice performance win as well.
- Garrett
Original xpg4 version:
garrett at master{26}> /bin/time od /data/isos/sol-nv-b105-x86-dvd.iso |
md5sum
real 15:17.3
user 14:56.3
sys 18.0
495ac766a109dc3d90330ea07aa0870e -
My new version:
garrett at master{31}> /bin/time ~/od /data/isos/sol-nv-b105-x86-dvd.iso |
md5sum
real 7:17.3
user 6:58.9
sys 17.3
495ac766a109dc3d90330ea07aa0870e -
Also, here's the actual file size:
garrett at master{32}> ls -la /data/isos/sol-nv-b105-x86-dvd.iso
-rw-r--r-- 1 garrett staff 3406561280 Jan 16
2009 /data/isos/sol-nv-b105-x86-dvd.iso
Note, that my "od" was built without optimization. I expect to see a
further performance boost (about 10% probably) with optimization turned
on.
So I'll wait for further feedback, but unless someone can give me a
concrete example of where my implementation is deficient or buggy
(provide information, please -- "its broken" or "it doesn't conform to
specs" are useless comments without supporting information!), I'll be
moving forward with integration tomorrow.
- Garrett
On Mon, 2010-10-18 at 10:25 -0700, Garrett D'Amore wrote:
> On Mon, 2010-10-18 at 18:42 +0200, Joerg Schilling wrote:
> > "Garrett D'Amore" <garrett at nexenta.com> wrote:
> >
> > > So I couldn't find a suitable "od". GNU is GPL and not POSIX compliant
> > > to boot. (At least from my tests.) And the legacy Solaris one is --
> > > startlingly, amazingly bad. (If anyone can tell me what od -tfD
> > > *thinks* its doing when long doubles are 12-bytes wide, I'd love to hear
> > > an explanation. I think its just generating random garbage somehow.)
> >
> > There is my hdump(1) that could be used as a mature base framework.
> >
> > > Anyway, the new code I wrote is here:
> > >
> > > http://pastebin.ca/1964590
> > >
> > > Its not in an ON workspace yet.
> > >
> > > I've reasonably tested this as best as I can. It performs on par with
> > > legacy Sun od, and for some interesting cases runs even faster. (od -c
> > > in particular.)
> > ...
> >
> > > Assuming I get some good reviews of this, I'll probably integrate it on
> > > Monday. It compiles cleanly, so its simple to just d/l and compile if
> > > you want to try it yourself. :-)
> >
> > - Does not include largefile support
>
> Wrong. Its related to my next point. I explicitly support large files
> directly, and do not require a compiler option to enable it.
>
> > - Uses hidden intefaces not intended for the public
>
> You must mean my of *64. I'm not sure if they were intended for public
> consumption or not, but I had a specific reason in choosing this
> approach. I wanted to be guaranteed that the code was 64-bit clean, and
> not dependent on compiler options, specifically my use of strtoll() in
> argument parsing was a concern.
>
> At the end of the day, those interfaces are guaranteed to be safe,
> because they get baked into binaries compiled with -D_LARGEFILE_SOURCE,
> so I'm not worried about my explicit usage, and use of internal
> interfaces is always safe for a program targetting ON anyway.
>
> However, if you're truly offended by this usage, I'll convert to use
> off_t and -D_LARGEFILE_SOURCE.
>
> > - Does not suppot the X-Open CLI
>
> How so? I implemented my version *to* the Open Group's specification.
> If there is a discrepancy here, I'd like to fix it.
>
> > - Does not give the same default output as od(1)
>
> Whoops! Easily (trivially) fixed.
>
> > - Occasionally prints incorrect output
>
> Can you give me an example test case? I wasn't able to find any
> breakages in my testing, so I'm very interested to see a busted example.
>
> > - Occacionally prints zeroes where non-zero data is known to be
>
> Example test case please? My tests did not show this, but I'll allow
> that there could be a latent bug here.
>
> > - Does not call mbtowc() call in a standard compliant way.
>
> What are you talking about? My calls to mbtowc are exactly as specified
> by the standard.
>
> >
> > -> not ready for integration.
>
> I agree with your statement about the default output, which is a one
> line (really one *character*) change that I'll make right now. The
> other issues you've raised need further explanation.
>
> > How about enhancing hdump(1) with the od(1) interface?
>
> I've written this code already, and IMO its further along. I'm not
> interested in getting into a debate with you because you're interested
> in your program, and I'm not interested in starting over from scratch to
> with with a codebase with which I'm unfamiliar. (Also, if your code
> base depends on libschily, that's another larger effort that I'd like to
> avoid.)
>
> Of course, *you* could have picked this work up yourself. But you
> didn't. Apparently you're still irked and won't work on anything else
> until I approve star for integration. That's *your* choice, not mine.
>
> - Garrett
>
>
> _______________________________________________
> Developer mailing list
> Developer at lists.illumos.org
> http://lists.illumos.org/m/listinfo/developer
More information about the Developer
mailing list