[illumos-Developer] webrev: multibyte locale collation and license cleanups

Garrett D'Amore garrett at damore.org
Fri Sep 17 20:01:04 PDT 2010


This is a big review, but it's *really* only 2700 lines when you factor
out the two files that are each over 20,000 lines (the UnicodeData.txt
file and the map file generated from it).

The review is here:

	 http://mexico.purplecow.org/gdamore/webrev/collate/


There are three bugs:

a) Fixing the collation support for multibyte locales (this is *all*
UTF-8 locales other than English, so it's quite important).  The original
FreeBSD code lacked the necessary code.  Apple had updated their copy
(which apparently they did not copyright!) up at opensource.apple.com,
so I borrowed from that.  However, the Apple code also had a number of
Darwin-isms, as well as support for POSIX 2008.  So I borrowed the logic
from Apple that I wanted, without actually wholesale lifting the code.
Some of the logic includes stuff in libc for a real localedef, but our
mkcollate is not yet capable of generating the underlying data files
(this is for some very sophisticated collation support).
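For reviewers who want a quick way to exercise the caller-visible side of this, here is a minimal sketch of comparing two multibyte strings under the current LC_COLLATE rules.  It is not code from the webrev: mb_collate() is a hypothetical helper that converts to wide characters and defers to wcscoll(), and it assumes the POSIX behavior of mbstowcs() with a NULL destination (returning the required length).

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of libc or the webrev): compare two
 * multibyte strings using the collation rules of the current locale.
 * Returns <0, 0 or >0 like strcoll().  On conversion or allocation
 * failure it returns 0 and sets *err to 1.
 */
int
mb_collate(const char *a, const char *b, int *err)
{
	wchar_t *wa, *wb;
	size_t na, nb;
	int rv = 0;

	assert(a != NULL && b != NULL && err != NULL);
	*err = 0;

	/* POSIX: a NULL destination yields the number of wide chars needed. */
	na = mbstowcs(NULL, a, 0);
	nb = mbstowcs(NULL, b, 0);
	if (na == (size_t)-1 || nb == (size_t)-1) {
		*err = 1;	/* invalid multibyte sequence for this locale */
		return (0);
	}

	wa = malloc((na + 1) * sizeof (wchar_t));
	wb = malloc((nb + 1) * sizeof (wchar_t));
	if (wa == NULL || wb == NULL) {
		*err = 1;
	} else {
		(void) mbstowcs(wa, a, na + 1);
		(void) mbstowcs(wb, b, nb + 1);
		/* wcscoll() applies the LC_COLLATE ordering to wide strings. */
		rv = wcscoll(wa, wb);
	}

	free(wa);
	free(wb);
	return (rv);
}
```

A caller would normally run setlocale(LC_ALL, "") first so that the user's locale is in effect; without that call the comparison happens in the "C" locale and simply follows code point order.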

b) I created a script and added autogeneration of the THIRDPARTYLICENSE
file for all of libc.  See "extract-copyright.pl" in the review.  I also
converted a few files that Nexenta (my employer) is the sole property
owner of to CDDL 1.0, and used the new non-Oracle prototype file.  There
were a number of tiny updates to the BSD files here to move the
copyright message for Nexenta into the top of the BSD copyright, which
is the "correct" place for them to exist in a BSD licensed file (the
license explicitly refers to the "above copyright notice"). :-)

c) The locale data for ctype and collate are binary data files.  It is
wasteful to use network byte ordering for them, since they are not
shared across platforms.  They are now native byte ordering.  This
should shave quite a few cycles on x86 boxes when loading these data
files.  Also, in the future, we can now consider using mmap() of these
files (that is the subject of a future RFE) to get even more
performance.
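To illustrate where the cycles go, here is a rough sketch of loading a single 32-bit field each way.  The function names and the single-field layout are assumptions for illustration only; the real ctype and collate file formats are in the webrev.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: load a 32-bit field that was written in network
 * (big-endian) byte order.  On a little-endian machine such as x86,
 * ntohl() has to swap the bytes of every field loaded this way.
 */
uint32_t
load_net32(const unsigned char *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof (v));
	return (ntohl(v));
}

/*
 * Illustrative only: load a 32-bit field that was written in the
 * host's native byte order.  This is a plain copy with no swap.
 */
uint32_t
load_native32(const unsigned char *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof (v));
	return (v);
}
```

With native ordering the on-disk layout matches the in-memory layout, which is also what would make mapping the files directly plausible later.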


It's a big review, and I'm grateful for any feedback I can receive.

Thanks!

	- Garrett



