[illumos-Developer] updated webrev: localedef *plus* locale data

Garrett D'Amore garrett at nexenta.com
Wed Oct 6 23:31:21 PDT 2010


If you care about libc, the stability of illumos, or the ability to use
illumos in locales other than English, then please read this message.
I'm sorry for its length.

I've updated my workspace.  The updated review is at the end, but please
read through this message first.

I've fixed some bugs in the strxfrm() code (and made it generally
better) as well as made wcsxfrm() more robust.  I also incorporated
feedback I'd already received (thanks, gwr!).

Unfortunately, I don't know how to generate an incremental webrev after
I've done hg reci.  (Perhaps I should not do this in the future until
just before I commit... but the comment logs were getting kind of ...
long.)

This review *now* includes locale data for 116 territories, with 67
different languages, and a variety of different encodings (the vast
majority is still UTF-8, and in particular only UTF-8 for CJK.)

Your favorite UTF-8 locale is probably represented here.  If it isn't,
let me know, and maybe we can follow up later.

The review is *huge*, because of the data file imports.  (Some 2M lines
added.)  So I've removed the patch and PDF files (they were too big to
download remotely -- the patch was about 90M).  If you want them let me
know and I'll post them somewhere for you.  I do *not* expect people to
review the data files as they came straight from Unicode.org and CLDR.
The zh_CN source is 10M all on its own (thanks to some hairiness in the
collation rules.)

I'm really close to integration on this.  I'd like close review of the
following parts:

a) strxfrm and wcsxfrm, and the underlying code in collate.c in libc.

b) Makefile in localedef

c) if you want, the convert_map.pl file, which takes the Unicode
"MAPPING" files (8859-1.TXT, etc.) and converts them into a "charmap(4)"
file suitable for processing by localedef.

d) packaging (if you can bear it -- it's a lot of files).  I wrote a tool
to verify that the data is correct, and I have actually installed *all*
of these locales, and done preliminary testing on most of them.  Though
since I don't speak more than a smattering of German and Spanish, the
*quality* of that testing was probably not great.  So if a packaging
expert can review one or two of the packages, that would be great.

e) because of goofiness in the way Oracle g11n delivered some of its
data, there will be some pain for OI.  Specifically, I had to do "pkg
uninstall -r system/locale"  and "pkg uninstall system/install/locale"
to add these packages.  This is necessary because of a really shoddy bit
of work where some packages delivered symbolic links in screwy places,
and IPS doesn't like it if I try to turn those symbolic links into real
directories.  Illumos locale data is *INCOMPATIBLE* with locale data
from Oracle.   If someone has any idea how to make this process work
better, please let me know.

Note that the data files come largely unmodified from Unicode.org.  (I
did add a few lines to the end of a few of the 8859*.TXT files to create
additional aliases; e.g. MINUS_SIGN is really HYPHEN-MINUS on 8859 since
we don't have a separate code point for MINUS_SIGN.  There were fewer
than 10 of these additions taken across all the data files, and they
were only used to get past errors that occurred otherwise.)
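
For anyone who wants a feel for what the mapping-to-charmap conversion
involves without reading the Perl, here is a rough sketch in Python.  It is
*not* convert_map.pl; the function name mapping_to_charmap and the rule of
turning spaces in the Unicode name into underscores are assumptions made
for illustration, and the real script may name symbols differently.

```python
def mapping_to_charmap(lines):
    """Turn Unicode MAPPING-style lines into a minimal charmap body.

    Expected input lines look like:
        0x41<TAB>0x0041<TAB>#<TAB>LATIN CAPITAL LETTER A
    The symbol naming rule (spaces -> underscores) is an assumption.
    """
    out = ["CHARMAP"]
    for line in lines:
        fields = line.split("#", 1)
        codes = fields[0].split()
        # Skip blank lines, comment-only lines, and bytes with no
        # Unicode value (e.g. "0x81 <TAB><TAB>#UNDEFINED").
        if len(codes) < 2 or len(fields) < 2:
            continue
        byte = int(codes[0], 16)          # the 8-bit code in the charset
        name = fields[1].strip().replace(" ", "_")
        out.append(f"<{name}>\t\\x{byte:02X}")
    out.append("END CHARMAP")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = [
        "# Name: sample, not a real data file",
        "0x2D\t0x002D\t#\tHYPHEN-MINUS",
        "0x41\t0x0041\t#\tLATIN CAPITAL LETTER A",
    ]
    print(mapping_to_charmap(sample), end="")
```

With a converter of this shape, an extra line appended to a .TXT file that
reuses an existing byte value under a second name simply yields a second
symbol for the same code, which is the general idea behind the aliases
mentioned above.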

There is a short fuse on this, despite its size.  But the key pieces
(the code itself) have largely been reviewed.  Some extra thoroughness
in the aforementioned libc bits would be helpful though.  (Note that
I've been testing this code for a bit now -- Gnome/Nautilus and Pidgin
are really excellent at finding buggy implementations of strxfrm. :-)
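
If you want a quick sanity check of your own, the property that matters is
that comparing two strxfrm() results gives the same ordering as strcoll()
on the original strings.  Below is a minimal sketch of such a check in
Python, whose locale module calls into the C library's collation routines.
The helper name xfrm_mismatches and the word list are made up for
illustration; this is not the test I ran.

```python
import itertools
import locale


def sign(n):
    """Collapse any integer to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def xfrm_mismatches(words):
    """Return the (a, b) pairs where strcoll() and strxfrm() disagree.

    For a correct implementation, ordering the transformed keys must
    match the ordering strcoll() reports, so the result should be empty.
    """
    bad = []
    for a, b in itertools.product(words, repeat=2):
        direct = sign(locale.strcoll(a, b))
        ka, kb = locale.strxfrm(a), locale.strxfrm(b)
        via_keys = (ka > kb) - (ka < kb)
        if direct != via_keys:
            bad.append((a, b))
    return bad


if __name__ == "__main__":
    try:
        # Use whatever collation locale the environment selects.
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        # Fall back to the always-available "C" locale.
        locale.setlocale(locale.LC_COLLATE, "C")
    print(xfrm_mismatches(["", "apple", "Apple", "banana", "zebra"]))
```

Running it under several LC_COLLATE settings and getting anything other
than an empty list back would point at the kind of bug described above.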

Oh, if anyone would prefer I *wait* before integration, for any reason
whatsoever, please let me know!  I understand that this is a fairly
risky and giant hunk of change, but I've done all I can think of to
minimize the risk, and I think adding the missing locales is critical
for illumos adoption.  (IMO it was one of the key pieces that OI needs.)

The actual review is here:
 http://mexico.purplecow.org/gdamore/webrev/localedef3/

Sorry for the size of it! :-)

	- Garrett


