[illumos-Developer] webrev: POSIX style localedef & multibyte encoding support
Garrett D'Amore
garrett at damore.org
Sun Oct 3 08:36:29 PDT 2010
On Sun, 2010-10-03 at 13:33 +0100, Owen Shepherd wrote:
> On 3 Oct 2010, at 10:08, Garrett D'Amore wrote:
>
> >
> > 1) I have added locale data for the English locales that were missing,
> > and eliminated mklocale and friends. (I removed US-ASCII without
> > replacing it. If you want 7-bit support, use the POSIX or C locale.
> > Otherwise, you really want ISO-8859-1 or -15.)
> >
>
> Is the US-ASCII locale likely to be stored in any data files? Should the system perhaps automatically migrate to the POSIX or C locale if it is requested?
>
Very unlikely. I don't think Solaris ever had an explicit US-ASCII
locale, and I don't know of any situations where the locale name would
be stored in a data file. Note that pretty much all of the locales in
use on POSIX systems use ASCII for the low 7 bits.
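
As a quick illustration of the ASCII point, here is a minimal sketch that checks whether each 7-bit code point encodes to the same single byte in several encodings. The encoding list and the helper name `ascii_compatible` are assumptions chosen for the example; they use Python's codec names, not illumos locale names.

```python
# Python codec names picked for illustration; not illumos locale names.
ENCODINGS = ["ascii", "iso8859-1", "iso8859-15", "koi8_r", "utf-8", "gb18030"]


def ascii_compatible(encoding: str) -> bool:
    """Return True if every code point 0..127 encodes to that same single byte."""
    return all(chr(cp).encode(encoding) == bytes([cp]) for cp in range(128))


# Map each encoding to whether its low 7 bits match ASCII.
results = {enc: ascii_compatible(enc) for enc in ENCODINGS}
for enc, ok in results.items():
    print(f"{enc}: {ok}")
```
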
> (I admit my POSIX/C locale knowledge is lacking)
No worries. POSIX/C is just ASCII. The only notable difference is that
this locale defines no currency symbols, since it is not tied to any
nationality.
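
The missing currency symbols can be seen from any language that exposes `localeconv()`. A minimal sketch using Python's standard `locale` module, assuming the process is allowed to switch to the "C" locale:

```python
import locale

# Switch every category to the C (POSIX) locale.
locale.setlocale(locale.LC_ALL, "C")

# localeconv() reports the monetary and numeric conventions in effect.
conv = locale.localeconv()

# Both the local and the international currency symbols are empty strings.
print(repr(conv["currency_symbol"]), repr(conv["int_curr_symbol"]))
```
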
>
> > Once I integrate this change, it will be a fairly trivial matter to add
> > support for pretty much any locale you like. :-) Any of about 372 UTF-8
> > locales are easy. Any 8859 or KOI8 locale is easy. Other encodings are
> > easy *if* I can get a character map for them. GB18030 is probably the
> > most painful of those.
>
> Would the ICU project's character maps work? They have one for GB18030, and it's a pretty important locale from a product point of view, since it is required to support it in products sold in China. (ICU is released by IBM under a BSD-like license, IIRC.)
Probably. I just need to get a copy of the character map. If it is not
in POSIX form, then a shell script or Perl script could probably convert
it into the proper form. (Basically, I need a map from GB18030 to
Unicode.)
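
A rough sketch of what such a conversion script could look like, written in Python rather than shell or Perl. It assumes the input uses ICU's `.ucm` layout, with mapping lines such as `<U4E00> \xD2\xBB |0` between `CHARMAP` and `END CHARMAP`, and that only round-trip mappings (flag `|0` or no flag) should be kept. The function name `convert`, the 8-digit form for code points above U+FFFF, and the sample lines are all assumptions for illustration. Whether a given ICU file lists every GB18030 mapping explicitly would need to be checked against the file itself.

```python
import re

# One mapping line: Unicode code point, one or more \xHH bytes, optional |N flag.
MAPPING = re.compile(
    r"^<U([0-9A-Fa-f]{4,8})>\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s+\|(\d))?"
)


def convert(lines, code_set_name="GB18030"):
    """Convert ucm-style lines into a list of POSIX-charmap-style lines."""
    mappings = []
    max_bytes = 1
    in_map = False
    for line in lines:
        line = line.strip()
        if line == "CHARMAP":
            in_map = True
            continue
        if line == "END CHARMAP":
            break
        if not in_map:
            continue  # skip the ucm header
        m = MAPPING.match(line)
        if not m:
            continue  # skip comments and anything unrecognized
        cp_hex, raw, flag = m.groups()
        if flag not in (None, "0"):
            continue  # drop fallback (non-round-trip) mappings
        cp = int(cp_hex, 16)
        # Assumed naming: 4 hex digits in the BMP, 8 digits above it.
        name = f"<U{cp:04X}>" if cp <= 0xFFFF else f"<U{cp:08X}>"
        mappings.append(f"{name} {raw}")
        max_bytes = max(max_bytes, len(raw) // 4)  # each byte is "\xHH"
    header = [f"<code_set_name> {code_set_name}", f"<mb_cur_max> {max_bytes}"]
    return header + ["CHARMAP"] + mappings + ["END CHARMAP"]


# Hypothetical sample input, not taken from a real ICU file.
sample = [
    '<code_set_name> "gb18030"',
    "CHARMAP",
    r"<U0041> \x41 |0",
    r"<U4E00> \xD2\xBB |0",
    r"<U00A5> \x5C |1",
    "END CHARMAP",
]
converted = convert(sample)
print("\n".join(converted))
```

The output would still need to be checked against what localedef actually accepts before being used as a charmap.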
- Garrett