[illumos-Developer] webrev: POSIX style localedef & multibyte encoding support

Garrett D'Amore garrett at damore.org
Sun Oct 3 08:36:29 PDT 2010


On Sun, 2010-10-03 at 13:33 +0100, Owen Shepherd wrote:
> On 3 Oct 2010, at 10:08, Garrett D'Amore wrote:
> 
> > 
> > 1) I have added locale data for the English locales that were missing,
> > and eliminated mklocale and friends.  (I removed US-ASCII without
> > replacing it.  If you want 7-bit support, use the POSIX or C locale.
> > Otherwise, you really want ISO-8859-1 or -15.) 
> > 
> 
> Is the US-ASCII locale likely to be stored in any data files? Should the system perhaps automatically migrate to the POSIX or C locale if it is requested?
> 

Very unlikely.  I don't think Solaris ever had an explicit US-ASCII
locale, and I don't know of any situations where the locale name would
be stored in data files.  Note that pretty much all of the locales in
use on POSIX systems use ASCII for the low 7 bits.
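As an illustrative sketch (not part of the webrev), Python's standard codecs can confirm the point about the low 7 bits: each encoding mentioned in this thread decodes bytes 0x00-0x7F exactly as ASCII does. The codec names are Python's, not locale names.

```python
# Encodings discussed in this thread, by their Python codec names.
ENCODINGS = ["iso8859_1", "iso8859_15", "koi8_r", "utf_8", "gb18030"]

low7 = bytes(range(128))           # every 7-bit byte value
ascii_text = low7.decode("ascii")  # reference decoding

for enc in ENCODINGS:
    # Compare each encoding's view of the 7-bit range against ASCII.
    same = low7.decode(enc) == ascii_text
    print(f"{enc:12} low 7 bits match ASCII: {same}")
```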

> (I admit my POSIX/C locale knowledge is lacking)

No worries.  POSIX/C is just ASCII.  The only thing is that this locale
has no specific currency symbols, since it is not tied to any
nationality.
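A quick way to see this, sketched with Python's locale module and assuming a POSIX system (the "C" locale is always available there): selecting the C locale leaves the monetary symbol fields empty.

```python
import locale

# Select the C (POSIX) locale for every category.
locale.setlocale(locale.LC_ALL, "C")
conv = locale.localeconv()

# The C locale is not tied to any nationality, so its currency
# symbol fields are empty strings.
print(repr(conv["currency_symbol"]))  # ''
print(repr(conv["int_curr_symbol"]))  # ''
```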

> 
> > Once I integrate this change, it will be a fairly trivial matter to add
> > support for pretty much any locale you like. :-)  Any of about 372 UTF-8
> > locales are easy.  Any 8859 or KOI8 locale is easy.  Other encodings are
> > easy *if* I can get a character map for them.  (GB18030 is probably the
> > most painful of those.)
> 
> Would the ICU project's character maps work? They have one for GB18030, and it's a pretty important locale from a product point of view, since it is required to support it in products sold in China (ICU is released by IBM under a BSD-like license, IIRC)

Probably.  I just need to get a copy of the character map.  If it is not
in POSIX form, then a shell or Perl script could probably convert it
into the proper form.  (Basically, I need a map from GB18030 to
Unicode.)
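As a rough sketch of such a conversion (written in Python rather than shell or Perl), the code below assumes the map arrives in ICU's .ucm format, with lines like `<U4E00> \xD2\xBB |0` between `CHARMAP` and `END CHARMAP`, and emits a POSIX-style charmap using `<UXXXX>` symbolic names. The function name `ucm_to_posix` is hypothetical. Only explicit one-to-one round-trip (`|0`) entries are kept. Any ranges that ICU handles algorithmically instead of listing them would need separate treatment, as would lines lacking a precision flag.

```python
import re

# Matches a .ucm mapping line such as "<U4E00> \xD2\xBB |0": one Unicode
# code point, one or more \xNN bytes, then a precision flag.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|(\d)")

def ucm_to_posix(ucm_text: str, code_set_name: str, mb_cur_max: int) -> str:
    """Hypothetical helper: convert ICU .ucm mappings to a POSIX charmap."""
    out = [
        f"<code_set_name> {code_set_name}",
        f"<mb_cur_max> {mb_cur_max}",
        "<mb_cur_min> 1",
        "CHARMAP",
    ]
    in_charmap = False
    for raw in ucm_text.splitlines():
        line = raw.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            break
        if not in_charmap:
            continue  # skip the .ucm header section
        m = UCM_LINE.match(line)
        # Keep round-trip (|0) mappings only; fallbacks (|1, |3) and
        # substitutions (|2) are one-directional. Multi-code-point
        # sequences such as <U0041><U0300> do not match and are skipped.
        if m is None or m.group(3) != "0":
            continue
        cp = int(m.group(1), 16)
        out.append(f"<U{cp:04X}> {m.group(2).lower()}")
    out.append("END CHARMAP")
    return "\n".join(out) + "\n"
```

For GB18030 one would call something like `ucm_to_posix(text, "GB18030", 4)`, since GB18030 characters occupy one, two or four bytes.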

	- Garrett
