[illumos-Developer] Request for Advice: Unicode/language expert opinions

Yuri Pankov yuri.pankov at gmail.com
Tue May 10 21:21:41 PDT 2011


On Tue, May 10, 2011 at 11:16:11PM -0500, Jason King wrote:
> On Tue, May 10, 2011 at 8:50 PM, Richard Lowe <richlowe at richlowe.net> wrote:
> 
> > >> Note that the Unicode organization does not provide CLDR data this way
> > >> -- they seem to only include the characters that make sense for the
> > >> language represented by a given localedef input file...
> > >
> > > Actually, they do provide the full case-folding data here:
> > >  http://unicode.org/Public/UNIDATA/CaseFolding.txt
> > >
> >
> > If this is a full set of case folding data, it would make sense to use
> > it, rather than data pulled from our locales to implement to*.  Not
> > least because it means we get full coverage of to* separate from full
> > locale coverage.
> >
> > Cases where it's a simple 1:1 mapping seem like they should be trivial
> > to implement for the sake of to*, cases where the mapping is not
> > reversible I'm not sure about.
> >
> > -- Rich
> >
> >
> Do we have any idea how other platforms are doing it?  What they're using as
> their input data for the mapping?

Here's an example of what FreeBSD does; this looks to have been compiled
a long time ago and updated when needed:

http://svnweb.freebsd.org/base/head/share/mklocale/UTF-8.src?view=log

On a related note, I believe that the data extracted from the *.UTF-8.src
files we have should cover most (if not all) Unicode ranges.
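
For comparison, pulling the simple 1:1 entries out of the CaseFolding.txt
file mentioned upthread could look roughly like the sketch below. It assumes
the documented line layout of that file ("code; status; mapping; # name") and
keeps only status C and S entries, which are the single-code-point foldings.
The function name parse_simple_fold is made up for illustration and is not
existing illumos or FreeBSD code.

```c
#include <stdio.h>

/*
 * Parse one line of CaseFolding.txt. Data lines look like:
 *   0041; C; 0061; # LATIN CAPITAL LETTER A
 * Returns 1 and fills *from and *to for simple 1:1 foldings
 * (status C or S). Returns 0 for comments, blank lines, and for
 * full (F) or Turkic (T) entries, which are not plain 1:1 mappings.
 */
static int
parse_simple_fold(const char *line, unsigned long *from, unsigned long *to)
{
	char status;
	unsigned long f, t;

	/* Skip comment lines and empty lines. */
	if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
		return (0);

	/* Fields are hexadecimal code points separated by "; ". */
	if (sscanf(line, "%lx; %c; %lx;", &f, &status, &t) != 3)
		return (0);

	/* Keep only the common (C) and simple (S) foldings. */
	if (status != 'C' && status != 'S')
		return (0);

	*from = f;
	*to = t;
	return (1);
}
```

A caller would read the file with fgets() and feed each line to this
function, storing the resulting pairs in whatever table the to* functions
end up using. The F entries (for example sharp s folding to "ss") are the
non-reversible cases Rich mentioned and are deliberately skipped here.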


Yuri


