[illumos-Developer] Request for Advice: Unicode/language expert opinions

Yuri Pankov yuri.pankov at gmail.com
Tue May 10 21:48:54 PDT 2011


On Tue, May 10, 2011 at 11:45:58PM -0500, Jason King wrote:
> peDoghQo'

I guess that's Klingon. If so, you'll need to use Klingon characters
to really show whether you agree or disagree :-)


Yuri

> On Tue, May 10, 2011 at 11:41 PM, Garrett D'Amore <garrett at nexenta.com> wrote:
> 
> > It will cover all ranges that we reasonably support.  Klingon is not one of
> > those.  ;)
> >
> Yuri Pankov <yuri.pankov at gmail.com> wrote:
> >
> > >On Tue, May 10, 2011 at 11:16:11PM -0500, Jason King wrote:
> > >> On Tue, May 10, 2011 at 8:50 PM, Richard Lowe <richlowe at richlowe.net> wrote:
> > >>
> > >> > >> Note that the Unicode organization does not provide CLDR data this way
> > >> > >> -- they seem to only include the characters that make sense for the
> > >> > >> language represented by a given localedef input file...
> > >> > >
> > >> > > Actually, they do provide the full case-folding data here:
> > >> > >  http://unicode.org/Public/UNIDATA/CaseFolding.txt
> > >> > >
> > >> >
> > >> > If this is a full set of case folding data, it would make sense to use
> > >> > it, rather than data pulled from our locales to implement to*.  Not
> > >> > least because it means we get full coverage of to* separate from full
> > >> > locale coverage.
> > >> >
> > >> > Cases where it's a simple 1:1 mapping seem like they should be trivial
> > >> > to implement for the sake of to*; I'm not sure about cases where the
> > >> > mapping is not reversible.
> > >> >
> > >> > -- Rich
> > >> >
> > >> >
> > >> Do we have any idea how other platforms are doing it?  What they're using as
> > >> their input data for the mapping?
> > >
> > >Here's an example of what FreeBSD does; this looks to have been compiled
> > >a long time ago and updated when needed:
> > >
> > >http://svnweb.freebsd.org/base/head/share/mklocale/UTF-8.src?view=log
> > >
> > >On a related note, I believe that the data extracted from the *.UTF-8.src
> > >files we have should cover most (if not all) Unicode ranges.
> > >
> > >
> > >Yuri
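
To illustrate the point raised in the thread about simple 1:1 mappings
versus mappings that are not reversible, here is a minimal sketch in Python.
It assumes the semicolon-separated layout used by CaseFolding.txt
(code; status; mapping; # name) and works on a small embedded excerpt rather
than the full file. The function names are hypothetical, and this is not how
illumos or FreeBSD actually builds its tables. It keeps only the single-code
point folds (status C and S), skips full (F) and Turkic (T) entries, and
reports targets reached from more than one source, which is where a reverse
mapping becomes ambiguous.

```python
from collections import defaultdict

# Small excerpt in the CaseFolding.txt layout: code; status; mapping; # name
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
004B; C; 006B; # LATIN CAPITAL LETTER K
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
212A; C; 006B; # KELVIN SIGN
"""


def parse_simple_folds(text):
    """Return {source: target} for single-code point folds (status C or S)."""
    folds = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        code, status, mapping = fields[0], fields[1], fields[2]
        if status not in ("C", "S"):
            continue  # F expands to several code points, T is Turkic-only
        folds[int(code, 16)] = int(mapping, 16)
    return folds


def ambiguous_targets(folds):
    """Return {target: [sources]} for targets with more than one source."""
    sources = defaultdict(list)
    for src, dst in folds.items():
        sources[dst].append(src)
    return {dst: sorted(srcs) for dst, srcs in sources.items() if len(srcs) > 1}
```

In this excerpt both U+004B and U+212A (KELVIN SIGN) fold to U+006B, so the
fold table alone cannot say which code point a reverse mapping of U+006B
should produce; a to*-style table would need an extra rule or a separate
data source for that direction.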


