[illumos-Developer] Request for Advice: Unicode/language expert opinions

Tue May 10 21:41:17 PDT 2011

It will cover all ranges that we reasonably support.  Klingon is not one of those.  ;)
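
For illustration only, here is a rough sketch (Python, not libc code) of how the simple 1:1 entries discussed in the quoted thread could be pulled out of data in the CaseFolding.txt format, and how the non-reversible cases could be spotted. The function names and the embedded sample are hypothetical, and treating status C and S entries as the 1:1 set is an assumption. Note that CaseFolding.txt folds toward a single form only, so on its own it would not give both directions of to*.

```python
# A handful of entries in the CaseFolding.txt layout:
#   <code>; <status>; <mapping>; # <name>
# Status C/S = single code point mapping, F = expands to several, T = Turkic.
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
004B; C; 006B; # LATIN CAPITAL LETTER K
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
212A; C; 006B; # KELVIN SIGN
"""


def parse_simple_folds(text):
    """Return {code point: folded code point} for 1:1 entries (status C or S)."""
    simple = {}
    for line in text.splitlines():
        # Drop the trailing comment and skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code, status, mapping = [field.strip() for field in line.split(";")[:3]]
        if status in ("C", "S"):
            simple[int(code, 16)] = int(mapping, 16)
    return simple


def non_reversible(simple):
    """Return {target: [sources]} for targets reached from more than one source."""
    by_target = {}
    for src, dst in simple.items():
        by_target.setdefault(dst, []).append(src)
    return {dst: sorted(srcs) for dst, srcs in by_target.items() if len(srcs) > 1}


folds = parse_simple_folds(SAMPLE)
ambiguous = non_reversible(folds)
```

In this sample, U+004B and U+212A (KELVIN SIGN) both fold to U+006B, so the reverse direction has no single answer for "k". That is the kind of entry that would need a policy decision rather than a mechanical table inversion.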

Yuri Pankov <yuri.pankov at gmail.com> wrote:

>On Tue, May 10, 2011 at 11:16:11PM -0500, Jason King wrote:
>> On Tue, May 10, 2011 at 8:50 PM, Richard Lowe <richlowe at richlowe.net> wrote:
>> 
>> > >> Note that the Unicode organization does not provide CLDR data this way
>> > >> -- they seem to only include the characters that make sense for the
>> > >> language represented by a given localedef input file...
>> > >
>> > > Actually, they do provide the full case-folding data here:
>> > >  http://unicode.org/Public/UNIDATA/CaseFolding.txt
>> > >
>> >
>> > If this is a full set of case folding data, it would make sense to use
>> > it, rather than data pulled from our locales, to implement to*, not
>> > least because it means we get full coverage of to* independent of full
>> > locale coverage.
>> >
>> > Cases where it's a simple 1:1 mapping seem like they should be trivial
>> > to implement for the sake of to*; I'm not sure about cases where the
>> > mapping is not reversible.
>> >
>> > -- Rich
>> >
>> >
>> Do we have any idea how other platforms are doing it?  What they're using as
>> their input data for the mapping?
>
>Here's an example of what FreeBSD does; it looks to have been compiled a
>long time ago and updated when needed:
>
>http://svnweb.freebsd.org/base/head/share/mklocale/UTF-8.src?view=log
>
>On a related note, I believe that the data extracted from the *.UTF-8.src
>files we have should cover most (if not all) Unicode ranges.
>
>
>Yuri