[illumos-Developer] Request for Advice: Unicode/language expert opinions

Gordon Ross gordon.w.ross at gmail.com
Tue May 10 14:57:20 PDT 2011


On Tue, May 10, 2011 at 5:41 PM, Garrett D'Amore <garrett at nexenta.com> wrote:
> Case folding in Unicode is a bit different... it's about creating a case-insensitive match, which is not the same as going back and forth between cases.
>
> I think Yuri's approach on this is sane.
>

Yes, I understand that case folding is different from identifying
upper/lower pairs for ctype, but the two are closely related where
toupper and tolower are concerned.  The data I pointed to can be used
for both.

I haven't looked at Yuri's stuff yet.

I'd like to focus on the requirements first, so we don't waste Yuri's time
having him implement something that later turns into (another) discussion
about what the correct and desired functionality should be.

In this case, the functional requirement I'd like to state is that this
implementation should map all "C" and "S" upper/lower pairs in the
case folding table.  I believe these are the ones where toupper and
tolower should implement reversible conversions.  (Correct?)
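
As a rough sketch of what "map all C and S pairs" could mean in practice, the
script below pulls the status C and S entries out of CaseFolding.txt-style
lines and separates one-to-one entries from fold targets that several code
points share. The function names and the sample excerpt are illustrative
assumptions, not part of any existing illumos tool.

```python
from collections import defaultdict


def parse_simple_folds(lines):
    """Collect {code point: folded code point} from status C and S entries.

    Expects CaseFolding.txt-style lines: "<code>; <status>; <mapping>; # <name>".
    F (full) and T (Turkic) entries are skipped.
    """
    folds = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        code, status, mapping = fields[0], fields[1], fields[2]
        if status in ("C", "S"):
            # C and S mappings are always a single code point.
            folds[int(code, 16)] = int(mapping, 16)
    return folds


def split_candidates(folds):
    """Separate one-to-one fold entries from targets reached by several sources.

    Returns (pairs, shared): pairs maps source -> target where the target has
    exactly one source; shared maps each remaining target to its sorted sources.
    """
    sources = defaultdict(list)
    for code, folded in folds.items():
        sources[folded].append(code)
    pairs, shared = {}, {}
    for folded, codes in sources.items():
        if len(codes) == 1:
            pairs[codes[0]] = folded
        else:
            shared[folded] = sorted(codes)
    return pairs, shared


sample = [
    "# CaseFolding.txt excerpt",
    "0041; C; 0061; # LATIN CAPITAL LETTER A",
    "00B5; C; 03BC; # MICRO SIGN",
    "039C; C; 03BC; # GREEK CAPITAL LETTER MU",
    "1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S",
    "1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S",
    "0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
]
pairs, shared = split_candidates(parse_simple_folds(sample))
print({hex(k): hex(v) for k, v in pairs.items()})
# {'0x41': '0x61', '0x1e9e': '0xdf'}
print({hex(k): [hex(c) for c in v] for k, v in shared.items()})
# {'0x3bc': ['0xb5', '0x39c']}
```

Even this small excerpt bears on the "(Correct?)": MICRO SIGN and GREEK
CAPITAL LETTER MU both fold to U+03BC, so the table alone cannot say what
toupper(U+03BC) should return, and 1E9E -> 00DF is one-to-one here although
UnicodeData.txt gives U+00DF no simple uppercase mapping. The folding table
also does not say which member of a pair is the upper-case one;
UnicodeData.txt would be needed for that.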

I'm slightly concerned that by compiling this data from several
locale-specific UTF-8 files, we might easily miss some pairs.
If nothing else, the case folding table might be a way to check that
we have not missed any.  Or it could be used for the implementation itself.
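
A minimal sketch of that cross-check, assuming the pairs gathered from the
locale files are available as (upper, lower) code point tuples and the C/S
entries of the folding table have already been read into a dict. Both input
formats are hypothetical.

```python
def missing_pairs(locale_pairs, folds):
    """Return fold entries that no locale-derived pair accounts for.

    locale_pairs: iterable of (upper, lower) code point tuples compiled from
        the locale source files (a hypothetical intermediate format).
    folds: {code point: folded code point} built from the table's C and S lines.
    Pairs are compared without regard to order, since the folding table does
    not say which member is upper case.
    """
    known = {frozenset(pair) for pair in locale_pairs}
    return [
        (code, folded)
        for code, folded in sorted(folds.items())
        if frozenset((code, folded)) not in known
    ]


folds = {0x41: 0x61, 0x42: 0x62, 0x39C: 0x3BC, 0xB5: 0x3BC}
from_locales = [(0x41, 0x61), (0x39C, 0x3BC)]
for code, folded in missing_pairs(from_locales, folds):
    print(f"U+{code:04X} -> U+{folded:04X} not covered")
# U+0042 -> U+0062 not covered
# U+00B5 -> U+03BC not covered
```

A report like this is a worklist rather than a list of bugs: U+00B5 shows up
because it shares its fold target with U+039C, and whether it belongs in the
ctype tables at all is a judgment call.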

Did you see my note about the u8_textprep stuff we already have?
Are we creating undesirable duplication with that?

Gordon
