[illumos-Advocates] RTI 992 towlower/towupper are broken

Gordon Ross gordon.w.ross at gmail.com
Tue May 17 10:07:51 PDT 2011


My main concern here is that we don't really know where we stand
relative to what other systems do, or on standards compliance.
If "it's a little better" is sufficient for you, then I'll abstain.

Personally, I'd prefer to see some comparison with, say, Apple's
toupper in a UTF-8 locale.  (All locales are UTF-8 on OS X, right?)
And a comparison with the full map of Unicode upper/lower pairs
(which is a subset of the published case folding data) would be
interesting.  I suspect we're quite close to having all of them with
Yuri's proposed changes.  And perhaps a standards reference
to help us understand what POSIX calls "correct" here.
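
A minimal sketch of the kind of comparison tool meant here, assuming
wchar_t values are Unicode code points in the locale under test (ISO C
does not guarantee this).  The names scan_case_mappings, casemap_counts
and CASEMAP_MAIN are hypothetical and are not part of the proposed change.
It lists every code point where towupper or towlower is not the identity:

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Number of non-identity mappings found in a code point range. */
struct casemap_counts {
    unsigned long upper;    /* code points where towupper(c) != c */
    unsigned long lower;    /* code points where towlower(c) != c */
};

/*
 * Scan [first, last] in the current LC_CTYPE locale.  Each non-identity
 * mapping is written to 'out' (pass NULL to only count them).
 * Assumption: a wchar_t value is a Unicode code point.
 */
struct casemap_counts
scan_case_mappings(unsigned long first, unsigned long last, FILE *out)
{
    struct casemap_counts n = { 0, 0 };
    unsigned long cp;

    assert(first <= last && last <= 0x10FFFFUL);

    for (cp = first; cp <= last; cp++) {
        wint_t c, u, l;

        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            continue;       /* surrogates are not characters */

        c = (wint_t)cp;
        u = towupper(c);
        l = towlower(c);
        if (u != c) {
            n.upper++;
            if (out != NULL)
                fprintf(out, "U+%04lX upper U+%04lX\n",
                    cp, (unsigned long)u);
        }
        if (l != c) {
            n.lower++;
            if (out != NULL)
                fprintf(out, "U+%04lX lower U+%04lX\n",
                    cp, (unsigned long)l);
        }
    }
    return (n);
}

#ifdef CASEMAP_MAIN
int
main(void)
{
    struct casemap_counts n;

    /* Use the locale from the environment, e.g. LC_ALL=en_US.UTF-8. */
    if (setlocale(LC_ALL, "") == NULL) {
        fprintf(stderr, "cannot set locale from environment\n");
        return (1);
    }
    n = scan_case_mappings(0, 0x10FFFFUL, stdout);
    fprintf(stderr, "%lu upper, %lu lower non-identity mappings\n",
        n.upper, n.lower);
    return (0);
}
#endif
```

Built with -DCASEMAP_MAIN and run on two systems in the same UTF-8
locale, the two outputs can be diffed directly, or compared against a
list derived from the published case mapping data.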

It's hard for me to imagine that any of those requests would be
hard to accomplish.  But if I'm the only one who cares about
having a complete fix here, then go ahead without me.

BTW, the localedef standard defines a "copy" operation so that,
in theory all the *.UTF-8 locales could "copy" the ctype data from
some other locale, such as en_US.UTF-8.  If we determine that
these really should be the same for all UTF-8 locales, then that
would probably be a reasonable way to accomplish it.
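
Whether the case mappings really are the same across UTF-8 locales
could be checked before relying on "copy".  A rough sketch, assuming
both locales are installed and that wchar_t values are Unicode code
points; count_case_differences is a hypothetical helper name, and it
switches LC_CTYPE with setlocale, so it is not thread-safe:

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Count the code points in [first, last] whose towupper() or towlower()
 * result differs between locales loc_a and loc_b.  Returns -1 if either
 * locale cannot be selected or memory cannot be allocated.  The caller's
 * LC_CTYPE setting is restored before returning.
 */
long
count_case_differences(const char *loc_a, const char *loc_b,
    unsigned long first, unsigned long last)
{
    wint_t *up = NULL, *lo = NULL;
    char *saved = NULL;
    const char *cur;
    long diffs = 0;
    size_t n, i;

    /* A NULL name would query the locale instead of setting it. */
    assert(loc_a != NULL && loc_b != NULL);
    if (first > last)
        return (-1);
    n = (size_t)(last - first) + 1;

    /* Copy the current locale name; later setlocale calls may reuse it. */
    cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return (-1);
    saved = malloc(strlen(cur) + 1);
    up = malloc(n * sizeof (*up));
    lo = malloc(n * sizeof (*lo));
    if (saved == NULL || up == NULL || lo == NULL) {
        diffs = -1;
        goto out;
    }
    strcpy(saved, cur);

    /* Pass 1: record the mappings under the first locale. */
    if (setlocale(LC_CTYPE, loc_a) == NULL) {
        diffs = -1;
        goto restore;
    }
    for (i = 0; i < n; i++) {
        up[i] = towupper((wint_t)(first + i));
        lo[i] = towlower((wint_t)(first + i));
    }

    /* Pass 2: compare against the second locale. */
    if (setlocale(LC_CTYPE, loc_b) == NULL) {
        diffs = -1;
        goto restore;
    }
    for (i = 0; i < n; i++) {
        wint_t c = (wint_t)(first + i);

        if (towupper(c) != up[i] || towlower(c) != lo[i])
            diffs++;
    }

restore:
    (void) setlocale(LC_CTYPE, saved);
out:
    free(saved);
    free(up);
    free(lo);
    return (diffs);
}
```

Calling count_case_differences("en_US.UTF-8", "ru_RU.UTF-8", 0, 0x10FFFF)
and getting 0 would support sharing the ctype data; any other result
shows the locales disagree (or, for -1, that one is not installed).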

Gordon

On Tue, May 17, 2011 at 12:45 PM, Garrett D'Amore <garrett at damore.org> wrote:
> I would prefer to just continue forward with all of your changes.
>
> I believe that there is no key requirement that the u8_* functions match what we have here, and I recognize that the requirements for case folding are different than those for case conversion.  Furthermore, having case conversion functions for character sets for which we have no data seems wrong.
>
> That said, one possible way to test this is to write a test program which iterates over all of the UTF-8 space, identifies the cases where these functions have a non-identity mapping, and then checks those against the u8_* functions.  But I really think Gordon is being unduly cautious.
>
> Fundamentally, we need to support the POSIX standards for localedef.  This code does that.  I am very uninterested in trying to special-case the character set mappings in order to artificially try to share some code.  Such sharing would break our ability to support correct POSIX localedef, and specifically would not support non-UTF-8 data.
>
> Gordon, what do you think, can we just let Yuri move ahead?  Certainly nobody could say his changes do anything except *improve* the current situation.
>
>  -- Garrett D'Amore
>
> On May 17, 2011, at 8:59 AM, Yuri Pankov <yuri.pankov at gmail.com> wrote:
>
>> On Mon, May 16, 2011 at 04:09:47PM -0400, Gordon Ross wrote:
>>> On Wed, May 11, 2011 at 7:34 PM, Yuri Pankov <yuri.pankov at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> illumos:yuri:~/ws/992-localedef$ hg outgoing -v ssh://anonhg@hg.illumos.org/illumos-gate
>>>> running ssh anonhg@hg.illumos.org "hg -R illumos-gate serve --stdio"
>>>> remote: Not trusting file /export/illumos/hgrepos/illumos-gate/.hg/hgrc from untrusted user hg, group hg
>>>> comparing with ssh://anonhg@hg.illumos.org/illumos-gate
>>>> searching for changes
>>>>
>>>> changeset:   13369:b913fe55a4c0
>>>> tag:         tip
>>>> user:        Yuri Pankov <yuri.pankov at gmail.com>
>>>> date:        Thu May 12 03:21:34 2011 +0400
>>>>
>>>> description:
>>>>        992 towlower/towupper are broken
>>>>        Reviewed by: Garrett D'Amore <garrett at damore.org>
>>>>
>>>> modified:
>>>>   usr/src/cmd/localedef/Makefile
>>>>   usr/src/cmd/localedef/ctype.c
>>>> added:
>>>>   usr/src/cmd/localedef/data/ctype.sh
>>>>
>>>> remote: Not trusting file /export/illumos/hgrepos/illumos-gate/.hg/hgrc from untrusted user hg, group hg
>>>>
>>>>
>>>> Tested by using towlower/towupper functions for latin, cyrillic and
>>>> greek characters in en_US.UTF-8 and ru_RU.UTF-8 locales - results are
>>>> the same in both.
>>> [...]
>>>
>>> Hi Yuri,
>>>
>>> Are you still working on this?
>>>
>>> I'd like to see an answer to the functionality questions about this
>>> before we integrate.  (How do we know if the fix is complete?)
>>>
>>> I suggested one way you could verify your fix.  I'm sure you could
>>> find many other ways as well.  Please choose a test method and
>>> use it to demonstrate that your fix is complete.
>>
>> Ok, let's make this just a fix for __maplower_ext excluding other
>> changes as I can't comment on the best way to provide common ctype data,
>> and, more so, on u8_* functions, which seem to be private (as well as
>> non-standard) to me - I just thought getting ctype data from locales we
>> actually support seems reasonable, but probably incorrect. I guess we
>> should continue discussing the best way to do this in the thread Garrett
>> started.
>>
>>
>> Yuri
>>
>> _______________________________________________
>> Advocates mailing list
>> Advocates at lists.illumos.org
>> http://lists.illumos.org/m/listinfo/advocates
>


