[illumos-Developer] col(1) breaking mdoc manpages formatting in UTF8 locales

Yuri Pankov yuri.pankov at gmail.com
Thu Mar 17 23:34:21 PDT 2011


On Thu, Mar 17, 2011 at 10:04:20PM -0700, Garrett D'Amore wrote:
> On Fri, 2011-03-18 at 08:01 +0300, Yuri Pankov wrote:
> > Hi,
> > 
> > col(1) is stripping characters (list below) from nroff(1) output due to
> > the iswprint() check, so characters not defined in LC_CTYPE in
> > <locale>.UTF-8.src are not printed. The most visible problem is the
> > <MINUS_SIGN> used by the Fl mdoc macro: formatting of options is
> > severely broken. The simplest fix seems to be adding the characters
> > listed below (the most widely used, at least) to
> > usr/src/cmd/localedef/data/*.UTF-8.src. Sample patch attached.
<snip>
> This seems reasonable to me.
> 
> Can you test this on Solaris 11 by any chance?  I would like to know if
> the problem exists there as well.  I *thought* I had done testing of
> ctype validation, but perhaps I only did that for the POSIX locale.
> 
> 	- Garrett

Sure. Solaris 11 reports the chars as printable but gives an incorrect
width, so formatting is still broken after calling col(1) when these
chars are used in manpages (I wasn't able to find any mdoc manpages
there, except, maybe, groff_mdoc(5)).
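
The formatting damage from a wrong width can be sketched as follows. This
is only an illustration, not code from col(1): display_width() is a
hypothetical helper that sums wcwidth() over a string, which is roughly
what a column-aware filter has to do to keep text aligned. If the locale
reports width 2 for a character the terminal draws in one cell, every
column after it is computed one cell too far to the right. The
_XOPEN_SOURCE define is an assumption for systems whose <wchar.h> hides
wcwidth() by default.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: exposes wcwidth() on some libcs */
#endif
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: sum the column widths of a wide string using the
 * current locale. Returns -1 as soon as wcwidth() reports a character
 * as having no defined width (i.e. it is not printable in this locale).
 */
int
display_width(const wchar_t *s)
{
        int total = 0;

        for (; *s != L'\0'; s++) {
                int w = wcwidth(*s);

                if (w < 0)
                        return (-1);
                total += w;
        }
        return (total);
}
```

The standard wcswidth() does much the same job; the loop is spelled out
here only to show where a per-character width error accumulates.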

wtest.c:
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

int
main(void) {
        wint_t msign = 0x2212;
        wint_t mdot = 0xb7;
        wint_t uchar1 = 0x215b;
        wint_t uchar2 = 0x215c;

        (void) setlocale(LC_ALL, "");

        printf("char=%lc code=0x%x width=%d printable=%d\n", msign, msign,
            wcwidth(msign), iswprint(msign));
        printf("char=%lc code=0x%x width=%d printable=%d\n", mdot, mdot,
            wcwidth(mdot), iswprint(mdot));
        printf("char=%lc code=0x%x width=%d printable=%d\n", uchar1, uchar1,
            wcwidth(uchar1), iswprint(uchar1));
        printf("char=%lc code=0x%x width=%d printable=%d\n", uchar2, uchar2,
            wcwidth(uchar2), iswprint(uchar2));

        return (0);
}


Illumos (with change from the patch):
$ ./wtest
char=− code=0x2212 width=1 printable=32768
char=· code=0xb7 width=1 printable=32768
char=⅛ code=0x215b width=-1 printable=0
char=⅜ code=0x215c width=-1 printable=0

Solaris 11:
$ ./wtest
char=− code=0x2212 width=2 printable=32768
char=· code=0xb7 width=2 printable=32768
char=⅛ code=0x215b width=2 printable=32768
char=⅜ code=0x215c width=2 printable=32768

FreeBSD 8:
char=− code=0x2212 width=1 printable=1
char=· code=0xb7 width=1 printable=1
char=⅛ code=0x215b width=1 printable=1
char=⅜ code=0x215c width=1 printable=1
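
For reference, the effect of an iswprint() check on filtered output can be
sketched like this. It is not the actual col(1) source: strip_unprintable()
is a hypothetical function that only mimics the behaviour described above,
where any character the locale's LC_CTYPE does not classify as printable is
silently dropped. Keeping whitespace is an assumption made so that newlines
and tabs survive in the example.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical col(1)-like filter step: copy src to dst, keeping only
 * characters that are printable or whitespace in the current locale.
 * dst must have room for wcslen(src) + 1 wide characters.
 * Returns the number of characters that were dropped.
 */
size_t
strip_unprintable(wchar_t *dst, const wchar_t *src)
{
        size_t dropped = 0;

        for (; *src != L'\0'; src++) {
                if (iswprint((wint_t)*src) || iswspace((wint_t)*src))
                        *dst++ = *src;
                else
                        dropped++;
        }
        *dst = L'\0';
        return (dropped);
}
```

With a locale where iswprint(0x2212) returns 0, a filter of this shape would
drop the minus sign that Fl emits and leave the bare option letter behind,
which matches the broken option formatting reported at the top of the thread.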


Yuri


