[illumos-Developer] col(1) breaking mdoc manpages formatting in UTF8 locales
Garrett D'Amore
garrett at damore.org
Thu Mar 17 22:04:20 PDT 2011
On Fri, 2011-03-18 at 08:01 +0300, Yuri Pankov wrote:
> Hi,
>
> col(1) strips the characters listed below from nroff(1) output because
> of its iswprint() check: characters not defined in LC_CTYPE in
> <locale>.UTF-8.src are not printed. The most visible symptom is
> <MINUS_SIGN>, used by the Fl mdoc macro, which leaves the formatting of
> options severely broken. The simplest fix seems to be adding the
> characters listed below (at least the most widely used ones) to
> usr/src/cmd/localedef/data/*.UTF-8.src. Sample patch attached.
>
> The question, of course, is: does this sound correct, or should we
> be fixing col(1) instead (not using iswprint(), for example)?
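To make the two options in the question concrete, here is a minimal C sketch. It is not col(1)'s actual source, and the names keep_if_printable, keep_unless_control and column_width are hypothetical. The first function mirrors the iswprint() filter described above. The other two show one possible col(1)-side alternative: keep anything the locale does not explicitly classify as a control character, and assume one column when wcwidth() returns -1.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() on glibc */
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Filter as described in the thread: a character is kept only if the
 * locale's LC_CTYPE data classifies it as printable, so code points that
 * <locale>.UTF-8.src leaves out (e.g. U+2212 MINUS SIGN) are dropped.
 */
bool keep_if_printable(wint_t wc)
{
    return iswprint(wc) != 0;
}

/*
 * Hypothetical alternative: drop only characters the locale explicitly
 * classifies as control characters, so unclassified code points survive.
 */
bool keep_unless_control(wint_t wc)
{
    return iswcntrl(wc) == 0;
}

/*
 * Hypothetical column-width helper: wcwidth() returns -1 for characters
 * the locale does not consider printable, so assume one column for them.
 */
int column_width(wchar_t wc)
{
    int w = wcwidth(wc);
    return w < 0 ? 1 : w;
}
```

The trade-off: fixing the locale data keeps col(1) strict and yields correct widths for every consumer of LC_CTYPE, while a col(1)-side fallback only guesses a width of one column, which would misalign output for any unclassified character that is really two columns wide.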
>
> (Beware of evil Unicode. LD_PRELOAD is used to load a libc with the
> debug printfs, which are also in the diff.)
>
> $ for i in /usr/share/man/man*/* /usr/gnu/share/man/man*/*; \
> do tbl $i 2> /dev/null | \
> neqn -Tutf8 /usr/share/lib/pub/eqnchar - 2> /dev/null | \
> nroff -u0 -Tutf8 -man - 2> /dev/null| \
> LD_PRELOAD=~/ws/illumos-gate/usr/src/lib/libc/i386/libc.so.1 \
> col -x > /dev/null; done 2>&1 | sort | uniq
>
> char=€ code=20ac width=-1 type=0
> char=ℏ code=210f width=-1 type=0
> char=ℑ code=2111 width=-1 type=0
> char=℘ code=2118 width=-1 type=0
> char=ℜ code=211c width=-1 type=0
> char=™ code=2122 width=-1 type=0
> char=ℵ code=2135 width=-1 type=0
> char=⅛ code=215b width=-1 type=0
> char=⅜ code=215c width=-1 type=0
> char=⅝ code=215d width=-1 type=0
> char=⅞ code=215e width=-1 type=0
> char=← code=2190 width=-1 type=0
> char=↑ code=2191 width=-1 type=0
> char=→ code=2192 width=-1 type=0
> char=↓ code=2193 width=-1 type=0
> char=↔ code=2194 width=-1 type=0
> char=↕ code=2195 width=-1 type=0
> char=↵ code=21b5 width=-1 type=0
> char=⇐ code=21d0 width=-1 type=0
> char=⇑ code=21d1 width=-1 type=0
> char=⇒ code=21d2 width=-1 type=0
> char=⇓ code=21d3 width=-1 type=0
> char=⇔ code=21d4 width=-1 type=0
> char=⇕ code=21d5 width=-1 type=0
> char=∀ code=2200 width=-1 type=0
> char=∂ code=2202 width=-1 type=0
> char=∃ code=2203 width=-1 type=0
> char=∅ code=2205 width=-1 type=0
> char=∇ code=2207 width=-1 type=0
> char=∈ code=2208 width=-1 type=0
> char=∉ code=2209 width=-1 type=0
> char=∋ code=220b width=-1 type=0
> char=∏ code=220f width=-1 type=0
> char=∐ code=2210 width=-1 type=0
> char=∑ code=2211 width=-1 type=0
> char=□ code=25a1 width=-1 type=0
> char=◊ code=25ca width=-1 type=0
> char=○ code=25cb width=-1 type=0
> char=☜ code=261c width=-1 type=0
> char=☞ code=261e width=-1 type=0
> char=♠ code=2660 width=-1 type=0
> char=♣ code=2663 width=-1 type=0
> char=♥ code=2665 width=-1 type=0
> char=♦ code=2666 width=-1 type=0
> char=✓ code=2713 width=-1 type=0
> char=⟨ code=27e8 width=-1 type=0
> char=⟩ code=27e9 width=-1 type=0
> char=¡ code=a1 width=-1 type=0
> char=¢ code=a2 width=-1 type=0
> char=£ code=a3 width=-1 type=0
> char=¤ code=a4 width=-1 type=0
> char=¥ code=a5 width=-1 type=0
> char=¦ code=a6 width=-1 type=0
> char=§ code=a7 width=-1 type=0
> char=¨ code=a8 width=-1 type=0
> char=« code=ab width=-1 type=0
> char=¬ code=ac width=-1 type=0
> char=® code=ae width=-1 type=0
> char=¯ code=af width=-1 type=0
> char=° code=b0 width=-1 type=0
> char=± code=b1 width=-1 type=0
> char=² code=b2 width=-1 type=0
> char=³ code=b3 width=-1 type=0
> char=´ code=b4 width=-1 type=0
> char=µ code=b5 width=-1 type=0
> char=¶ code=b6 width=-1 type=0
> char=¸ code=b8 width=-1 type=0
> char=¹ code=b9 width=-1 type=0
> char=» code=bb width=-1 type=0
> char=¼ code=bc width=-1 type=0
> char=½ code=bd width=-1 type=0
> char=¾ code=be width=-1 type=0
> char=¿ code=bf width=-1 type=0
> char=× code=d7 width=-1 type=0
> char=÷ code=f7 width=-1 type=0
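The width=-1 entries above are what wcwidth() reports for characters the locale does not classify as printable. The debug printfs themselves live in the patched libc and are not shown here. As a hypothetical standalone alternative, the sketch below reports iswprint() and wcwidth() for a few code points under whatever locale the environment selects. The names classify, print_report and check_sample are made up for illustration, and the output format only loosely imitates the list above.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() on glibc */
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Classification results for one code point in the current locale. */
struct ctype_report {
    unsigned long code;  /* numeric code point */
    int printable;       /* 1 if iswprint() is true, else 0 */
    int width;           /* raw wcwidth() result; -1 means "not printable" */
};

struct ctype_report classify(wchar_t wc)
{
    struct ctype_report r;
    r.code = (unsigned long)wc;
    r.printable = iswprint((wint_t)wc) != 0;
    r.width = wcwidth(wc);
    return r;
}

/* Print one line loosely modelled on the debug output in the thread. */
void print_report(FILE *fp, wchar_t wc)
{
    struct ctype_report r = classify(wc);
    fprintf(fp, "code=%lx printable=%d width=%d\n",
            r.code, r.printable, r.width);
}

/*
 * Check U+2212 MINUS SIGN plus a few code points from the list, using the
 * locale named by the environment (e.g. LC_ALL=en_US.UTF-8).
 */
void check_sample(FILE *fp)
{
    static const wchar_t sample[] = { 0x2212, 0x20ac, 0x2122, 0x00b0 };
    setlocale(LC_ALL, "");
    for (size_t i = 0; i < sizeof sample / sizeof sample[0]; i++)
        print_report(fp, sample[i]);
}
```

Calling check_sample(stdout) from a main() with LC_ALL set to a UTF-8 locale would be expected to print printable=0 width=-1 for any code point that the locale's LC_CTYPE data omits, and printable=1 with a non-negative width once the locale source defines it.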
This seems reasonable to me.
Can you test this on Solaris 11 by any chance? I would like to know if
the problem exists there as well. I *thought* I had done testing of
ctype validation, but perhaps I only did that for the POSIX locale.
- Garrett
>
>
> TIA,
> Yuri
>
> _______________________________________________
> Developer mailing list
> Developer at lists.illumos.org
> http://lists.illumos.org/m/listinfo/developer