[illumos-Developer] col(1) breaking mdoc manpages formatting in UTF8 locales

Yuri Pankov yuri.pankov at gmail.com
Thu Mar 17 22:01:23 PDT 2011


Hi,

col(1) is stripping characters (listed below) from nroff(1) output because
of its iswprint() check: characters not defined in LC_CTYPE in
<locale>.UTF-8.src are not printed. The most visible casualty is
<MINUS_SIGN>, which the mdoc Fl macro uses, so the formatting of options
is severely broken. The simplest fix seems to be adding the characters
listed below (at least the most widely used ones) to
usr/src/cmd/localedef/data/*.UTF-8.src. Sample patch attached.
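
To illustrate the effect, here is a rough model in Python. It is not
col's code (col is written in C and calls iswprint()); the names
PRINTABLE-style set, is_print() and filter_like_col() are made up for
this sketch, and U+2212 is assumed to be the code point behind
<MINUS_SIGN>.

```python
# Illustrative model only: a set of characters stands in for the
# locale's LC_CTYPE "print" class, and is_print() stands in for iswprint().
MINUS_SIGN = "\u2212"  # assumption: the code point behind <MINUS_SIGN>


def is_print(ch, printable):
    """Return True only if the (modelled) locale lists ch as printable."""
    return ch in printable


def filter_like_col(text, printable):
    """Drop every character the (modelled) locale does not call printable."""
    return "".join(ch for ch in text if is_print(ch, printable))


# A locale whose print class covers printable ASCII only.
ascii_only = {chr(c) for c in range(0x20, 0x7F)}

# Roughly what an option flag could look like after nroff.
line = MINUS_SIGN + "x option"

print(repr(filter_like_col(line, ascii_only)))  # 'x option'

# Once the character is added to the print class, it survives the filter.
print(filter_like_col(line, ascii_only | {MINUS_SIGN}) == line)  # True
```

The second call corresponds to the locale-data approach: nothing in the
filter changes, only the set of characters considered printable.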

The question, of course, is: does this sound correct, or should we be
fixing col(1) instead (so that it does not use iswprint(), for example)?

(Beware of evil Unicode below. LD_PRELOAD is used to load a libc with
debug printfs; those are also in the diff.)

$ for i in /usr/share/man/man*/* /usr/gnu/share/man/man*/*; \
  do tbl "$i" 2> /dev/null | \
  neqn -Tutf8 /usr/share/lib/pub/eqnchar - 2> /dev/null | \
  nroff -u0 -Tutf8 -man - 2> /dev/null | \
  LD_PRELOAD=~/ws/illumos-gate/usr/src/lib/libc/i386/libc.so.1 \
  col -x > /dev/null; done 2>&1 | sort | uniq

char=€ code=20ac width=-1 type=0
char=ℏ code=210f width=-1 type=0
char=ℑ code=2111 width=-1 type=0
char=℘ code=2118 width=-1 type=0
char=ℜ code=211c width=-1 type=0
char=™ code=2122 width=-1 type=0
char=ℵ code=2135 width=-1 type=0
char=⅛ code=215b width=-1 type=0
char=⅜ code=215c width=-1 type=0
char=⅝ code=215d width=-1 type=0
char=⅞ code=215e width=-1 type=0
char=← code=2190 width=-1 type=0
char=↑ code=2191 width=-1 type=0
char=→ code=2192 width=-1 type=0
char=↓ code=2193 width=-1 type=0
char=↔ code=2194 width=-1 type=0
char=↕ code=2195 width=-1 type=0
char=↵ code=21b5 width=-1 type=0
char=⇐ code=21d0 width=-1 type=0
char=⇑ code=21d1 width=-1 type=0
char=⇒ code=21d2 width=-1 type=0
char=⇓ code=21d3 width=-1 type=0
char=⇔ code=21d4 width=-1 type=0
char=⇕ code=21d5 width=-1 type=0
char=∀ code=2200 width=-1 type=0
char=∂ code=2202 width=-1 type=0
char=∃ code=2203 width=-1 type=0
char=∅ code=2205 width=-1 type=0
char=∇ code=2207 width=-1 type=0
char=∈ code=2208 width=-1 type=0
char=∉ code=2209 width=-1 type=0
char=∋ code=220b width=-1 type=0
char=∏ code=220f width=-1 type=0
char=∐ code=2210 width=-1 type=0
char=∑ code=2211 width=-1 type=0
char=□ code=25a1 width=-1 type=0
char=◊ code=25ca width=-1 type=0
char=○ code=25cb width=-1 type=0
char=☜ code=261c width=-1 type=0
char=☞ code=261e width=-1 type=0
char=♠ code=2660 width=-1 type=0
char=♣ code=2663 width=-1 type=0
char=♥ code=2665 width=-1 type=0
char=♦ code=2666 width=-1 type=0
char=✓ code=2713 width=-1 type=0
char=⟨ code=27e8 width=-1 type=0
char=⟩ code=27e9 width=-1 type=0
char=¡ code=a1 width=-1 type=0
char=¢ code=a2 width=-1 type=0
char=£ code=a3 width=-1 type=0
char=¤ code=a4 width=-1 type=0
char=¥ code=a5 width=-1 type=0
char=¦ code=a6 width=-1 type=0
char=§ code=a7 width=-1 type=0
char=¨ code=a8 width=-1 type=0
char=« code=ab width=-1 type=0
char=¬ code=ac width=-1 type=0
char=® code=ae width=-1 type=0
char=¯ code=af width=-1 type=0
char=° code=b0 width=-1 type=0
char=± code=b1 width=-1 type=0
char=² code=b2 width=-1 type=0
char=³ code=b3 width=-1 type=0
char=´ code=b4 width=-1 type=0
char=µ code=b5 width=-1 type=0
char=¶ code=b6 width=-1 type=0
char=¸ code=b8 width=-1 type=0
char=¹ code=b9 width=-1 type=0
char=» code=bb width=-1 type=0
char=¼ code=bc width=-1 type=0
char=½ code=bd width=-1 type=0
char=¾ code=be width=-1 type=0
char=¿ code=bf width=-1 type=0
char=× code=d7 width=-1 type=0
char=÷ code=f7 width=-1 type=0
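
The list above is in sort(1) text order, so the Latin-1 code points end
up after the U+2xxx ones. A small Python sketch like the following could
turn it into a numerically ordered list with Unicode names, which may be
handier when editing the locale sources. The function name
parse_debug_line() is made up here, and reading "width" and "type" as
plain integers is an assumption about the debug output, not something
taken from the libc diff.

```python
import unicodedata


def parse_debug_line(line):
    """Parse 'char=X code=HEX width=N type=N' into (code point, width, type)."""
    fields = dict(part.split("=", 1) for part in line.split())
    return int(fields["code"], 16), int(fields["width"]), int(fields["type"])


# Three lines copied from the list above, as sample input.
sample = """\
char=€ code=20ac width=-1 type=0
char=→ code=2192 width=-1 type=0
char=± code=b1 width=-1 type=0
"""

# Collect the code points and order them numerically.
code_points = sorted(
    parse_debug_line(entry)[0]
    for entry in sample.splitlines()
    if entry.startswith("char=")
)

for cp in code_points:
    print("U+%04X %s" % (cp, unicodedata.name(chr(cp), "<unnamed>")))
# U+00B1 PLUS-MINUS SIGN
# U+20AC EURO SIGN
# U+2192 RIGHTWARDS ARROW
```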


TIA,
Yuri


