[illumos-Developer] Short overview about libast&co. architecture and shell builtins... / was: Re: Closed-bin accords of the OpenSolaris conference... / was: Re: illumos_145 i386 build status

Roland Mainz roland.mainz at nrubsig.org
Mon Aug 9 16:33:15 PDT 2010


On Sun, Aug 8, 2010 at 9:42 PM, Garrett D'Amore <garrett at damore.org> wrote:
> On Sun, 2010-08-08 at 21:15 +0200, Joerg Schilling wrote:
[snip]
> As an aside, I personally find it utterly insane that ksh93 would put
> pax functionality into libast -- the ast library seems to be growing
> functionality that has less and less to do with "core shell
> functionality" and more to do with world domination of ksh93.  The end
> result is that libast is bigger for all unrelated uses (such as printf)
> because it has to carry baggage associated with pax, etc.  I do wonder
> when libast is going to grow the functionality of a c compiler, emacs,
> or X server.  IMO libast and ksh93 have lost sight of the "do one simple
> thing and do it well" philosophy of UNIX.  But I digress...

Erm... libast doesn't have any "pax"-specific functionality inside.
Where does that assumption come from?

libast basically contains:
- SFIO I/O library
- AST stdio implementation (sits on top of SFIO because SFIO's API is
similar and allows easy stdio/SFIO hybrids)
- Thread- and async-signal-safe memory allocator system
- Functions to handle lists, trees, graphs and similar constructs
(SFIO and the ksh93 shell use these heavily)
- Miscellaneous utility functions ranging from l10n and i18n up to
math functions (mainly there for portability)
- Portability wrappers (these are mostly not needed on Solaris if the
AST utilities and libraries are compiled with C99/XPG6 flags, which
saves space and greatly improves performance)

Then there are other libraries:
- libdll: This library handles plugins. It is used by "ksh93", "dss",
"pax", "sort" and other utilities which have plugins (e.g. AST "sort"
has plugins for special sort modes, either optimised for speed or for
special kinds of data, like AT&T's accounting system, which has to
process more than 4TB each day, all munched by hordes of ksh93 and
sort instances (see libcoshell below))
- libsum: This library contains hash functions used by many utilities
in AST and by libcmd's sum/cksum/md5sum/sha1sum/etc. builtin utilities
(on Solaris libmd is used, with hardware acceleration used on demand.
I did some more performance optimisation work for the Sun Studio
compiler; this is not yet part of the upstream sources but is
partially integrated into Sun's OS/Net gate)
- libcmd: This library hosts many of the POSIX utilities as a shared
library (the intention is to minimise footprint by sharing code as a
shared library, either between processes or in the future even
between threads, and to re-use it outside the shell, like the "alias"
wrapper we use in Solaris)
- libshell: This library contains the ksh93 core, for re-use by other
shared libraries (for example to implement |system()|) or by
/usr/bin/ksh93, which is just a wrapper that jumps directly into
libshell (on Solaris it's /usr/bin/$(MACH)/ksh93, because
/usr/bin/ksh93 is a wrapper which picks the ksh93 binary best suited
for the current architecture)

In the future we expect to get:
- libsed: POSIX "sed" as a shared library, for use in "sed", "sedcomp",
as a ksh93 builtin and from the "alias" wrapper
- libawk: POSIX "awk" as a shared library, for use in "awk", "awkcomp",
as a ksh93 builtin and from the "alias" wrapper
- libcoshell: Used by ksh93 and other utilities to enable the use of
shell worker children distributed across many machines (erm... that's
the short and half-wrong description; for those who know the details:
think "dmake in grid mode genetically spliced into shell background
(or "make"/"dmake") jobs"). Note that libcoshell does not mandate any
particular grid system; instead this is configurable and works with
any configured backend system.

BTW: About the "keep it simple" issue... technically this is kept very
simple and cleanly layered, but unfortunately all the shared libraries
and shell builtins are required to get good performance (and resource
usage) out of the system. The performance boost mainly comes ...
1) ... from the saved |fork()|+|exec()| cycle. The cost of |fork()| is
obvious, but on large SMP+NUMA machines |exec()| weighs much more,
since on such machines each |exec()| makes crosscalls to all CPUs to
tear down the address space. This already hurts a lot on Sun's
SF25k-class machines and large Niagara machines (e.g. the T5440 with
256 hardware threads). Remember that Sun had service calls (with very
angry and impatient customers) where a plain sendmail script (or a
similar thing) brought a whole 2 million Euro SF25k to its knees just
because it was doing a tight |fork()|+|exec()| loop (Fujitsu SPARC64
is even a bit worse in this respect, and the >= 8-way AMD64 machines
don't do much better in this case). And this is not something which is
going to be fixed soon; AFAIK the only research project Sun did on
this failed long ago, during the time Solaris 10 was being developed
(ahhgrrrl... I don't remember the name of the project lead... AFAIK she
was the same person who ran the "64k page project for the kernel").
2) ... from the saved startup time. There are very heavy calls like
|setlocale()| or the lookup of l10n catalogs, which are real
performance killers: the standalone utilities repeatedly spend a
significant amount of time there without any chance to cache this
information (unless you use shell builtins). The catalog lookup
matters because "en_US.UTF-8" is now the default locale, which means
we have to search for catalogs... and if they are present we have to
open them and call |iconv()| if the encoding of the catalog file data
does not match the encoding of the current message locale (e.g.
en_US.UTF-8-->en_US.ISO8859-15 if LC_MESSAGES=="en_US.ISO8859-15" and
the catalog only has UTF-8 encoded messages).
In short: the use of shell builtins simply avoids this completely and
allows caching and re-use of the code (and some data, like the l10n
catalogs) in several interesting ways (e.g. from different
applications and processes, and in the near future from threads, too).
This also helps embedded platforms and resource usage (e.g. disk and
memory footprint); this is only a side-effect, but we did keep this
goal in mind intentionally when Tim Sparlin's team and I designed the
POSIX utility modernisation + ksh93/AST integration.

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)


