\newpart
\section{Hilbert's 24th problem: minimalism in mathematics}\label{sec:hilbert}
In 1900, David Hilbert published a list of 23 unsolved mathematical problems;
these problems, which came to be known as ``Hilbert's problems'', influenced
mathematicians throughout the 20th century and continue to have a major
influence today. In 2000, one additional problem, somehow excluded from the
original publication, was discovered in Hilbert's notes, and we begin this part
with this 24th problem\cupercite{wiki:hilbert}. What Hilbert's 24th problem
asks for is essentially a formal standard for the simplicity of mathematical
proofs, and a method to demonstrate that a certain proof is the simplest one
for a theorem. From this we can already see that the pursuit of simplicity
also exists in mathematics, a pursuit in fact shared by other great mathematicians.
For example, Paul Erdős often referred to ``THE BOOK''\footnote{A real-world
``approximation'', \emph{Proofs from THE BOOK}\cupercite{wiki:thebook}, was
dedicated in 1998 to Erdős, who gave much advice during the preparation
of the book but died before its publication.}, an imaginary book where
God keeps the simplest proofs, and said in a lecture in 1985 that
\begin{quoting}
You don't have to believe in God, but you should believe in THE BOOK.
\end{quoting}
It also seems that the pursuit of simplicity is not meant to be exclusive
to masters, for Edsger W.\ Dijkstra once asked:
\begin{quoting}
How do we convince people that in programming simplicity and clarity --
in short: what mathematicians call ``elegance'' -- are not a dispensable
luxury, but a crucial matter that decides between success and failure?
\end{quoting}
From the quotation we can see that mathematicians, in a sense, appear more
minimalist than programmers; in Section~\ref{sec:science}, we will
see an extreme example of minimalism in mathematics.
In Hilbert's original note on the 24th problem, he attempted to reduce a certain
class of proofs to a class of sequences mainly consisting of elements of a
certain type, and to judge simplicity by the lengths of the sequences. Nowadays,
from the viewpoint of programming languages, this can be generalised quite
easily\footnote{The correspondence can be formalised through \stress{Gödel
numbering}\cupercite{wiki:godelnum}, which maps different programs to
different integers; similar operations can also be done on proofs, so every
proof is formally equivalent to a program. The converse is not true, because
a proof has to finish in finitely many steps, while a program does not have to;
nevertheless, since algorithms also need to end in finitely many steps, every
algorithm is formally equivalent to a proof.}: proofs (like programs) are after
all based on axioms (like primitive operations), and by embedding the proofs of
all the theorems (like implementations of library functions) that a proof depends
on, said proof can be reduced to a sequence of applications of the axioms,
which can then be counted.
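As a minimal illustration of the Gödel numbering mentioned in the footnote,
any injective map from program text to integers will do; the following toy
sketch (in Python, my own illustration rather than Gödel's original
prime-power scheme) simply reads the bytes of the source as one large integer.
\begin{verbatim}
# A toy Goedel-style numbering: prefix a 0x01 byte (so that leading zero
# bytes and the empty string cause no collisions), then read the UTF-8
# bytes of the source as one big base-256 integer; the map is injective
# and the original text can be recovered from the number.
def godel_number(source: str) -> int:
    return int.from_bytes(b"\x01" + source.encode("utf-8"), "big")

def from_godel_number(n: int) -> str:
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")

n = godel_number("print('hello')")
assert from_godel_number(n) == "print('hello')"
\end{verbatim}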
With this correspondence in mind, we can \stress{``refactor'' proofs
and propositions}, just as with programs, to make them
shorter and clearer, for example
\begin{quoting}
For discrete probability measures constrained by a set of moment
conditions, if a probability measure exists with the (*) form,
it must be unique and must be the maximum-entropy distribution;
and if the maximum-entropy distribution exists, it must have
the form of (*) and therefore must be unique.
\end{quoting}
can be refactored into
\begin{quoting}
For discrete probability measures constrained by a set of
moment conditions, the maximum-entropy distribution exists
if and only if a probability measure exists with the (*) form,
in which case the two must be identical and unique.
\end{quoting}
Following the line of thought above, we should be able to construct the formal
standard for simplicity desired by Hilbert; however, what about the part about
showing a proof to be the simplest? I guess there probably exist proofs that
cannot be demonstrated to be the simplest, and here I explain my guess with the
notion of Kolmogorov complexity, which will be further discussed in Section~%
\ref{sec:kolmogorov}. To show that a proof is the simplest, one probably needs
to seek the minimal complexity among all possible proofs, which seems closely
related to the Kolmogorov complexity. However, even setting aside the
uncomputability of the latter, Chaitin's incompleteness theorem shows that
there is an upper bound on the complexity values that any string can be
proven to reach or exceed. So what if the complexity of the theorem to be
proven is already higher than this bound? Under this circumstance, even
the minimal complexity of possible proofs cannot be demonstrated,
which does not bode well for what Hilbert wanted.
\section{Boltzmann's formula: an axiom or a theorem?}\label{sec:boltzmann}
Back in an interview for the postgraduate program I eventually enrolled in,
one of the interviewers asked me the question ``is Boltzmann's entropy
formula, $S = k_\mathrm{B} \ln\varOmega$, an axiom or a theorem in statistical
physics?''. My answer, ``it can be both'', was quite a surprise to the
interviewers, judging by their reactions; I then explained that what I meant
was that one set of propositions can be based on multiple alternative sets of
axioms, and that a proposition can be an axiom under some bases and meanwhile
a theorem under others. Perhaps because the full answer was still quite
unexpected to the interviewers, who were mainly physicists, they did not ask
further; but the question should obviously not stop at that point,
and I will examine it closely in this section.
An immediate follow-up question to my answer could be ``then which choice
of axioms would be better?'', because \stress{not all choices of axioms are
equal}, even though they can be ``rebased'' onto each other: for example, a given
set of axioms can be extended with a known theorem, and the resulting set of
axioms would trivially be, although redundant, as consistent as the previous
one. Moreover, even among mutually equivalent minimal choices, some can be
clumsy to use\footnote{When I was a PhD candidate, out of curiosity about
functional programming, I attended a very introductory talk on category
theory, and the speaker mentioned some mathematicians' attempts to rebase
mathematics from set theory onto category theory. While the effort can be
academically interesting, I sort of doubt its practical significance: from
the extremely limited parts I know, category-theoretical formulations
of fundamental mathematical notions appear more complicated than their
set-theoretical counterparts, so the former do not seem superior to the
latter according to the model in this section.}: metaphorically, although
many Turing-equivalent models of computation are available, computer scientists
would most often use Turing machines, lambda calculus and a few other models
for generic discussions, instead of cellular automata or even \verb|sed|.
\colskipa\begin{multicols}{2}
\begin{wquoting}
\begin{tikzpicture}[thm/.style = {
align = center, font = \linespread{1}\selectfont
}, shorten <= 0.1em, shorten >= 0.1em]
\tikzmath{\dista = 3.2\baselineskip;}
\node (rolle) at (\dista, 4\baselineskip)
[thm, anchor = north] {Rolle's theorem};
\node (lagrange) at (0, 0)
[thm, anchor = south] {Lagrange's mean\\value theorem};
\node (cauchy) at (2 * \dista, 0)
[thm, anchor = south] {Cauchy's mean\\value theorem};
\draw [<-] (rolle) -- (lagrange); \draw [->] (rolle) -- (cauchy);
\draw [->, transform canvas = {xshift = -0.5em}] (rolle) -- (lagrange);
\draw [<-, transform canvas = {xshift = 0.5em}] (rolle) -- (cauchy);
\draw [->] (cauchy) -- (lagrange);
\end{tikzpicture}
\end{wquoting}
\begin{wquoting}
\begin{tikzpicture}[
prop/.style = {draw, circle, minimum size = 0.25em, inner sep = 0},
typ/.style = {draw, rectangle, inner sep = 0, font = \scshape}
]
\newcommand*{\mm}{\ensuremath{\!\times\!}}
\newcommand*{\pp}{\ensuremath{\!+\!}}
\newcommand*{\ee}{\ensuremath{\!=\!}}
\tikzmath{
\unit = \baselineskip; \dista = 5 * \unit;
\lena = 1.08 * \unit; \lenb = 0.96 * \unit;
\lenc = \dista - 2 * \lena; \distb = 0.5 * \dista;
}
\foreach \i/\t in {%
0/{$6\mm6\ee36$},1/{$36\pp6\ee42$},2/{$7\mm5\ee35$},3/{$35\pp7\ee42$}%
} {
\node (p\i) at (0, -\i * \unit) [prop] {};
\node at (0, -\i * \unit) [anchor = east] {\t};
}
\node (p4) at (\dista, -1.5 * \unit) [prop] {};
\node (p5) at (\distb, -0.5 * \unit) [prop] {};
\node (p6) at (\distb, -2.5 * \unit) [prop] {};
\node at (\dista, -1.5 * \unit) [anchor = west] {$7 \mm 6 \ee 42$};
\foreach \j/\a/\b/\c/\t in {%
0/0/{1/4}/0.5/{\&},1/0/{1/4}/2.5/{\&},2/1/{3/4}/1.5/or%
} {
\node (t\j) at ({(\a + 1/2) * \lena + \b * \lenc}, - \c * \unit)
[typ, minimum width = \lena, minimum height = \lenb] {\t};
}
\foreach \i/\j in {0/0,1/0,2/1,3/1,5/2,6/2} \draw [->] (p\i) -- (t\j);
\foreach \i/\j in {4/2,5/0,6/1} \draw (p\i) -- (t\j);
\end{tikzpicture}
\end{wquoting}
\end{multicols}\colskipb
I approach the latter problem with a model based on graphs, which I call
``\stress{deduction graphs}'', with propositions as their nodes and deductions
as their edges; the graphs are thus closely related to dependency graphs, yet
very different from the latter: first, there are often ``equivalent'' propositions
that imply each other, so the graphs usually contain cycles, unlike dependency
graphs, which are acyclic (\cf~the relation between the three mean value theorems
shown in the picture above on the left); second, a proposition may well be
proven in multiple ways, so disjunctions, or in other words ``virtual nodes''
(\eg~those leading to the ``$7 \times 6 = 42$'' node in the picture above on
the right), need to be included; third, some proofs are more complex than
others, so the edges are weighted. Given this model, we can compare choices
of axioms based on the total path weights of the spanning trees grown from
the different bases, which correspond to the intuitive notion of
how complex it is to generate all the propositions.
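As a toy illustration of the comparison (my own sketch: the weights are made
up, and the ``virtual'' nodes from the picture above on the right are omitted),
a basis can be scored by the total of the cheapest deduction costs from it to
every proposition, which is one crude reading of the total path weight of a
spanning tree.
\begin{verbatim}
import heapq

def basis_cost(edges, axioms, propositions):
    """Sum of the cheapest deduction costs from a set of axioms to all
    propositions; plain Dijkstra over the weighted deduction graph."""
    dist = {p: float("inf") for p in propositions}
    heap = []
    for a in axioms:
        dist[a] = 0
        heapq.heappush(heap, (0, a))
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                       # stale queue entry
        for nxt, w in edges.get(node, ()):
            if d + w < dist[nxt]:
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return sum(dist.values())

# The three mean value theorems from the picture on the left, with made-up
# weights for the individual proofs.
edges = {
    "Rolle":    [("Lagrange", 1), ("Cauchy", 2)],
    "Lagrange": [("Rolle", 1)],
    "Cauchy":   [("Rolle", 1), ("Lagrange", 1)],
}
props = ["Rolle", "Lagrange", "Cauchy"]
print(basis_cost(edges, ["Rolle"], props))   # 3 with these weights
print(basis_cost(edges, ["Cauchy"], props))  # 2 with these weights
\end{verbatim}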
Deduction graphs are probably impractical for quantitative use in real-world
applications, in large part because the formal systems involved are infinite;
but they are very instructive in a conceptual sense, and we will refer to them
multiple times in this part. For example, the model helps to explain the
criterion for the importance of knowledge quoted in Section~\ref{sec:user}:
by mastering notions that are common (linked to many other parts of the graph)
and useful (simplifying otherwise complicated deductions), we would be able to
understand the subject matter much more efficiently. And before concluding
this section, let's briefly come back to the original question on Boltzmann's
formula: I guess that it will be a theorem under almost all reasonable
minimal choices of axioms, as long as the entropy $S$ is a derived,
instead of primitive, notion in statistical physics.
\section{Kolmogorov complexity and knowledge organisation}\label{sec:kolmogorov}
In previous sections, we have scratched the surface of the issue of lower
bounds for complexities in certain scenarios multiple times: for instance,
when considering how to demonstrate the simplicity of a proof for a given
theorem (\cf~Section~\ref{sec:hilbert}), I examined the notion of minimal
complexity of possible proofs; and earlier when discussing cohesion
(\cf~Section~\ref{sec:coupling}), I defined it as the inherent coupling
between submodules, which also correlates with some sort of inevitable
complexity. When considering the intrinsic complexities of objects,
the \stress{Kolmogorov complexity}\cupercite{wiki:kolmogorov} is
often used, and in this section we will examine it closely.
In a nutshell, the Kolmogorov complexity of an object, usually encoded as a
string, is the length of the shortest computer program that produces it, and
its value has very limited dependence on the programming language used:
the difference between the complexity values from two languages has an upper
bound that depends only on the pair of languages, and not on the string in
question. The Kolmogorov complexity is an \stress{uncomputable} function, or
in other words it is provably impossible to write a program that computes it
for all inputs; furthermore, Chaitin's incompleteness theorem states that
there exists an upper bound on the provable lower bounds for the complexity,
which depends only on the language and the axiomatic system used for proofs,
so we cannot even prove that a given string is more complex than this bound.
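To spell the last point out slightly (a paraphrase of the usual formulation,
with the symbols $K$, $T$ and $L_T$ introduced here only for illustration):
for a fixed description language and a consistent, effectively axiomatised
formal system $T$, there is a constant $L_T$ such that no statement of the form
\[
K(s) > L_T
\]
is provable in $T$ for any specific string $s$, even though all but finitely
many strings actually satisfy the inequality.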
You might well wonder whether, given such ``bad'' properties, the Kolmogorov
complexity is still meaningful. To my knowledge, it is indeed rarely used
in practical measurements of complexity, and is instead used mainly in proofs
of impossibilities, like the uncomputability of certain functions. Nevertheless,
one implication is clear: as can easily be seen from the above, the limit to
information compression\footnote{Incidentally, the Hutter Prize\cupercite{wiki:hutter},
basically a contest in lossless compression of textual human knowledge, bridges
the fields of information compression, knowledge organisation and artificial
intelligence.} is the Kolmogorov complexity, so the uncomputability of the
complexity and the unprovability of its lower bounds in most circumstances
imply that there will never be a formal end to research on information
compression. Similar conclusions exist for other research fields, as
the term ``full employment theorem''\cupercite{wiki:employ} summarises;
on a slightly philosophical level, while these ``bad'' properties show the
limitations of the formal theories available in these fields, they also mean that
\stress{human efforts will not be easily replaced by computer programs}.
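As a concrete, if hedged, illustration of the connection with compression:
although the Kolmogorov complexity itself cannot be computed, any compressor
yields an upper bound on description length, valid only relative to that fixed
compressor and hence up to the usual additive constant; the Python sketch below
uses \verb|zlib| purely as such a bound.
\begin{verbatim}
import os
import zlib

def description_upper_bound(data: bytes) -> int:
    # The compressed size is an upper bound on the description length of
    # `data` relative to "zlib output plus a fixed decompressor".
    return len(zlib.compress(data, 9))

print(description_upper_bound(b"ab" * 1000))      # highly regular: tiny bound
print(description_upper_bound(os.urandom(2000)))  # random: close to 2000
\end{verbatim}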
It is necessary to note that the Kolmogorov complexity measures the intrinsic
complexity of a known object, and does not easily extend to comparisons between
multiple possible but unknown objects, like the possible proofs of a theorem.
Furthermore, it is even less applicable to problems that are closely related to
information compression but are less formal, like the field of \stress{knowledge
organisation} in library science; I believe that the latter, in its general
sense, is the real-world counterpart to the minimisation of system complexity
in programming, and therefore has to be more or less minimalist. However,
just like deduction graphs, even though the Kolmogorov complexity is not
quantitatively applicable to practical problems (or not quite, since
we will see some indirect applications in Section~\ref{sec:ockham}),
it is still conceptually instructive because of its nature
as the \stress{descriptive complexity}.
\section{Minimalism in science and technology}\label{sec:science}
So far in this part, we have been mainly discussing minimalism in mathematics
and hypothetically axiomatic physics, but minimalism can also be observed
in other fields of science and technology, and especially in those with
\stress{considerable and tangible costs of complexity}, including both
financial and mental costs. Traditional engineering is quite a good example
of this, because the financial costs of complexity remain obvious even
with the advancement of technology: as long as an engineering project is
under a somewhat tight budget, a simple way to do the same task would usually
be preferred over a complex one, provided that the simple way does not imply
some serious trade-off. For instance, Jon Bentley wrote in \emph{Programming
Pearls}\footnote{His column under the same name in \emph{Communications
of the ACM} was where the word frequency sorting problem (\cf~Section~%
\ref{sec:shell}) was originally discussed.} that\cupercite{bentley1999}
\begin{quoting}
General Chuck Yeager (the first person to fly faster than
sound) praised an airplane's engine system with the words
``simple, few parts, easy to maintain, very strong''.
\end{quoting}
Minimalism is also an obvious pursuit by theoretical physicists:
Fred Brooks, the author of \emph{The Mythical Man-Month},
wrote in \emph{No Silver Bullet}\cupercite{brooks1987} that
\begin{quoting}
Einstein repeatedly argued that there must be simplified
explanations of nature, because God is not capricious or
arbitrary. No such faith comforts the software engineer.
\end{quoting}
This attitude reminds us of the reduction of system calls in Plan~9
(\cf~Section~\ref{sec:plan9}); similar attitudes were also evident
in other prominent figures in theoretical physics, such as Isaac Newton
(witness his \emph{Principia Mathematica}) and James Clerk Maxwell
(witness his equations). Minimalism in the form of elegant deductions,
comparable to elegant implementations of requirements, can also be observed:
a friend of mine who majored in mechanics once mentioned that while many
textbooks on fluid dynamics spend a lot of text (sometimes one or two chapters)
analysing gravity waves\footnote{Not to be confused with gravitational
waves in general relativity.}, Lev Landau and Evgeny Lifshitz managed the
exposition in just one section of around 10 pages in their \emph{Course of
Theoretical Physics}, which even included a few pages of exercises with answers.
The tradition of deriving systematic theories from a small set of rules can
surely be traced to earlier origins, and the \emph{Elements} by Euclid is
usually considered one of the most important of those original materials.
It is almost undisputed that geometry became an indispensable part of Western
education exactly because of the \emph{Elements}, but the book had a problem
with algebra: the algebraic propositions and deductions were all written as
sentences, not formulae, perhaps due to the awkward number system of the time.
In the 20th century, the Bourbaki group of mathematicians\footnote{Whose use of
the ``dangerous bend'' sign in warnings was the direct inspiration for Donald
Knuth's ``\textdbend'' sign.} went to the opposite extreme while constructing
the foundations of mathematics in a purely axiomatic way based on set theory:
although their works were indeed highly influential and productive, the
contents were written in an extremely austere and abstract way, with few
explanatory remarks, a minimal number of examples, almost no pictures,
and zero real-world applications.
While this style did not affect the logical correctness of Bourbaki's works,
it created problems for human readers attempting to understand the contents;
an in-depth analysis of this issue will be carried out in Section~%
\ref{sec:cognitive}, and here I would just like to discuss one more thing
that is perhaps worth some consideration at the end of this section.
When discussing the possible outcomes in case the current foundations of
mathematics were proven inconsistent\cupercite{gaillard2010}\footnote{In case
you wonder why this is even a problem, note that \stress{Gödel's incompleteness
theorems}\cupercite{wiki:godelthm} (actually the second theorem) show that these
foundations cannot prove themselves to be consistent, so the theoretical problem
does have some ground.}, it was noted that most researchers would be able to
move on with alternative foundations. After all, set theory was only born in
the 1870s, well after many other fields, and the set-theoretical presentations
of many results are basically restatements in a different mathematical language:
for example, V.~I.\ Arnold was able to teach group theory to secondary school
students in 1963--1964, in half a year and without using any axiomatics, up to
a topological proof of the insolubility of the general quintic equation in
radicals\cupercite{arnold1998, alekseev2004}\footnote{By the way, quite a few
books by Dan Friedman, like \emph{The Little Schemer} and \emph{The Little
Prover}, achieved similar goals.}. Similar observations also exist in physics,
like the relative independence of results in phenomenological quantum mechanics
from the actual underlying mechanisms, and that of results in macroscopic
thermodynamics from the basic principles of statistical physics. All of these
demonstrate the existence of a \stress{relative separation of abstraction layers}
in formal and semiformal systems, as with software systems: from the
perspective of deduction graphs, we can convert axioms in a deduction
graph to theorems in a new graph by adding more primitive axioms, and
even though propositions that are already theorems may gain new proofs,
the overall structure of the entire graph will not change very much.
\section{Minimalism in scientific and non-scientific theories}\label{sec:ockham}
In this part, I have already mentioned multiple scientific theories,
each of which is based on a minimal set of fundamental hypotheses, which
when formalised become axioms. In the last section, I noted that this tradition
is closely related to the \emph{Elements} by Euclid; I did not restrict the
tradition to science and technology, because it was actually also followed
by many philosophers and even some theologians. One prominent example that
lies between these fields is the answer by Pierre-Simon Laplace when he was
asked by Napoleon Bonaparte about the absence of God from his five-volume
monumental work, \emph{Mécanique Céleste}: ``I had no need of that hypothesis''.
This tradition is summarised by the principle called \stress{Ockham's razor}%
\cupercite{wiki:ockham}, usually phrased as ``entities should not be multiplied
without necessity''. It should be noted that the principle is a criterion for
choosing between multiple theories when they are all consistent with the
available observations and make the same predictions; it does not imply that
the simplest theory is the most likely to be correct under all circumstances.
Multiple attempts have been made to mathematically justify Ockham's razor
using formulations related to probability, for example the theories of the
minimum message length (MML)\cupercite{wiki:mml} and the minimum description
length (MDL)\cupercite{wiki:mdl}. Both MML and MDL took inspiration from the
Kolmogorov complexity discussed in Section~\ref{sec:kolmogorov}, but unlike
the Kolmogorov complexity, which is hard to use in real-world applications,
MML and MDL can be used in practice for the choice of statistical models.
The MML and MDL
approaches differ from each other quite subtly, but the general ideas remain
highly similar: somehow represent a model, and concatenate it with the
representation of observed data encoded using the model, and it can be proven
that \stress{the model with the shortest combined representation is most likely
to be correct}. From this we can see a strong mathematical correlation with
information compression, so MML and MDL are associated with the Kolmogorov
complexity; and if we consider available observations as the data, and theories
as the models, the simplest theory that efficiently explains the observations
wins according to the result above, which is consistent with the intuitive
principle of Ockham's razor. The result is also readily reminiscent of the
Unix philosophy in the sense of minimising the total complexity of the
system, if we compare the intended applications to the data, and
the implementations of the supporting interfaces to the models.
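To make the two-part idea tangible, here is a toy sketch in Python (my own
illustration, not the MML or MDL formalisms proper): the ``model'' is a
least-squares polynomial, its cost is a fixed number of bits per coefficient,
and the data cost is a crude code for the residuals at a fixed precision; the
degree with the shortest total description wins, penalising underfitting and
overfitting alike.
\begin{verbatim}
import math
import numpy as np

def two_part_length(xs, ys, degree, param_bits=32, precision=0.01):
    """Crude two-part code length: model bits plus residual bits."""
    coeffs = np.polyfit(xs, ys, degree)
    residuals = np.asarray(ys) - np.polyval(coeffs, xs)
    model_cost = (degree + 1) * param_bits   # bits to state the coefficients
    # Sign-and-magnitude style cost for each residual quantised to `precision`.
    data_cost = sum(math.log2(abs(r) / precision + 1) + 1 for r in residuals)
    return model_cost + data_cost

# Noisy quadratic data: the total length should favour degree 2 over
# degree 0 (underfit, large residuals) and degree 9 (overfit, many parameters).
rng = np.random.default_rng(0)
xs = np.linspace(-3, 3, 40)
ys = 2 * xs**2 - xs + 1 + rng.normal(0, 0.3, xs.size)
for d in (0, 1, 2, 5, 9):
    print(d, round(two_part_length(xs, ys, d)))
\end{verbatim}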
\section{Minimalism as an aesthetic in literature and arts}\label{sec:art}
Pursuing simplicity is also an important aesthetic in literature and arts,
and in this section we will briefly examine this aesthetic, mainly through
literature. \emph{The Elements of Style}, the prestigious American English
style guide, was listed by \emph{Time} magazine in 2011 as one of the
100 most influential English books since 1923; William Strunk Jr., its author,
strongly emphasised ``\stress{omit needless words!}'', and made the following
recommendation on writing style in the book\footnote{Qiao Dong translated
this 63-word maxim into 63 Chinese characters\cupercite{dongqiao1999}:
``铿然有力之文必简洁。一句之中无赘字,一段之中无赘句,犹如丹青无冗枝,机器
无废件。此说不求作者下笔句句精短,摒弃细节,概而述之;但求字字有着落耳。''}:
\begin{quoting}
Vigorous writing is concise. A sentence should contain no
unnecessary words, a paragraph no unnecessary sentences, for the
same reason that a drawing should have no unnecessary lines and
a machine no unnecessary parts. This requires not that the writer
make all his sentences short, or that he avoid all details and
treat his subject only in outline, but that every word tell.
\end{quoting}
\emph{The Elements of Style} did not concern literary works only, so what it
reflected was a common aesthetic in English writing, and a similar aesthetic
already existed in China at least as early as around the 5th century. \emph{The
Literary Mind and the Carving of Dragons}, a book by Xie Liu, who lived in
the Southern and Northern Dynasties period, was the first systematic book on
literary criticism in China, and likewise considered works in all genres as
literature. Ao Li summarised the focus of the book as two points in his
\emph{Complete Highlights of Masterpieces in Chinese Literature}:
\begin{quoting}
One of them was to argue against the pompous style devoid of practicality,
and the other was to advocate for the pragmatic style that ``crucial
issues in military and politics be considered in writing''.
\end{quoting}
Shi-Jin Mou, the first secretary general of the Chinese Society of
\emph{The Literary Mind}, remarked on the fundamental stance of the
book that\cupercite{moushijin1995}\footnote{It was noted in the same
reference that \emph{The Literary Mind and the Carving of Dragons}
was the result of the author's reaction to the pompous style
prevalent at that time, but isn't this document actually similar?}
\begin{quoting}
There should be rich contents to accompany the refined forms --
this is the absolute rule in literary creation, and is the supreme
standard in literary critiques by Xie Liu. This fundamental
stance was followed throughout \emph{The Literary Mind}.
\end{quoting}
When evaluating literary works, phrases like ``piling up flowery language'',
``flashy and without substance'', ``devoid of contents'' \etc{} are never
compliments, which shows that the aesthetic above is well accepted in literature
both inside and outside China. On the other hand, this acceptance is also
confirmed by the general appreciation, in both Chinese and Western literature
and arts, of depictive devices that express rich connotations with subtle
details, like the practice in painting of conveying the dynamics or emotions
of characters with just a few strokes. Another example is the advocacy
of ``distilling characters'' and the attention to ``eyes of poems'' in
Chinese literature; for instance, the following line from a poem
\begin{quoting}
鸟宿池边树,僧敲月下门。 (``Birds roost in the trees by the pond;
a monk knocks at a gate beneath the moon.'')
\end{quoting}
is widely praised exactly because the character ``敲'' (``knock'') evokes the
image of the evening silence being interrupted by the sound of knocking, and
meanwhile suggests (in comparison with the character ``推'', ``push'') that
the person outside the door was a visiting guest instead of a returning host.
I believe that similar examples exist in most civilisations, and that the
underlying aesthetic was, to a large extent, not transmitted from other
civilisations. Just as base-10 numbers emerged
independently in multiple civilisations probably because each human has 10
fingers (``digits''), \stress{the independent origination of minimalist
aesthetics in disparate civilisations should be due to some deep reasons},
and I will explore the reasons in the next section.
\section{Minimalism from a cognitive perspective}\label{sec:cognitive}
In Section~\ref{sec:intro}, I asked, ``is simplicity now only an aesthetic?'',
and by this point you should already know that it is not, because there are
plenty of practical reasons for pursuing simplicity. In Section~\ref{sec:art},
I noted that the independent origination of minimalist aesthetics in multiple
civilisations should be attributed to some deep reason; in fact, I believe
that this aesthetic has a cognitive root, which is closely correlated with
those practical reasons and is independent of the particular civilisation.
In this section, I will discuss the cognitive basis of minimalism, and also
examine some apparent ``exceptions'' to this basis; prepared with these
analyses, we will be able to proceed to the final two sections of
this part. I base my argument in this section on the basic hypothesis
that \stress{most (if not all) ethics and aesthetics are results of practical
requirements in real life}. Why do we dislike people that jump the queue when
there are many people waiting? Because this behaviour sacrifices many people's
time and energy for the benefit of a few, and encourages others to crowd
instead of queuing, which reduces the throughput and may lead to safety
risks. Why do designers dislike Comic Sans? Because this font, when used with
anti-aliasing (which is the absolute norm today), has poor legibility\cupercite%
{kadavy2019}; this reason might seem subtle, but it appears more intuitive
if you recall that when you first learned to write, your teacher would
almost instinctively discourage illegible handwriting.
Paul Graham noted that when mathematicians and good programmers work on
problems, they hold the problems in their heads in order to fully explore the
problem scopes, so simplicity definitely matters in problem solving\cupercite%
{graham2007}; he also noted that ordinary programmers working in typical
office conditions rarely enter the state where their programs are fully loaded
into their heads. Considering that the latter kind of programmers can still
produce mediocre programs that more or less achieve the stated goals, while
mathematicians do not often enjoy such relaxation of requirements, this might
provide an explanation for the observation in Section~\ref{sec:hilbert}
that mathematicians seem more minimalist than programmers. Furthermore, I argue
that the need to hold the problem in one's head when solving problems is not
unique to mathematicians and programmers, because \stress{humans are bad at
multitasking}: borrowing notions from computer science, we can say that the
cost of ``context switching'' in the human mind is huge, so in order to avoid
context switching and achieve optimal performance, all relevant subproblems
of the problem have to be loaded into the head. With simple yet powerful tools,
we will be able to make many subproblems easier to understand, which increases
our ability to hold larger problems in our heads, and I believe this is the
cognitive root of our appreciation for these tools. V.~I.\ Arnold recalled
his instinctive reaction when taught about the intrinsic connections between
seemingly unrelated mathematical notions\cupercite{arnold1998}:
\begin{quoting}
\emph{[... a more impressive example, omitted here for the sake of
brevity ...]} Jacobi noted the most fascinating property of mathematics,
that in it one and the same function controls both the presentations of
an integer as a sum of four squares and the real movement of a pendulum.
These discoveries of connections between heterogeneous mathematical objects
can be compared with the discovery of the connection between electricity
and magnetism in physics or with the discovery of the similarity in
the geology of the east coast of America and the west coast of Africa.
\end{quoting}
Similarly, I was totally amazed when I read about the
origin of Aboriginal Linux\cupercite{landley2017}:
\begin{quoting}
The original motivation of Aboriginal Linux was that back around
2002 Knoppix was the only Linux distro paying attention to the
desktop, and Knoppix's 700 megabyte live CD included the same
100 megabytes of packages Linux From Scratch built, which
provided a roughly equivalent command line experience as the
1.7 megabytes tomsrtbt which was based on busybox and uClibc.
\end{quoting}
V.~I.\ Arnold's comment was meant in comparison with the style by
the Bourbaki group (\cf~Section~\ref{sec:science}), and you would probably
wonder, if simplicity is so desirable, why would untrained readers feel
uncomfortable reading mathematical books written in that style? I base my
answer on the analysis of the difference between humans and machines in Section~%
\ref{sec:mcilroy}, which on a deeper level shows that humans and machines are
good at intuitive and abstract tasks, respectively. Moreover, in order to fully
understand a subject matter, humans also need repeated stimulation that covers
its different aspects (which may also help to explain why humans are good at
intuition), and I guess this might be a product of natural adaptation towards
better recognition of important knowledge, as was discussed using deduction
graphs. All these tell us that although formal systems only need the abstract
notations of propositions and deductions, humans additionally require remarks,
examples, pictures, and real-life applications to fully understand them;
for this very reason, \stress{formal abstractions and informal pictures
are mutually complementary and equally important to us}. For example,
consider the $\epsilon$-$\delta$ definition of limits in calculus and
its informal version, ``the output can be made arbitrarily close to
the limit by setting the input sufficiently close to the specified
value'', which also captures the underlying topological nature.
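For reference, the formal version being paraphrased is the standard one:
$\lim_{x \to a} f(x) = L$ means that for every $\epsilon > 0$ there exists a
$\delta > 0$ such that
\[
0 < |x - a| < \delta \;\Longrightarrow\; |f(x) - L| < \epsilon,
\]
and the informal wording simply reads the two quantifiers as ``arbitrarily
close'' (the $\epsilon$ part) and ``sufficiently close'' (the $\delta$ part).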
There is one more issue to consider at the end of this section: in the above, I
noted that mathematicians are often not allowed to produce ``roughly passable''
results just as programmers often do, but why? Because mathematicians usually
need to rigorously prove their results, or at least present conjectures which
they strongly suspect to be true; in comparison, programmers treat software
as engineering products, and are often willing to sacrifice provable correctness
in exchange for lower costs of development and maintenance, given the inherent
complexity of software systems\footnote{\label{fn:formal}As a matter of fact,
\stress{formal verification} is quite widely used in the hardware industry
(perhaps due to the relatively fixed sets of requirements for hardware),
but still sees very few large-scale applications in software. A well-known
example is the formal proof for the seL4 microkernel\cupercite{wiki:sel4},
which is an order of magnitude larger than the codebase itself, even though
the latter is already very small and well designed.}. So when formal proofs
are too expensive, we have to resort to reasoning by ourselves,
and as Tony Hoare\cupercite{hoare1981} noted:
\begin{quoting}
There are two ways of constructing a software design: one way is to make
it so simple that there are obviously no deficiencies, and the other way
is to make it so complicated that there are no obvious deficiencies.
\end{quoting}
So we know that while machines only need the executable instructions
corresponding to programs, humans also need to somehow ensure the
\stress{correctness of these programs}; and whether the method to ensure
correctness is formal proof or human reasoning, a clear understanding
of the internal mechanisms of the programs will be necessary. Therefore
when evaluating a software system, we should not only measure its size,
but also examine the cohesion and coupling in the system, which is exactly
why I stressed the term ``cognitive complexity'' in Section~\ref{sec:complex}.
\section{Limits to application of Unix philosophy in society}\label{sec:society}
As we have seen in this part, apart from the Unix philosophy in programming,
minimalism also has its relevance in science, technology, philosophy,
literature and arts; and this is not coincidental, because minimalism is rooted
in human cognition, probably as a result of the high cost of multitasking for
humans. A natural question after this is ``is the Unix philosophy somehow
applicable to society?'', since society is in many aspects similar to a machine,
and thus can more or less be analysed from the Unix viewpoint. One important
reason for regarding society as a machine is that, due to the division of
labour, society can be conceptually divided into interacting groups of people,
each group doing a different set of tasks, and groups can again be divided
into subgroups doing different subtasks. Following this line of thought,
society can be roughly compared to a system composed of interacting
modules, and therefore the Unix philosophy seems applicable.
Given this background, it is certainly beneficial to somehow induce
\stress{a proper level of separation of concerns} between different groups
(and similarly between subgroups inside a single group) in society, and it can
also be tempting to suggest extremely fine-grained division of labour such that
each person performs a very small and well-defined job mechanically. However,
as was graphically shown in Charlie Chaplin's masterpiece \emph{Modern Times},
making people work like dumb machines would actually go against human
nature, and would do society much more damage than good.
I believe this is one fundamental reason why society often does not work like
Unix, and it can be explained by the human need for diverse stimulation
discussed in the last section: if all you did throughout the day was some
mechanical and monotonous job, your mind would be filled with that job, and
would become increasingly estranged from other kinds of human activity.
Therefore, while it is desirable to produce minimalist works, it is also
necessary to \stress{respect human nature} and encourage people to regularly
try something different and mentally challenging; from another perspective,
diverse stimulation also helps to boost people's creativity, and therefore
contributes positively to the vibrancy of the whole society.
As was emphasised just now, humans are not completely comparable to machines,
due to the human need for diverse stimulation; similarly, groups of humans
are not completely comparable to modules in computer systems, and one main
reason for this is that \stress{social groups, once formed, would pursue their
own interests}, unlike software/hardware modules, which just work mechanically.
For instance, as was observed by people like Laurent Bercot\cupercite{ska:systemd},
companies naturally tend to produce mediocre software, and Paul Graham
attributed\cupercite{graham2007} this to companies wanting developers to be
replaceable; this tendency (even if it is an inadvertent effect) is destined
to be more serious for open-source companies that rely on selling support for
revenue, because it is more profitable to produce software that is hard to
maintain independently, and I seriously suspect this has been a contributing
factor in the systemd debacle (\cf~Section~\ref{sec:foss}). From this we know
that the interactions between social groups are not purely cooperative, and just
like modules in Unix-ish high-security software systems\cupercite{djb:qmailsec},
social groups should have limited mutual trust; in this respect, Unix and
society have reached some kind of agreement, although in different ways.
\section{Minimalism for the average person}\label{sec:worklife}
I wrote in Section~\ref{sec:user} about the essence of efficient Unix learning,
and how to realise this goal; after reading all previous sections in this part,
you should already know that these are in fact not specific to Unix, and are
instead general principles for all kinds of learning processes. So what is
the actual role of Unix in learning? In my opinion, the Unix philosophy
naturally fosters the minimalist habit, which is the key to efficient
learning; in this section, I will discuss how this habit can be better
nurtured, and how it can actually benefit the average person.
As I wrote before, efficient learning is basically about reducing the total
complexity of learning, and we already have a conceptual model for it --
deduction graphs (\cf~Section~\ref{sec:boltzmann}); using this model, I
analysed the criterion in Section~\ref{sec:user} for importance of knowledge,
and here I have some additions to the analysis. We know that instead of
learning by rote\footnote{Sometimes, learning by rote can be inevitable, and in
this case the learner can consider using a flashcard program, which can often
dramatically increase the efficiency of memorisation. Mnemosyne and Anki are
two open-source flashcard programs, but unfortunately both feel quite bloated.},
it is usually much better to remember the important points and deduce the rest
from them on demand; this means we can ``compress'' a deduction graph into the
common and useful notions therein, and then walk from these notions to the
demanded parts. However, it must be stressed that \stress{what we remember is
not only the important notions}, but also a rough impression of how they are
related to other notions, and this impression is usually (if not always) based
on prior knowledge: for example, when learning the multiplication table,
acquaintance with addition and subtraction is very useful because it helps the
learner to associate unfamiliar results with familiar ones (\eg~$7 \times 6 =
7 \times 7 - 7 = 49 - 7 = 42$). So for efficient learning, the prior knowledge
that the above-mentioned impression depends on is preferably something deeply
rooted in our mind, which I believe can simply be called intuition: as was
analysed in Section~\ref{sec:cognitive}, in order to fully understand something,
we need diverse stimulation, and intuition (\eg~natural numbers and their basic
operations) is a result of prior stimulation, which can help to reduce the
amount of stimulation needed for learning; this also supports the point in the
same section that informal pictures are as important as formal abstractions.
We know from the above that in order to learn efficiently, we should not only
focus on the important notions, but also try to find connections between them
and the notions we are already acquainted with; we can often begin
the latter by actively \stress{identifying similarities between notions},
like ``\verb|fork()|/\verb|exec()| is like the Prototype Pattern'', and
``musical intervals between the 12 notes are similar to time intervals between
the 12 hours on a clock''. With this practice, we would be able to use prior
knowledge to minimise the amount of information we need to remember, achieving
the effect called ``compressing the book into a booklet'' (of the essentials);
besides, when new connections between notions are discovered, our understanding
of the old notions involved is also deepened, and as was noted in Section~%
\ref{sec:devel}, a profound understanding of the subject matter is often the
key to elegant inventions. On a conceptual level, after the discovery of
structural similarities between different deduction graphs, deductions from
each graph can often offer invaluable insights into previously overlooked
deductions from others; in this respect, the analogy between bouncing macroscopic
fluid droplets and microscopic particles governed by quantum mechanics, as well
as Chinese Academician Wan-Xie Zhong's work on the computational analogy between
classical dynamics and structural mechanics, serve as two good examples.
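To make the clock analogy above concrete: musical intervals and clock
arithmetic both reduce to arithmetic modulo 12, as in the following purely
illustrative Python sketch (the note names are just the twelve pitch classes,
and the two helper functions are hypothetical).
\begin{verbatim}
# Pitch classes and clock hours both live in the integers modulo 12.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose(note, semitones):
    return NOTES[(NOTES.index(note) + semitones) % 12]

def hours_later(hour, delta):
    return (hour - 1 + delta) % 12 + 1   # clock hours run from 1 to 12

print(transpose("A", 7))   # a perfect fifth (7 semitones) above A -> E
print(hours_later(9, 7))   # 7 hours after 9 o'clock -> 4 o'clock
\end{verbatim}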
So we have examined the correlation between multiple deduction graphs, and
considered how prior knowledge can help us to master new notions; as was noted
in the above, \stress{new knowledge can also help us to consider prior notions
in new ways}: in other words, ``reviewing the old reveals the new''; this is
exactly why we should foster the habit of revisiting old knowledge, and is why I
mentioned in Section~\ref{sec:devel} that we should often consider refactoring.
Now that the issue of habits is mentioned again, I find it appropriate to note
that both intuition and habits are results of repeated stimulation, and their
difference is that the former mainly concerns ``what something is'' (\eg~lemons
are sour) while the latter mainly concerns ``how to do something'' (\eg~how to
use some tableware). \stress{Good intuition is highly rewarding, but often
requires a certain amount of training to acquire}: for example, a person with
adequate training in organic chemistry would immediately recognise the elegance
in the synthesis route for ``orzane'' (through ``cyclocatene'') in \parencite%
{zxhxy2018}\footnote{This is obviously not the original source, but I really
could not find one; as far as I know, this synthesis route can be traced to
at least 2014, when I drew its final step on a T-shirt after seeing it.},
but the training would not be trivial even for the most intelligent student
with the most excellent teacher. The same holds for habits as well, which also
serves as a warning to us: as was noted in Section~\ref{sec:society}, we should
respect human nature, but meanwhile it would be unwise to indulge ourselves
in habits that result in more damage than benefit in the long run; as an
analogy, while slouching may feel more comfortable, it usually leads to
myopia, hunchback and pigeon chest, all of which I can attest to be unhealthy.
At the end of this part, I would like to consider an example from classical
music before presenting the final points: it is perhaps not too unfair
to say that most musicians agree that J.~S.\ Bach is the greatest composer in
the history of Western music, while L.\ van Beethoven and W.~A.\ Mozart
contend for second place; however, almost nobody would dispute that Mozart's
talent was much greater than Bach's, so why?\footnote{A similar comparison
can perhaps be made between Albert Einstein and John von Neumann, although the
judgement on their greatness would be much more controversial.} I think the
answer is implied in an observation by one of my schoolmates: ``Mozart always
had new musical ideas, while Bach constantly explored new possibilities in
existing ideas''; it is perhaps due to this very reason that elements in Bach's
works are often called prototypes for musical styles that would appear as late
as the 20th century. The point I would like to make from this example is
that \stress{while ``geniuses'' can easily think in depth, they are often not
quite motivated to think very hard or rethink old ideas}; the reason is that
there are so many sufficiently meaningful problems for them to solve, which
however leaves some obscure yet important corners untouched. In programming,
an extremely important part of these corners is the simplification of software
systems, which is often accessible even to the average programmer as long as
he/she is determined enough; in practice this simplification is more relevant
for us, the average programmers, so in my opinion we should feel more
motivated to do it than ``geniuses'' are. In addition, as was analysed
above, the minimalist habit lets us concentrate on the essence of problems,
which naturally gives rise to deep thinking and therefore can often provide
profound insights that are the key to difficult problems. Sometimes, these
insights are exactly the difference between great programmers and us, and
this is perhaps another explanation for the Dennis Ritchie quotation
in Section~\ref{sec:user} -- by adhering to minimalism, even the
ordinary programmer can become a ``genius''; for this very reason,
I conclude this part with the following quotation:
\begin{quoting}
Common sense is instinct, and enough of it is genius.
\end{quoting}