## Fractals and Monads – Part 2

30 January 2019

Last time I discussed the power set monad and how it shows up in certain methods of building fractals.  This time I’ll go a bit further, and also explain some Haskell code I actually use to generate fractal pictures like this:

Recall we were considering fractals that can be built using a “replication” function

$f\colon X \to \mathcal{P}X$

on some set of “shapes” $X$.  As an example, we had considered a map that takes any triangle in the plane to sets of three new triangles as shown here:

and we noted that “iterating” this map indefinitely gives us all of the triangular “holes” in a Sierpinski triangle:

I also described how iterating this map, if we do everything carefully, involves the structure of the power set monad.

In fact, considering maps like $f$ leads us directly to the “Kleisli category” associated to the power set monad.  I want to explain this here, because in computer science people actually tend to think of monads essentially in terms of their associated Kleisi categories, and that’s how monads are implemented in Haskell.

Every monad has an associated Kleisli category, but since I’ve been working with the power set monad $(\mathcal{P}, \eta, \mu)$ in the category of sets, I’ll start with that special case.

First, in the Kleisli category, which we’ll denote $\mathrm{Set}_\mathcal{P}$, the objects are the same as the original category $\mathrm{Set}$: they’re just sets.  A morphism from the set $X$ to the set $Y$, however, is a function

$f\colon X \to \mathcal{P}Y$

For now, at least, I’ll avoid making up any special notation for morphisms in the Kleisli category, and instead freely regard $f\colon X \to \mathcal{P}Y$ either as a morphism from $X$ to $\mathcal{P}Y$ in the category $\mathrm{Set}$ of sets, or as a morphism from $X$ to $Y$ in the Kleisli category $\mathrm{Set}_\mathcal{P}$; these are really the same thing.  If this is confusing, whenever you’re thinking of $f$ as a morphism in the Kleisli category, just think of the $\text{}\mathcal{P}\text{''}$ as decoration on the arrow , rather than on the codomain object $Y$.

Given two functions $f\colon X \to \mathcal{P} Y$ and $g\colon Y \to \mathcal{P} Z$, the Kleisli composite $g\bullet f$ is defined by

$g\bullet f := \mu_Z\circ \mathcal{P} g\circ f$

Checking the domain and codomain of each of the functions on the right, we can see that this indeed gives a morphism from $X$ to $Z$ in the Kleisli category:

$\qquad \qquad X \stackrel f \longrightarrow \mathcal{P} Y \stackrel {\mathcal{P} g} \longrightarrow \mathcal{P} \mathcal{P} Z \stackrel {\mu_Z} \longrightarrow \mathcal{P} Z$

You can check that $\bullet$ is associative, and that the maps $\eta_X\colon X \to \mathcal{P} X$ are identity morphisms, so we really do get a category. That’s all there is to the Kleisli category.

Returning to the sort of function we had been considering for generating fractals:

$f\colon X \to \mathcal{P}X$

these are clearly just endomorphisms in the Kleisli category.  This also makes it clear what we mean by “iterating” such a function: it’s an endomorphism, so we can just compose it with itself as many times as we want.  In particular, analogously to how we define “powers” of ordinary functions by repeated composition, we can define the nth Kleisli power of $f$ recursively by

$\displaystyle f^{{\bullet}n} = \left\{\begin{array}{cc}\eta_X & n=0\\ f \bullet f^{{\bullet}(n-1)}& n\ge 1 \end{array}\right.$

That is, the 0th power of $f$ is the identity, while the nth power of is the Kleisli composite of $f$ with its (n-1)st power.

For example, consider the set $T$ of all equilateral triangles in the plane and consider the map $f\colon T \to \mathcal{P}T$ defined so that $f(t)$ has six elements, each 1/3 the size of, and linked together around the boundary of the original triangle $t$, as shown here:

Then, starting with an arbitrary triangle $t \in T$ the images under successive Kleisli powers of the generating endomorphism:

$f^{\bullet 0}(t), f^{\bullet 1}(t), f^{\bullet 2}(t), f^{\bullet 3}(t), f^{\bullet 4}(t), \ldots$

look like this:

Let’s do some Haskell—If you don’t know Haskell, don’t worry; I’ll explain what everything means. If you do know Haskell, this will be super easy, perhaps even obvious, though I may write things in ways you won’t expect, since I’m transitioning over from the categorical definition of monad, rather than diving in with how monads are actually defined in Haskell.

The code we really need is almost embarrassingly short, mainly because monads are extremely well suited to the task at hand.  What we need most is just to be able to compute Kleisli powers!

Here’s how the formula for the Kleisli power of an endomorphism becomes a function in Haskell:

import Control.Monad

kpow :: Monad m => Int -> (a -> m a) -> a -> m a
kpow 0 _ = return
kpow n f = f <=< kpow (n-1) f


Let’s go through this line by line.

The first line just loads some tools for working with monads; there’s not much I need to say about that now.

The next line is the first one of real content; it’s the “type signature” of the function “kpow”.  The first bit, Monad m, just says that m is any monad; Haskell knows about several monads, and you can also define your own.   Then, the rest of that line:

Int -> (a -> m a) -> a -> m a

can be interpreted as declaring that we have a function that takes one integer (Int), the “power,” and one element of $\hom(a, ma)$, (i.e. something of type a -> m a), and gives us another element of $\hom(a, ma)$.  So if I were to write this line in standard mathematical notation, I would write:

$\mathrm{kpow}:\mathbb{N} \times \hom(a, ma) \to \hom(a, ma)$

where $m$ is any monad, and $a$ is any type.  We take an integer and a Kleisli endomorphism for the monad m, and return another endomorphism. Notice there’s a slight discrepancy here, in that an Int is any integer, whereas I used the natural numbers $\mathbb{N}$.  But the integer really is intended to be non-negative.  If you fed a negative integer into kpow, the recursion would never terminate; so, my code doesn’t forbid it, but you shouldn’t do it.

The last two lines of the code give the actual formula for the Kleisli power. If you just look at my earlier formula:

$\displaystyle f^{{\bullet}n} = \left\{\begin{array}{cc}\eta_X & n=0\\ f \bullet f^{{\bullet}(n-1)}& n\ge 1 \end{array}\right.$

and compare this to these two lines:

kpow 0 _ = return
kpow n f = f <=< kpow (n-1) f


you can probably figure out all of the Haskell syntax just by matching up symbols.

The first line says “whenever the first argument of kpow is 0, the result is “return.” This may be confusing if you’re used to how the keyword “return” is used in many other programming languages. Sorry; it’s not my fault! In Haskell “return” means “$\eta_x$ for any object $x$“, where the objects in this case are always data types.

The second line says “the nth Kleisli power of f is the composite of f with the (n-1)st power of f”, just as in the other formula. You probably guessed that the Kleisli composite $f \bullet g$ is written f <=< g in Haskell.

Here’s a dictionary relating the notation I’ve been using to Haskell notation for the same thing:

I didn’t need to use $\mu$, a.k.a. join explicitly in the definition of Kleisli powers, since it’s already wrapped into the definition of Kleisli composition.

To do some examples I’ll use the list monad, which is a built-in monad in Haskell. This is very much like the power set monad, since “lists” are like sets in many ways, so I can introduce this pretty quickly. We really just need a very basic idea of how the list monad works.

A “list” of elements of “type” $a$ can be written written simply as

$[x_1, x_2, \ldots x_n] \quad \text{where } x_i\in a$

for a finite list, or $[x_1, x_2, \ldots]$ for an infinite list.

Making lists of elements of a given type clearly gives a functor analogous to the power set functor: for any function $f\colon a \to b$ we get a function that just acts element-wise on lists:

$([x_1,x_2,\ldots, x_n]) \mapsto [f(x_1),f(x_2), \ldots f(x_n)]$

mapping a list of elements of type $a$ into a list of elements of type $b$.

Based on what we’ve done with sets, you can probably guess how the two natural transformations in the list monad work, too.

First, any single item can be regarded as a list with only one item on it, so

$\eta_a(x) = [x]$

In Haskell, for example, you could take an element of type Int, say the integer 5, and turn it into a list of integers by using return 5. As long as Haskell has a context to know it is working in the list monad rather than some other monad, this will output [5]. Voilà!

Second, the natural transformation $\mu$ turns lists of lists into lists.  If you’ve got a list of to-do lists, you can always turn it into a single list by first writing down the items on the first list, in order, then continuing with the items on the second list, and so on.

In Haskell, If you enter join [[1,2,3],[4,5]], you’ll get [1,2,3,4,5] as output. That’s $\mu$.

There’s one noteworthy subtlety, which is that infinite lists are also allowed. So, if you have a list of lists:

[Fun stuff to do, Work to do]

and the first item happens to be an infinite list, then if you try joining these lists up into one in the way I described, you’ll never get to any items on the second list. This is one way $\mu$ in the list monad differs from $\mu$ in the power set monad: In the former, as soon as one of the lists is infinite, everything after that gets thrown away. This last part is not obvious; you certainly could devise ways to join a list of infinite lists together and include all of the items, but that’s not how things work in this monad.

You can easily work out how Kleisli composition works from this, so we’ve now got everything we need.

## Cantor set in the List Monad.

Let’s do an example: the Cantor set. I’ll represent closed intervals on the real line by ordered pairs in Haskell. So, when I write (0.3, 1.5), I mean the set $\{x\in \mathbb{R} : 0.3 \le x \le 1.5\}$. I know, round brackets for closed intervals is weirdly non-standard, but in Haskell, square brackets are reserved for lists, and it’s better to keep things straight.

To generate the Cantor set, all we need is the Kleisli endomorphism in the list monad that describes the general step. This can be defined by a one-line Haskell function:
 f (p,q) = [(p, p + (q-p)/3), (q-(q-p)/3, q)] 
If we apply this to the unit interval (0, 1), here’s what we get:
 [(0.0,0.3333333333333333),(0.6666666666666667,1.0)] 
So this works: it’s mapping the closed interval $[0,1]$ to a list of two closed intervals: $[0, 1/3]$ and $[2/3, 1]$.

Now we can compute any step in the construction of the Cantor set by computing Kleisli powers. Let’s just do one. Once you’ve loaded the function kpow, enter this in Haskell:
 kpow 3 f (0,1) 
and—kapow!—you’ll get this output:

[(0.0,3.7037037037037035e-2),
(7.407407407407407e-2,0.1111111111111111),
(0.2222222222222222,0.25925925925925924),
(0.2962962962962963,0.3333333333333333),
(0.6666666666666667,0.7037037037037037),
(0.7407407407407408,0.7777777777777778),
(0.888888888888889,0.9259259259259259),
(0.962962962962963,1.0)]


Ugly isn’t it? But convert to fractions and you’ll see this is doing just the right thing.

I’ll do more examples in the next post.

## Lunar Eclipse

20 January 2019

Somehow, no matter how many times I’ve seen it, I’m still excited whenever I get to see this show. Here’s the view from Colorado Springs right now:

18 January 2019

A while back, I posted a bunch of fractal artwork that I did for fun, based on variations of the Koch snowflake; like this one, for example:

This is composed entirely of equilateral triangles, but it’s just one step in an iterated process leading to a fractal consisting of infinitely many Koch snowflakes at ever smaller scales.

I had a lot of fun coming up with designs like this, but actually I didn’t start out wanting to do new things with the Koch snowflake—that was just a fun tangent.  Originally, I was thinking about monads!

It’s high time I got back to what I was originally working on and explained what these fractals have to do with the “power set monad.”  To do that, I should probably start by explaining what the power set monad is.

For any set $X$ the power set $\mathcal{P} X$ is the set of all its subsets.  If you’ve ever studied any set theory, this should be familiar.  Unfortunately, a typical first course in set theory may not go into the how power sets interact with functions; this is what the “monad” structure of power sets is all about. This structure has three parts, each corresponding to a very common and important mathematical construction, but also so simple enough that we often use it without even thinking about it.

Let me finally say what the monad structure of power sets actually is.

First, mapping sets to their power sets is actually a functor:  For any function $f\colon X \to Y$, there corresponds a function

$\mathcal{P}f\colon \mathcal{P}X \to \mathcal{P}Y$

which sends each element $S \in \mathcal{P}X$, i.e. each subset $S \subseteq X$, to its image:

$(\mathcal{P}f)(S) = \{ f(s) : s \in S\}$.

This makes $\mathcal{P}$ a functor because the mapping of functions gets along nicely with identity functions and composition.   This functor is the first part of the power set monad.

You might be accustomed to writing the image of the set $S$ under $f$ simply as $\text{}f(S)\text{''}$, rather than $\mathcal{P}f(S)$.  But here, we need to be more careful: despite its frequent use in mathematics, $\text{}f(S)\text{''}$ is a notational abuse for which $\mathcal{P}f(S)$ is the proper form.

To motivate the next piece in the power set monad, notice that there’s a strong sense in which $\mathcal{P}f$ is just an extension of $f$, namely, on singleton sets, it essentially “is” $f$:

$\mathcal{P}f(\{ x \}) = \{f(x)\}$.

This might make it tempting write $\mathcal{P}f( x ) = f(x)$ for $x \in X$, as sort of “dual” version of the previous abuse of notation, though I don’t think anyone does that.

A better way to express how $\mathcal{P}f$ extends $f$ is to introduce the canonical map sending elements of $X$ to the corresponding singleton sets:

$\displaystyle \begin{array}{rcl} \eta_X \colon X & \to & \mathcal{P}X \\ x & \mapsto & \{x\} \end{array}$

Then, the previous equation can be written $\mathcal{P}f \circ \eta_X = \eta_Y \circ f$, which is precisely the equation that makes $\eta$ a natural transformation. This natural transformation is the second piece in the monad structure.

Anyone who’s taught elementary set theory can attest that some students have a hard time at first making with conceptual distinction between elements and singleton subsets of a set.  Indeed, conflating the two is a very “natural” thing to do, which is why this is a natural transformation.

The last piece of the power set monad is another natural transformation. Much like $\eta$, which formalizes a way to think of elements as being subsets, there is a natural transformation that lets us think of sets of subsets as being subsets.  This is given by, for each set $X$, a map:

$\displaystyle \begin{array}{rcl} \mu_X \colon \mathcal{P}^2X & \to & \mathcal{P}X \\ S & \mapsto & \bigcup_{U\in S} U \end{array}$

Here $\mathcal{P}^2X$ is the power set of the power set of $X$, so an element of it is a sets of subsets of $X$, and we turn this into a single subset by taking the union.

That’s it!  Or, almost.  To form a monad, the data $\mathcal{P}, \eta, \mu$ must be compatible in such a way that these diagrams commute:

It’s a simple exercise to check that this is indeed the case.  You can extract the general definition of monad from what I’ve written here: you need an endofunctor on some category, and two natural transformations such that the diagrams analogous to those above commute.

Now let’s come back to fractals:

Many kinds of fractals are created by taking a simple shape, making multiple copies, transforming them somehow—for example, by shrinking or rotating—and then repeating.

For example, here’s one way to build a Sierpinski triangle, or rather its complement, staring from a single triangle and a rule for how triangles should “reproduce”:

If we look closely, we see that this process naturally involves ingredients from the power set monad.

In particular, the basic building block is the self-replication rule that we can write as a function

$f\colon T \to \mathcal{P}T$

where $T$ is the set of triangles in the plane.  We won’t need an explicit formula for the function, since a picture suffices to explain what it does:

We can get all of the triangular holes in the Sierpinski triangle by “iterating” f. But, what does this mean exactly?  It would be nice if we could just do $\text{}f(f(t))\text{''}$, but, strictly speaking, this doesn’t parse: we can’t feed a set of triangles into $f$, since this is a function that expects a single triangle as input.

Fortunately, we know the solution to this problem: since $f(t) \in \mathcal{P}T$, we can’t feed it into $f$, but we can feed it into $\mathcal{P}f$.  Good so far.

The problem with this is that $\mathcal{P}f(f(t))$ again doesn’t live in $T$, nor does it live in $\mathcal{P}T$—it lives in $\mathcal{P}^2T$:

$\mathcal{P}f \colon \mathcal{P}T \to \mathcal{P}^2 T$

It seems we would be doomed to march into ever more deeply-nested sets, were it not for the natural transformation $\mu$.  Fortunately, using $\mu_T$ we get

$\mu_T \circ \mathcal{P}f \colon \mathcal{P}T \to \mathcal{P} T$

So, now we can see the process of building a fractal as just repeatedly applying this function $\mu_T \circ \mathcal{P}f$.

One last problem: What do we apply it to?  We want to apply it to our initial triangle $t$, but that’s an element of $T$, not $\mathcal{P}T$.  Good thing we can think of this as an element of $\mathcal{P}T$ using the natural transformation $\eta$!

So, here’s a recipe to build a wide variety of fractals, like some of the ones I’ve posted on this blog:

but also tons of others, using the power set monad:

1. Start with a replication rule $f\colon X \to \mathcal{P}X$ on some set $X$ of “shapes.”
2. Pick a shape $x\in X$.
3. Feed your initial shape into $\eta_X$ to get the initial state.
4. Iterate feeding the current state into $\mu_X \circ \mathcal{P} T$ to get a new state, as many times as you want.

In fact, in calling this a “recipe,” I mean more than just an abstract mathematical recipe: this is essentially how I really drew all of those fractal pictures.  I generated them using Haskell—a functional programming language that knows about monads.  I didn’t use the power set monad, but rather Haskell’s built in “list monad,” which works very similarly.

## Dark Energy Survey complete tomorrow

8 January 2019
Tomorrow, Dark Energy Survey finishes six years of taking data!

This experiment is intended to help us better understand the nature of Dark Energy, which drives the accelerated expansion of the universe.

Shown here is the Dark Energy Camera, a 520-megapixel digital camera the collaboration used to collect 50TB of data on over 300 million distant galaxies during the last 6 years. It will be exciting to see what findings come from analyzing all those data!

http://news.fnal.gov/2019/01/dark-energy-survey-completes-six-year-mission/

## Savitzky-Golay filters

5 January 2019

There are often two different but equally important descriptions of a given mathematical process — one that makes it simple to calculate and another that makes it simple to understand.

Savitzky-Golay filters are like that. In digital signal processing, they give nice ways to “smooth” a signal, getting rid of small noisy fluctuations while retaining the signal’s overall shape. There are two ways to think about how they work—one based on convolution and another based on approximation by polynomials. It’s not obvious at first why these give the same result, but working out the correspondence is a fun and enlightening exercise.

Graphic by Cdang: Smoothing of noisy data by the Savitzky-Golay method (3rd degree polynomial, 9 points wide sliding window).

From the name Savitzky-Golay filter, you might well expect one way to think about Savitzky-Golay smoothing is as a convolutional filter.  Indeed, one general method of smoothing a function is to “smear” it with another function, the “filter,” effectively averaging the function’s value at each point with neighboring values.  In this case, the filter function is given by “Savitzky-Golay coefficients.”  You could look these up, or get them from your math software, but they may look rather mysterious until you work out where they came from.  Why these particular coefficients?

The other way to think of Savitzky-Golay smoothing goes as follows.  Suppose you have a digital signal, which for present purposes is just a continuous stream of real numbers:

$\ldots, f(-1), f(0), f(1), f(2), f(3), \ldots$

Now for each $i$, fit a polynomial to $f$ on the points between $i-k$ and $i+k$,  and replace $f(i)$ by the value $\hat f(i)$ of this polynomial at $i$.  This approach involves constructing a new polynomial fit at each point: always a polynomial of same degree, but fit to a different set of points.

Again, these two descriptions are equivalent.  The polynomial fit version is a lot more satisfying, in that I can visualize how the smoothing is working, and understand intuitively why this is a good smoothing method.  On the other hand, you wouldn’t want to calculate it using this method, solving a least-squares polynomial fit problem at each step. For calculations, convolution is much faster.

Why are these equivalent?

The key point that makes doing separate least-squares problems at each point equivalent to doing a simple convolution is that we are dealing with digital signals, with values arriving at regularly spaced intervals.  While the dependent variable of the signal could have any crazy behavior you might imagine, the regularity of the independent variable makes most of the least squares calculation the same at each step, and this is what makes the whole process equivalent to a convolution.

Here’s the correspondence in detail.

First, consider how we construct $\hat f(0)$.  This is based on a least squares fit to the points

$(-k, f(-k)), \ldots, (k, f(k))$

Here’s a quick review of how this works.  The idea is to find coefficients $a_0, \ldots a_d$ such that this polynomial of degree d:

$\displaystyle p(x) = \sum_{i=0}^{d} a_i x^i$

gives the best possible predictions for the values of $f(x)$ in that range.  If $d \ge 2k$, then there will be a polynomial whose graph passes perfectly through all $2k+1$ points, but in order to have a smoothing effect, we want $d < 2k$.  The problem is then to minimize the total squared error:

$\displaystyle \sum_{x=-k}^{k} (p(x) - f(x))^2$

We can write this compactly using matrices. First, form the design matrix $X$, a $(2k+1)\times (d+1)$ matrix with rows given by

$[ 1, i, i^2, \ldots i^d]$ for $i \in \{-k, -k+1, \ldots k\}$

For example, if k = 4, and d = 2, $X$ will have these entries:

     1   -4   16  -64
1   -3    9  -27
1   -2    4   -8
1   -1    1   -1
1    0    0    0
1    1    1    1
1    2    4    8
1    3    9   27
1    4   16   64


Notice that the matrix product $X\mathbf{a}$, where $\mathbf{a} = [a_0, \ldots a_d]^T$, then gives a column vector of values for the polynomial $p$:

$\displaystyle X\mathbf{a} = \begin{bmatrix} p(-k) \\ \vdots \\ p(k) \end{bmatrix}$.

Those are the “predicted” values for $f$, so if we let $\mathbf{y} = [f(-k), \ldots f(k)]^T$ be the column vector of actual values, then our problem is to find a vector $\mathbf{a}$ that minimizes

$\displaystyle \|X\mathbf{a} - \mathbf{y}\|^2$.

$\nabla \displaystyle \|X\mathbf{a} - \mathbf{y}\|^2 = 2X^T(X\mathbf{a} - \mathbf{y})$

is zero if and only if the normal equation

$X^T X\mathbf{a} = X^T\mathbf{y}$

holds.  When $X^T X$ is invertible, which is always the case if $d<2k$, this has solution

$\mathbf{a} = (X^T X)^{-1} X^T \mathbf{y}$

This gives the coefficients $a_i$.  Now, what about the value of $\hat f(0)$?

Our smoothed version of $f$, agrees at 0 with our polynomial approximation $p$.  Since we are evaluating the polynomial at zero, we get just the degree-zero term.  In other words, $\hat f(0)$ is just the topmost entry in the above solution.  That is,

$\hat f(0) = ((\mathbf{e}_0)^T(X^T X)^{-1} X^T) \mathbf{y}$

where $\mathbf{e}_0 = [1, 0, \ldots 0]$ is the first standard basis vector of $\mathbb{R}^{2k+1}$.

We’re almost done!  Notice that the previous equation just says the value of the smoothed function $\hat f(x)$ at $x= 0$ is a certain linear combination of the values of $f$ in a window of length $2k+1$ around 0.

More importantly, if we were to go through the same process to construct $f(x)$ for other values of $x$, it is clear that, by a simple shift of coordinates, we would get a linear combination with the same coefficients as above.

And that’s convolution!  The general formula for the smoothing is thus

$\hat f = ((\mathbf{e}_0)^T(X^T X)^{-1} X^T) \ast f$.

Returning to the example I gave above, with $k=4, d=2$, calculating $(X^T X)^{-1} X^T$ gives approximately

-0.09091   0.06061   0.16883   0.23377   0.25541   0.23377   0.16883   0.06061  -0.09091
0.07239  -0.11953  -0.16246  -0.10606   0.00000   0.10606   0.16246   0.11953  -0.07239
0.03030   0.00758  -0.00866  -0.01840  -0.02165  -0.01840  -0.00866   0.00758   0.03030
-0.01178   0.00589   0.01094   0.00758   0.00000  -0.00758  -0.01094  -0.00589   0.01178


so the Savitzky-Golay filter in this case is just the first row:

-0.09091   0.06061   0.16883   0.23377   0.25541   0.23377   0.16883   0.06061  -0.09091


Convolving any signal with this filter has the effect of replacing the value at each point with the value of the best quadratic polynomial fit to the values at that point and its eight nearest neighbors.

## Quarks and Antiquarks in the Nucleon

5 January 2019

My earliest research in physics was in a particle physics collaboration at FermiLab.  The idea of the experiment I worked on from 1994 to 1997 (FermiLab E866) was to study the nucleon sea—a tiny but turbulent region bubbling with quantum activity inside a proton or neutron, where quark-antiquark pairs can appear for an instant before annihilating each other.

In particular, we made a precise measurement of how often pairs of “up” quarks are produced, relative to how often “down” quarks are produced in the nucleon sea, and showed that there was a statistically significant difference which particle physics theory could not account for.

While I went on to more mathematical and theoretical work, I have colleagues who continued research along these lines, and it’s still nice to look in and see what’s going on in the area I started out in.

Fortunately for me, there’s a nice new review article on the subject of quark-antiquark pairs in the nucleon, written by two of my senior colleagues from E866, Don Geesaman and Paul Reimer:

D. F. Geesaman and P. E. Reimer, The Sea of Quarks and Antiquarks in the Nucleon: a Review, https://arxiv.org/abs/1812.10372

It’s nicely written, and explains the state of the art in sea quark physics from both the experimental and theoretical sides.

## Learning Artificial Intelligence

14 September 2018

One reason to love being a mathematician is that the mathematical universe is so staggeringly enormous that there’s never a shortage of fun new directions to go in the mathematical sciences.

Recently, I’ve been learning AI, and really enjoying digging into both the mathematical aspects of intelligence and the concrete programming skills for building state-of-the-art systems.  One big help for this has been a two-term Udacity “Nanodegree” that I just completed.

## Artificial Intelligence Nanodegree and Specializations

The whole program, which took from January through August 2018, consisted of three parts:

1. Introduction to Artificial Intelligence. This covered a bunch of topics in “classical” AI, largely following Russell and Norvig’s text, Artificial Intelligence: A Modern Approach.
2. Deep Learning and Applications.  This covered the hottest topic in AI these days—using deep neural networks to do machine learning—and a bunch of applications.
3. Specializations:
• Computer Vision.
• Natural Language Processing.
• Voice-User Interfaces.

These courses covered a lot of stuff!  For me, it was a good way to boost my skills quickly and help me catch up on developments and literature in AI.  Some people who are also interested in AI have been asking me about this program, so here’s more about what I did in it, and some evaluation of the Udacity courses.

## Setting up a gaming laptop for GPU-accelerated Deep Learning

8 September 2018

I’ve never had much interest in writing about tools I’m using, since I prefer thinking about what I’m doing with the tools, rather than the tools themselves.  And tools are always changing.  Recently, though, I needed to build a system for training deep neural networks, and I had one big question that took me a while to answer:

Can I convert a gaming laptop into a deep learning machine?

The answer is Yes, and I’m writing this post in case it’s helpful to someone who wants to do something similar. Also, I’m writing it so I can remember what I did.

If you want to train neural networks, GPUs are an order of magnitude faster than CPUs. I needed a GPU system for my current work, but I also like being mobile, running computations on the go, and without having to rely on pay-by-the-hour cloud services like AWS. I needed a laptop with a decent GPU.

But I also had some concerns.  If I’d had the money, I would have gotten a System 76 machine, or some other laptop designed for machine learning, but I had a budget of around \$1000.  I also suspected that GPUs in many standard laptops are probably severely underclocked for heat reasons.  A gaming laptop seemed like a potentially good solution, since gamers need performance.

Unfortunately, I had a hard time finding reliable-sounding information online about whether most gaming laptops play nicely with Linux.  The ones I looked at were mostly not on Ubuntu’s certified hardware list, and I had already been burned once:  I’d had one recent bad experience with some hardware that crashed frequently with my Ubuntu setup, and this left me slightly nervous about trying to convert a gaming laptop to a deep learning machine.

Eventually, I came up with a solution I’m very happy with.  My Dell G7 15 gaming laptop works great for what I needed:

• NVIDIA GeForce GTX 1060 Mobile GPU
• 12 CPUs (Intel Core i7-8750H CPU @ 2.20GHz)
• Works well with Ubuntu 18
• Thunderbolt 3 port, so I can presumably use an external GPU instead.  (Though I haven’t tried this so far.)
• It doesn’t look nearly as ridiculous as most gaming machines: No wings, spoiler, or dragon scales.

I’ve now been using this machine for a couple of months, and it’s been great for my purposes!  I’m happily and quickly training neural networks, even in places where I’ve got no connection to the cloud. True, it’s a bit bulky compared to most laptops, and the power usage means the battery only has typically a couple of hours of charge, but those are really the only downsides.

Below, you can see the steps I used. I wrote these mostly so I can refer back to them easily, but I’d be happy if they’re helpful to anyone else who wants to do the same…

## Cutting a triangle into infinitely many Koch snowflakes

8 June 2018

I want to explain the construction behind a piece that I posted recently without explanation.  Starting from a single triangle, I’ll show you how to cut it up into pieces, rearrange those pieces, and get this:

## The construction

Mark off each edge into thirds, and connect those points with lines using this pattern:

This divides up the triangle into into seven smaller triangles, three equilateral and three isosceles.  Remove the three isosceles triangles to get this:

But now, notice that the three triangles you removed can each be cut in half and then taped back together along an edge, so that you get three equilateral triangles. Do that, and place the new equilateral triangles so that they stick out from the sides of the original triangle, and you will get this:

That’s it for step 1!

Now, for step 2, repeat this whole process for each of the 7 equilateral triangles obtained in step 1.  Step 3: Do the same for each of the 49 triangles obtained in step 2.  And so on. My original picture, at the top of this post, is what you get after step 5.

Notice that each step is area-preserving, so in particular, the total area of all of the black triangles in my original picture is the same as the are of the triangle I started with.

Here’s an animation showing the first five steps in the sequence, and then those same steps backwards, going back to the original triangle:

The reason the picture seems to get darker in the latter steps is that the triangles are drawn with black edges, and eventually there are a lot of edges.  Since there’s a limit to how thin the edges can be drawn, eventually, the picture is practically all edges.

## How many snowflakes do you see?

The outline of the entire picture is clearly a Koch curve, so we have generated a Koch curve from a triangle.  But, what I really love about this construction is that every triangle that occurs at any step in the recursive process also spawns a Koch curve!  That’s a lot of Koch curves.

To make this precise, we can assume that triangles at each step are closed subsets of the plane.  Admittedly, the “cutting” analogy falls apart slightly here, since two pieces resulting from a “cut” each contain a copy of the edge the cut was made along, but that’s OK.   With this closure assumption, each of the Koch curves, one for each triangle formed at any stage in the process, is a subset of the intersection over all steps.

## The Koch Snowflake from triangles

11 May 2018

Here’s a new variation in my series of my Koch snowflake-like fractals drawn entirely from regular polygons.  This picture consists entirely of black equilateral triangles.

I’ll avoid explaining this one for now, except to say that I generated it starting from a single triangle, and iteratively replacing each triangle by seven new triangles.  This is the sixth generation. The construction differs only slightly from my previous Koch snowflake fractal, in which each triangle had six descendants.  I really like this new version, because you can see Koch snowflakes showing up in even more (infinitely more!) places than before.

There are also analogs of this for squares and pentagons!