Archive for the ‘Mathematics’ Category

Fractals and Monads – Part 2

30 January 2019

Last time I discussed the power set monad and how it shows up in certain methods of building fractals.  This time I’ll go a bit further, and also explain some Haskell code I actually use to generate fractal pictures like this:


To do this, I’ll introduce Kleisli categories, the list monad, and a bit about how monads work in Haskell!

Recall we were considering fractals that can be built using a “replication” function

f\colon X \to \mathcal{P}X

on some set of “shapes” X.  As an example, we had considered a map that takes any triangle in the plane to sets of three new triangles as shown here:


and we noted that “iterating” this map indefinitely gives us all of the triangular “holes” in a Sierpinski triangle:


I also described how iterating this map, if we do everything carefully, involves the structure of the power set monad.

In fact, considering maps like f leads us directly to the “Kleisli category” associated to the power set monad.  I want to explain this here, because in computer science people actually tend to think of monads essentially in terms of their associated Kleisli categories, and that’s how monads are implemented in Haskell.

Every monad has an associated Kleisli category, but since I’ve been working with the power set monad (\mathcal{P}, \eta, \mu) in the category of sets, I’ll start with that special case.

First, in the Kleisli category, which we’ll denote \mathrm{Set}_\mathcal{P}, the objects are the same as the original category \mathrm{Set}: they’re just sets.  A morphism from the set X to the set Y, however, is a function

f\colon X \to \mathcal{P}Y

For now, at least, I’ll avoid making up any special notation for morphisms in the Kleisli category, and instead freely regard f\colon X \to \mathcal{P}Y either as a morphism from X to \mathcal{P}Y in the category \mathrm{Set} of sets, or as a morphism from X to Y in the Kleisli category \mathrm{Set}_\mathcal{P}; these are really the same thing.  If this is confusing, whenever you’re thinking of f as a morphism in the Kleisli category, just think of the \text{``}\mathcal{P}\text{''} as decoration on the arrow, rather than on the codomain object Y.

Given two functions f\colon X \to \mathcal{P} Y and g\colon Y \to \mathcal{P} Z, the Kleisli composite g\bullet f is defined by

g\bullet f := \mu_Z\circ \mathcal{P} g\circ f

Checking the domain and codomain of each of the functions on the right, we can see that this indeed gives a morphism from X to Z in the Kleisli category:

\qquad \qquad X \stackrel f \longrightarrow \mathcal{P} Y \stackrel {\mathcal{P} g} \longrightarrow \mathcal{P} \mathcal{P} Z \stackrel {\mu_Z} \longrightarrow \mathcal{P} Z

You can check that \bullet is associative, and that the maps \eta_X\colon X \to \mathcal{P} X are identity morphisms, so we really do get a category. That’s all there is to the Kleisli category.
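Jumping ahead a little: in Haskell, where the list monad plays a role much like \mathcal{P} (more on this below), the Kleisli composite can be written exactly as in the formula above, as \mu after the functor after f. Here is a minimal sketch; the particular functions f and g are made-up examples:

```haskell
import Control.Monad (join, (<=<))

-- Two Kleisli morphisms for the list monad (hypothetical examples):
f :: Int -> [Int]
f x = [x, x + 1]

g :: Int -> [Int]
g y = [10 * y]

-- Kleisli composition, written exactly as mu_Z . P g . f;
-- 'join' plays the role of mu, and 'fmap g' the role of P g:
kleisli :: (b -> [c]) -> (a -> [b]) -> a -> [c]
kleisli g' f' = join . fmap g' . f'
```

Here kleisli g f 3 evaluates to [30, 40], and agrees with Haskell’s built-in Kleisli composition g <=< f.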

Returning to the sort of function we had been considering for generating fractals:

f\colon X \to \mathcal{P}X

these are clearly just endomorphisms in the Kleisli category.  This also makes it clear what we mean by “iterating” such a function: it’s an endomorphism, so we can just compose it with itself as many times as we want.  In particular, analogously to how we define “powers” of ordinary functions by repeated composition, we can define the nth Kleisli power of f recursively by

\displaystyle f^{{\bullet}n} = \left\{\begin{array}{cc}\eta_X & n=0\\ f \bullet f^{{\bullet}(n-1)}& n\ge 1 \end{array}\right.

That is, the 0th power of f is the identity, while the nth power of f is the Kleisli composite of f with its (n-1)st power.

For example, consider the set T of all equilateral triangles in the plane and consider the map f\colon T \to \mathcal{P}T defined so that f(t) has six elements, each 1/3 the size of the original triangle t, linked together around its boundary, as shown here:


Then, starting with an arbitrary triangle t \in T the images under successive Kleisli powers of the generating endomorphism:

f^{\bullet 0}(t), f^{\bullet 1}(t), f^{\bullet 2}(t), f^{\bullet 3}(t), f^{\bullet 4}(t), \ldots

look like this:


Some Haskell

Let’s do some Haskell. If you don’t know Haskell, don’t worry; I’ll explain what everything means. If you do know Haskell, this will be super easy, perhaps even obvious, though I may write things in ways you won’t expect, since I’m transitioning over from the categorical definition of monad, rather than diving in with how monads are actually defined in Haskell.

The code we really need is almost embarrassingly short, mainly because monads are extremely well suited to the task at hand.  What we need most is just to be able to compute Kleisli powers!

Here’s how the formula for the Kleisli power of an endomorphism becomes a function in Haskell:

import Control.Monad

kpow :: Monad m => Int -> (a -> m a) -> a -> m a
kpow 0 _ = return
kpow n f = f <=< kpow (n-1) f

Let’s go through this line by line.

The first line just loads some tools for working with monads; there’s not much I need to say about that now.

The next line is the first one of real content; it’s the “type signature” of the function “kpow”.  The first bit, Monad m, just says that m is any monad; Haskell knows about several monads, and you can also define your own.   Then, the rest of that line:

Int -> (a -> m a) -> a -> m a

can be interpreted as declaring that we have a function that takes one integer (Int), the “power,” and one element of \hom(a, ma) (i.e. something of type a -> m a), and gives us another element of \hom(a, ma).  So if I were to write this line in standard mathematical notation, I would write:

\mathrm{kpow}:\mathbb{N} \times \hom(a, ma) \to \hom(a, ma)

where m is any monad, and a is any type.  We take an integer and a Kleisli endomorphism for the monad m, and return another endomorphism. Notice there’s a slight discrepancy here, in that an Int is any integer, whereas I used the natural numbers \mathbb{N}.  But the integer really is intended to be non-negative.  If you fed a negative integer into kpow, the recursion would never terminate; so, my code doesn’t forbid it, but you shouldn’t do it.

The last two lines of the code give the actual formula for the Kleisli power. If you just look at my earlier formula:

\displaystyle f^{{\bullet}n} = \left\{\begin{array}{cc}\eta_X & n=0\\ f \bullet f^{{\bullet}(n-1)}& n\ge 1 \end{array}\right.

and compare this to these two lines:

kpow 0 _ = return
kpow n f = f <=< kpow (n-1) f

you can probably figure out all of the Haskell syntax just by matching up symbols.

The first line says “whenever the first argument of kpow is 0, the result is return.” This may be confusing if you’re used to how the keyword “return” is used in many other programming languages. Sorry; it’s not my fault! In Haskell, “return” means “\eta_x for any object x,” where the objects in this case are always data types.

The second line says “the nth Kleisli power of f is the composite of f with the (n-1)st power of f”, just as in the other formula. You probably guessed that the Kleisli composite f \bullet g is written f <=< g in Haskell.

Here’s a dictionary relating the notation I’ve been using to Haskell notation for the same thing:


I didn’t need to use \mu, a.k.a. join explicitly in the definition of Kleisli powers, since it’s already wrapped into the definition of Kleisli composition.
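In lieu of the table, the dictionary can also be checked directly in code. Here is a tiny snippet, using Haskell’s built-in list monad (introduced just below) as the example monad; the names singleton and flatten are just labels I made up:

```haskell
import Control.Monad (join, (<=<))

-- eta_X corresponds to 'return':
singleton :: Int -> [Int]
singleton = return          -- singleton 5 == [5]

-- mu_X corresponds to 'join':
flatten :: [[Int]] -> [Int]
flatten = join              -- flatten [[1,2],[3]] == [1,2,3]

-- and the Kleisli composite g . f corresponds to g <=< f.
```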

List monad

To do some examples I’ll use the list monad, which is a built-in monad in Haskell. This is very much like the power set monad, since “lists” are like sets in many ways, so I can introduce this pretty quickly. We really just need a very basic idea of how the list monad works.

A “list” of elements of “type” a can be written simply as

[x_1, x_2, \ldots x_n] \quad \text{where } x_i\in a

for a finite list, or [x_1, x_2, \ldots] for an infinite list.

Making lists of elements of a given type clearly gives a functor analogous to the power set functor: for any function f\colon a \to b we get a function that just acts element-wise on lists:

[x_1,x_2,\ldots, x_n] \mapsto [f(x_1),f(x_2), \ldots, f(x_n)]

mapping a list of elements of type a into a list of elements of type b.

Based on what we’ve done with sets, you can probably guess how the two natural transformations in the list monad work, too.

First, any single item can be regarded as a list with only one item on it, so

\eta_a(x) = [x]

In Haskell, for example, you could take an element of type Int, say the integer 5, and turn it into a list of integers by using return 5. As long as Haskell has a context to know it is working in the list monad rather than some other monad, this will output [5]. Voilà!

Second, the natural transformation \mu turns lists of lists into lists.  If you’ve got a list of to-do lists, you can always turn it into a single list by first writing down the items on the first list, in order, then continuing with the items on the second list, and so on.

In Haskell, if you enter join [[1,2,3],[4,5]], you’ll get [1,2,3,4,5] as output. That’s \mu.

There’s one noteworthy subtlety, which is that infinite lists are also allowed. So, if you have a list of lists:

[Fun stuff to do, Work to do]

and the first item happens to be an infinite list, then if you try joining these lists up into one in the way I described, you’ll never get to any items on the second list. This is one way \mu in the list monad differs from \mu in the power set monad: In the former, as soon as one of the lists is infinite, everything after that gets thrown away. This last part is not obvious; you certainly could devise ways to join a list of infinite lists together and include all of the items, but that’s not how things work in this monad.

You can easily work out how Kleisli composition works from this, so we’ve now got everything we need.

Cantor set in the List Monad.

Let’s do an example: the Cantor set. I’ll represent closed intervals on the real line by ordered pairs in Haskell. So, when I write (0.3, 1.5), I mean the set \{x\in \mathbb{R} : 0.3 \le x \le 1.5\}. I know, round brackets for closed intervals is weirdly non-standard, but in Haskell, square brackets are reserved for lists, and it’s better to keep things straight.

To generate the Cantor set, all we need is the Kleisli endomorphism in the list monad that describes the general step. This can be defined by a one-line Haskell function:

f (p,q) = [(p, p + (q-p)/3), (q-(q-p)/3, q)]

If we apply this to the unit interval (0, 1), here’s what we get:


So this works: it’s mapping the closed interval [0,1] to a list of two closed intervals: [0, 1/3] and [2/3, 1].
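In code, with interval endpoints stored as Doubles, the same computation looks like this:

```haskell
-- The Cantor-set replication rule on closed intervals (p, q):
f :: (Double, Double) -> [(Double, Double)]
f (p, q) = [(p, p + (q - p) / 3), (q - (q - p) / 3, q)]

-- Applying it once to the unit interval gives two intervals,
-- approximately (0, 1/3) and (2/3, 1):
step1 :: [(Double, Double)]
step1 = f (0, 1)
```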

Now we can compute any step in the construction of the Cantor set by computing Kleisli powers. Let’s just do one. Once you’ve loaded the function kpow, enter this in Haskell:

kpow 3 f (0,1)

and—kapow!—you’ll get this output:


Ugly isn’t it? But convert to fractions and you’ll see this is doing just the right thing.
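Converting to fractions can itself be automated. Here is a self-contained sketch that recomputes the third Kleisli power and rescales every endpoint by 27 = 3^3, recovering the integer numerators:

```haskell
import Control.Monad ((<=<))

kpow :: Monad m => Int -> (a -> m a) -> a -> m a
kpow 0 _ = return
kpow n f = f <=< kpow (n - 1) f

f :: (Double, Double) -> [(Double, Double)]
f (p, q) = [(p, p + (q - p) / 3), (q - (q - p) / 3, q)]

-- Level-3 intervals with endpoints scaled by 27 and rounded,
-- so that (0, 1/27) becomes (0, 1), (2/27, 3/27) becomes (2, 3), etc.
level3 :: [(Integer, Integer)]
level3 = [ (round (27 * p), round (27 * q)) | (p, q) <- kpow 3 f (0, 1) ]
```

The result is (0,1), (2,3), (6,7), (8,9), (18,19), (20,21), (24,25), (26,27), read as fractions over 27: exactly the eight intervals at the third stage of the Cantor set construction.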


I’ll do more examples in the next post.

Fractals and Monads — Part 1

18 January 2019

A while back, I posted a bunch of fractal artwork that I did for fun, based on variations of the Koch snowflake; like this one, for example:


This is composed entirely of equilateral triangles, but it’s just one step in an iterated process leading to a fractal consisting of infinitely many Koch snowflakes at ever smaller scales.

I had a lot of fun coming up with designs like this, but actually I didn’t start out wanting to do new things with the Koch snowflake—that was just a fun tangent.  Originally, I was thinking about monads!

It’s high time I got back to what I was originally working on and explained what these fractals have to do with the “power set monad.”  To do that, I should probably start by explaining what the power set monad is.

For any set X the power set \mathcal{P} X is the set of all its subsets.  If you’ve ever studied any set theory, this should be familiar.  Unfortunately, a typical first course in set theory may not go into how power sets interact with functions; this is what the “monad” structure of power sets is all about. This structure has three parts, each corresponding to a very common and important mathematical construction, but each simple enough that we often use it without even thinking about it.

Let me finally say what the monad structure of power sets actually is.

First, mapping sets to their power sets is actually a functor: for any function f\colon X \to Y, there is a corresponding function

\mathcal{P}f\colon \mathcal{P}X \to \mathcal{P}Y

which sends each element S \in \mathcal{P}X, i.e. each subset S \subseteq X, to its image:

(\mathcal{P}f)(S) = \{ f(s) : s \in S\}.

This makes \mathcal{P} a functor because the mapping of functions gets along nicely with identity functions and composition.   This functor is the first part of the power set monad.

You might be accustomed to writing the image of the set S under f simply as \text{``}f(S)\text{''}, rather than \mathcal{P}f(S).  But here, we need to be more careful: despite its frequent use in mathematics, \text{``}f(S)\text{''} is a notational abuse for which \mathcal{P}f(S) is the proper form.

To motivate the next piece in the power set monad, notice that there’s a strong sense in which \mathcal{P}f is just an extension of f, namely, on singleton sets, it essentially “is” f:

\mathcal{P}f(\{ x \}) = \{f(x)\}.

This might make it tempting to write \mathcal{P}f( x ) =  f(x) for x \in X, as a sort of “dual” version of the previous abuse of notation, though I don’t think anyone does that.

A better way to express how \mathcal{P}f extends f is to introduce the canonical map sending elements of X to the corresponding singleton sets:

\displaystyle \begin{array}{rcl} \eta_X \colon X & \to & \mathcal{P}X \\ x & \mapsto & \{x\} \end{array}

Then, the previous equation can be written \mathcal{P}f \circ \eta_X = \eta_Y \circ f, which is precisely the equation that makes \eta a natural transformation. This natural transformation is the second piece in the monad structure.

Anyone who’s taught elementary set theory can attest that some students have a hard time at first making the conceptual distinction between elements and singleton subsets of a set.  Indeed, conflating the two is a very “natural” thing to do, which is why this is a natural transformation.

The last piece of the power set monad is another natural transformation. Much like \eta, which formalizes a way to think of elements as being subsets, there is a natural transformation that lets us think of sets of subsets as being subsets.  This is given by, for each set X, a map:

\displaystyle \begin{array}{rcl} \mu_X \colon \mathcal{P}^2X & \to & \mathcal{P}X \\ S & \mapsto & \bigcup_{U\in S} U \end{array}

Here \mathcal{P}^2X is the power set of the power set of X, so an element of it is a set of subsets of X, and we turn this into a single subset by taking the union.

That’s it!  Or, almost.  To form a monad, the data \mathcal{P}, \eta, \mu must be compatible in such a way that these diagrams commute:


It’s a simple exercise to check that this is indeed the case.  You can extract the general definition of monad from what I’ve written here: you need an endofunctor on some category, and two natural transformations such that the diagrams analogous to those above commute.
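Here’s a quick numerical spot-check of those commuting diagrams, using Haskell’s list monad (a close cousin of the power set monad, as discussed in Part 2) in place of \mathcal{P}; the sample data is arbitrary:

```haskell
import Control.Monad (join)

xs :: [Int]
xs = [1, 2, 3]

xsss :: [[[Int]]]
xsss = [[[1], [2, 3]], [[4, 5]]]

-- The two unit laws and the associativity law, on sample data:
unitLaw1, unitLaw2, assocLaw :: Bool
unitLaw1 = join (fmap return xs) == xs                -- mu . (P eta) = id
unitLaw2 = join (return xs) == xs                     -- mu . eta_P = id
assocLaw = join (fmap join xsss) == join (join xsss)  -- mu . (P mu) = mu . mu_P
```

All three evaluate to True; of course this only checks the laws on examples, not proves them.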

Now let’s come back to fractals:


Many kinds of fractals are created by taking a simple shape, making multiple copies, transforming them somehow—for example, by shrinking or rotating—and then repeating.

For example, here’s one way to build a Sierpinski triangle, or rather its complement, starting from a single triangle and a rule for how triangles should “reproduce”:


If we look closely, we see that this process naturally involves ingredients from the power set monad.

In particular, the basic building block is the self-replication rule that we can write as a function

f\colon T \to \mathcal{P}T

where T is the set of triangles in the plane.  We won’t need an explicit formula for the function, since a picture suffices to explain what it does:


We can get all of the triangular holes in the Sierpinski triangle by “iterating” f. But, what does this mean exactly?  It would be nice if we could just do \text{``}f(f(t))\text{''}, but, strictly speaking, this doesn’t parse: we can’t feed a set of triangles into f, since this is a function that expects a single triangle as input.

Fortunately, we know the solution to this problem: since f(t) \in \mathcal{P}T, we can’t feed it into f, but we can feed it into \mathcal{P}f.  Good so far.

The problem with this is that \mathcal{P}f(f(t)) again doesn’t live in T, nor does it live in \mathcal{P}T—it lives in \mathcal{P}^2T:

\mathcal{P}f \colon \mathcal{P}T \to \mathcal{P}^2 T

It seems we would be doomed to march into ever more deeply-nested sets, were it not for the natural transformation \mu.  Fortunately, using \mu_T we get

\mu_T \circ \mathcal{P}f \colon \mathcal{P}T \to \mathcal{P} T

So, now we can see the process of building a fractal as just repeatedly applying this function \mu_T \circ \mathcal{P}f.

One last problem: What do we apply it to?  We want to apply it to our initial triangle t, but that’s an element of T, not \mathcal{P}T.  Good thing we can think of this as an element of \mathcal{P}T using the natural transformation \eta!

So, here’s a recipe to build a wide variety of fractals, like some of the ones I’ve posted on this blog:

but also tons of others, using the power set monad:

  1. Start with a replication rule f\colon X \to \mathcal{P}X on some set X of “shapes.”
  2. Pick a shape x\in X.
  3. Feed your initial shape into \eta_X to get the initial state.
  4. Iterate feeding the current state into \mu_X \circ \mathcal{P} f to get a new state, as many times as you want.
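Here’s how the four-step recipe might look in code, using the list monad in place of the power set monad; iterateRule and cantorRule are names I made up for this sketch, with intervals (pairs of numbers) playing the role of “shapes”:

```haskell
import Control.Monad (join)

-- Steps 1-4 of the recipe, for any list-monad replication rule:
iterateRule :: (a -> [a]) -> a -> Int -> [a]
iterateRule rule x n = go n (return x)       -- step 3: eta_X on the shape x
  where
    go 0 state = state
    go k state = go (k - 1) (join (fmap rule state))
                                             -- step 4: mu_X . P f, repeated

-- A sample replication rule (each interval begets its outer thirds):
cantorRule :: (Double, Double) -> [(Double, Double)]
cantorRule (p, q) = [(p, p + (q - p) / 3), (q - (q - p) / 3, q)]
```

For instance, iterateRule cantorRule (0, 1) 2 gives the four second-stage intervals.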

In fact, in calling this a “recipe,” I mean more than just an abstract mathematical recipe: this is essentially how I really drew all of those fractal pictures.  I generated them using Haskell—a functional programming language that knows about monads.  I didn’t use the power set monad, but rather Haskell’s built in “list monad,” which works very similarly.

Savitzky-Golay filters

5 January 2019

There are often two different but equally important descriptions of a given mathematical process — one that makes it simple to calculate and another that makes it simple to understand.

Savitzky-Golay filters are like that. In digital signal processing, they give nice ways to “smooth” a signal, getting rid of small noisy fluctuations while retaining the signal’s overall shape. There are two ways to think about how they work—one based on convolution and another based on approximation by polynomials. It’s not obvious at first why these give the same result, but working out the correspondence is a fun and enlightening exercise.


Graphic by Cdang: Smoothing of noisy data by the Savitzky-Golay method (3rd degree polynomial, 9 points wide sliding window).

From the name Savitzky-Golay filter, you might well expect one way to think about Savitzky-Golay smoothing is as a convolutional filter.  Indeed, one general method of smoothing a function is to “smear” it with another function, the “filter,” effectively averaging the function’s value at each point with neighboring values.  In this case, the filter function is given by “Savitzky-Golay coefficients.”  You could look these up, or get them from your math software, but they may look rather mysterious until you work out where they came from.  Why these particular coefficients?

The other way to think of Savitzky-Golay smoothing goes as follows.  Suppose you have a digital signal, which for present purposes is just a continuous stream of real numbers:

\ldots, f(-1), f(0), f(1), f(2), f(3), \ldots

Now for each i, fit a polynomial to f on the points between i-k and i+k, and replace f(i) by the value \hat f(i) of this polynomial at i.  This approach involves constructing a new polynomial fit at each point: always a polynomial of the same degree, but fit to a different set of points.

Again, these two descriptions are equivalent.  The polynomial fit version is a lot more satisfying, in that I can visualize how the smoothing is working, and understand intuitively why this is a good smoothing method.  On the other hand, you wouldn’t want to calculate it using this method, solving a least-squares polynomial fit problem at each step. For calculations, convolution is much faster.

Why are these equivalent?

The key point that makes doing separate least-squares problems at each point equivalent to doing a simple convolution is that we are dealing with digital signals, with values arriving at regularly spaced intervals.  While the dependent variable of the signal could have any crazy behavior you might imagine, the regularity of the independent variable makes most of the least squares calculation the same at each step, and this is what makes the whole process equivalent to a convolution.

Here’s the correspondence in detail.

First, consider how we construct \hat f(0).  This is based on a least squares fit to the points

(-k, f(-k)),  \ldots,  (k, f(k))

Here’s a quick review of how this works.  The idea is to find coefficients a_0, \ldots a_d such that this polynomial of degree d:

\displaystyle p(x) = \sum_{i=0}^{d} a_i x^i

gives the best possible predictions for the values of f(x) in that range.  If d \ge 2k, then there will be a polynomial whose graph passes perfectly through all 2k+1 points, but in order to have a smoothing effect, we want d < 2k.  The problem is then to minimize the total squared error:

\displaystyle \sum_{x=-k}^{k} (p(x) - f(x))^2

We can write this compactly using matrices. First, form the design matrix X, a (2k+1)\times (d+1) matrix with rows given by

[ 1, i, i^2, \ldots i^d] for i \in \{-k, -k+1, \ldots k\}

For example, if k = 4 and d = 3, X will have these entries:

     1   -4   16  -64
     1   -3    9  -27
     1   -2    4   -8
     1   -1    1   -1
     1    0    0    0
     1    1    1    1
     1    2    4    8
     1    3    9   27
     1    4   16   64
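The design matrix is easy to generate programmatically. Here is a sketch in Haskell, which you can compare against the table above:

```haskell
-- Design matrix with rows [1, i, i^2, ..., i^d] for i in [-k .. k]:
designMatrix :: Integer -> Integer -> [[Integer]]
designMatrix k d = [ [ i ^ j | j <- [0 .. d] ] | i <- [-k .. k] ]
```

Evaluating designMatrix 4 3 reproduces the 9-by-4 matrix shown.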

Notice that the matrix product X\mathbf{a}, where \mathbf{a} = [a_0, \ldots a_d]^T, then gives a column vector of values for the polynomial p:

\displaystyle X\mathbf{a} = \begin{bmatrix} p(-k) \\ \vdots \\ p(k) \end{bmatrix}.

Those are the “predicted” values for f, so if we let \mathbf{y} = [f(-k), \ldots f(k)]^T be the column vector of actual values, then our problem is to find a vector \mathbf{a} that minimizes

\displaystyle \|X\mathbf{a} - \mathbf{y}\|^2.

The gradient

\nabla \displaystyle \|X\mathbf{a} - \mathbf{y}\|^2 = 2X^T(X\mathbf{a} - \mathbf{y})

is zero if and only if the normal equation

X^T X\mathbf{a} = X^T\mathbf{y}

holds.  When X^T X is invertible, which is always the case if d < 2k, this has solution

\mathbf{a} = (X^T X)^{-1} X^T \mathbf{y}

This gives the coefficients a_i.  Now, what about the value of \hat f(0)?

Our smoothed version of f agrees at 0 with our polynomial approximation p.  Since we are evaluating the polynomial at zero, all the higher-order terms vanish, and we get just the constant coefficient a_0.  In other words, \hat f(0) is just the topmost entry in the above solution.  That is,

\hat f(0) = ((\mathbf{e}_0)^T(X^T X)^{-1} X^T) \mathbf{y}

where \mathbf{e}_0 = [1, 0, \ldots 0] is the first standard basis vector of \mathbb{R}^{d+1}.

We’re almost done!  Notice that the previous equation just says the value of the smoothed function \hat f(x) at x= 0 is a certain linear combination of the values of f in a window of length 2k+1 around 0.

More importantly, if we were to go through the same process to construct \hat f(x) for other values of x, it is clear that, by a simple shift of coordinates, we would get a linear combination with the same coefficients as above.

And that’s convolution!  The general formula for the smoothing is thus

\hat f = ((\mathbf{e}_0)^T(X^T X)^{-1} X^T) \ast f.

Returning to the example I gave above, with k=4, d=3, calculating (X^T X)^{-1} X^T gives approximately

-0.09091   0.06061   0.16883   0.23377   0.25541   0.23377   0.16883   0.06061  -0.09091
 0.07239  -0.11953  -0.16246  -0.10606   0.00000   0.10606   0.16246   0.11953  -0.07239
 0.03030   0.00758  -0.00866  -0.01840  -0.02165  -0.01840  -0.00866   0.00758   0.03030
-0.01178   0.00589   0.01094   0.00758   0.00000  -0.00758  -0.01094  -0.00589   0.01178

so the Savitzky-Golay filter in this case is just the first row:

-0.09091   0.06061   0.16883   0.23377   0.25541   0.23377   0.16883   0.06061  -0.09091

Convolving any signal with this filter has the effect of replacing the value at each point with the value of the best cubic polynomial fit to the values at that point and its eight nearest neighbors.  (Thanks to the symmetry of the window, a quadratic fit gives exactly the same smoothing coefficients.)
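To close the loop, here is a small Haskell sketch that applies this filter by convolution. Instead of the rounded decimals, it uses the exact fractions (-21, 14, 39, 54, 59, 54, 39, 14, -21)/231, which are the standard published Savitzky-Golay coefficients for this window. A satisfying sanity check: on a signal that is exactly a cubic polynomial, smoothing changes nothing.

```haskell
import Data.Ratio ((%))

-- 9-point Savitzky-Golay smoothing coefficients (k = 4, d = 3),
-- as exact fractions:
sgCoeffs :: [Rational]
sgCoeffs = map (% 231) [-21, 14, 39, 54, 59, 54, 39, 14, -21]

-- "Valid"-mode convolution: each output is the dot product of the
-- coefficients with a length-9 window of the signal.  (The window
-- is symmetric, so no flip is needed.)
smooth :: [Rational] -> [Rational]
smooth xs
  | length xs < 9 = []
  | otherwise     = sum (zipWith (*) sgCoeffs xs) : smooth (tail xs)
```

For example, smoothing the samples of x^3 + x at the integers -6 through 6 returns the exact values of x^3 + x at -2 through 2, since a least-squares cubic fit to nine samples of a cubic is the cubic itself.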

Hopf algebra gauge theory and Kitaev models at Perimeter Institute

7 August 2017

I just got back from a trip to the Perimeter Institute, where I spoke at a conference on Hopf Algebras in Kitaev’s Quantum Double Models.

This workshop was a lot of fun!  I learned a lot, had the chance to talk to people I’ve known for a long time, and meet others I hadn’t managed to connect with before. I was especially excited to find out about some lines of work in progress that build on my work with Catherine Meusburger on Hopf algebra gauge theory.

In fact, our work on this seems to have been an impetus for the workshop, and it was really gratifying to see how other people are beginning to apply our theory, and also work out some interesting examples of it for particular Hopf algebras!  I’m anticipating some interesting work coming out in the near future.

Here’s the conference photo; I’m farthest right, and my coauthor, Catherine, is the 11th head from the left, peeking out from the second row:


I gave an introductory talk on the subject of Hopf algebra gauge theory, and you can download the slides from my talk, or even watch the video.  Catherine’s talk followed mine, and she showed how Kitaev models are related to Hopf algebra gauge theory in the same way that Turaev-Viro TQFTs are related to Reshetikhin-Turaev TQFTs.  Video of her talk is also available.  Of course, for more detail on Hopf algebra gauge theory, you can also check out our paper: Hopf algebra gauge theory on a ribbon graph.

I can also recommend watching other talks from the conference, available from the webpage linked to above.  This was just the kind of conference I like best, since it brought people from multiple research communities together, in this case including mathematicians and physicists of various sorts as well as mathematical computer scientists. Kitaev models have been a hot topic the past few years, and one reason I think they’re fun is precisely that people from several areas—quantum computation, Hopf algebras, category theory, quantum gravity, quantum foundations, topological quantum field theory, condensed matter physics, and more—are working together.  Of course, this probably also helps explain the rather long conference title.



George Hart’s “12-Card Star” and suit permutations.

7 March 2015

When I was at the JMM in San Antonio in January, I was happy to catch a workshop by George Hart:


I went to some great talks at the JMM, but a hands-on, interactive workshop was a nice change of pace in the schedule. Having seen some of George’s artwork before, I couldn’t resist. In the workshop, he taught us to build his sculpture/puzzle which he calls the 12 Card Star. Here’s what mine looked like:


He supplied the necessary materials: 13 decks of cards, all pre-cut (presumably with a band saw), like this:


We each took 13 cards from these decks—the 13th, he said, was “just in case something terrible happens.”

He showed us how to put three cards together:


Then he gave us a clue for assembling the star: the symmetry group is the same as that of a …

Wait! Let’s not give that away just yet. Better, let’s have some fun figuring out the symmetry group.

Let’s start by counting how many symmetries there are. There are twelve cards in the star, all identically situated in relation to their neighbors, so that’s already 12 symmetries: given any card, I can move it to the position of my favorite card, which is the one I’ll mark here with a blue line along its fold:


But my favorite card also has one symmetry: I can rotate it 180^\circ, flipping that blue line from end to end around its midpoint, and this gives a symmetry of the whole star. (Actually, this symmetry is slightly spoiled since I drew the five of hearts: that heart right in the middle of the card isn’t symmetric under a 180^\circ rotation, but never mind that. This would be better if I had drawn a better card, say the two of hearts, or the five of diamonds.)

So there are 12\times 2 = 24 symmetries in total, and we’re looking for a group of order 24. Since 24 = 4!, the most obvious guess is the group of permutations of a 4-element set. Is it that? If so, then it would be nice to have a concrete isomorphism.

By a concrete isomorphism, I mean a specific 4-element set such that a permutation of that set corresponds uniquely to a symmetry of the 12-card star. Where do we get such a 4-element set? Well, since there are conveniently four card suits, let’s get a specific isomorphism between the symmetry group of Hart’s star and the group of permutations of the set


At the workshop, each participant got all identical cards, as you can see in the picture of mine. But if I could choose, I’d use a deck with three 2’s of each suit:


From this deck, there is an essentially unique way to build a 12-Card Star so that the isomorphism between the symmetry group and the group of permutations of suits becomes obvious! The proof is ‘constructive,’ in that to really convince yourself you might need to construct such a 12-card star. You can cut cards using the template on George’s website. He’s also got some instructions there. But here I’ll tell you some stuff about the case with the deck of twelve 2’s.

First notice that there are places where four cards come together, like this:


In fact, there are six such places—call these the six 4-fold rotation points—and it’s no coincidence that six is also the number of cyclic orderings of card suits:


Now, out of the deck of twelve 2’s, I claim you can build a 12-card star so that each of these cyclic orderings appears at one 4-fold rotation point, and that you can do it in an essentially unique way.
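As a quick sanity check on the count of six: a cyclic ordering of the four suits is determined by fixing one suit, say hearts, in the first position and permuting the remaining three, which we can enumerate (the suit letters are just labels I chose):

```haskell
import Data.List (permutations)

suits :: String
suits = "HDCS"   -- hearts, diamonds, clubs, spades

-- A cyclic ordering of the four suits, up to rotation, is the same
-- as an ordering with 'H' pinned in front:
cyclicOrderings :: [String]
cyclicOrderings = [ 'H' : rest | rest <- permutations "DCS" ]
```

There are 3! = 6 of these, matching the six 4-fold rotation points.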

This should be enough information to build such a 12-card star. If you do, then you can check the isomorphism. Think up a permutation of the set of suits, like this one:

Check that you can rotate your 12-card star in such a way that all of the suit symbols on all of the cards in the 12-card star are permuted in that way.  The rest follows by counting.

Sometime I should get hold of the right cards to actually build one like this.

Of course, there are other ways to figure out the symmetry group. What George Hart actually told us during the workshop was not that the symmetry group was the permutation group on 4 elements, but rather that the symmetry group was the same as that of the cube. One way to see this is by figuring out what the `convex hull’ of the 12-card star is. The convex hull of an object in Euclidean space is just the smallest convex shape that the object can fit in. Here it is:


This convex polyhedron has eight hexagonal faces and six square faces. You might recognize it as a truncated octahedron, which is so named because you can get it by starting with an octahedron and cutting off its corners:


The truncated octahedron has the same symmetry group as the octahedron, which is the same as the symmetry group of the cube, since the cube and octahedron are dual.

Thanks to Chris Aguilar for the Vectorized Playing Cards, which I used in one of the pictures here.

From the Poincaré group to Heisenberg doubles

25 September 2014

There’s a nice geometric way to understand the Heisenberg double of a Hopf algebra, using what one might call its “defining representation(s).” In fact, it’s based on the nice geometric way to understand any semidirect product of groups, so I’ll start with that.

First, consider the Poincaré group, the group of symmetries of Minkowski spacetime.  Once we pick an origin of Minkowski spacetime, making it into a vector space \mathbb{R}^{3,1}, the Poincaré group becomes a semidirect product

\mathrm{ISO}(3,1)\cong\mathbb{R}^{3,1} \rtimes \mathrm{SO}(3,1)

and the action on \mathbb{R}^{3,1} can be written

(v,g)\cdot x = v + g x

In fact, demanding that this be a group action is enough to determine the multiplication in the Poincaré group.  So, this is one way to think about the meaning of the multiplication law in the semidirect product.
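To make "enough to determine" concrete, here is the short computation (standard, though I'm supplying it myself): demanding that the action law hold for products forces

```latex
\big((v,g)(w,h)\big)\cdot x
  = (v,g)\cdot\big((w,h)\cdot x\big)
  = (v,g)\cdot(w + hx)
  = v + gw + (gh)\,x
% comparing with (v',g')\cdot x = v' + g'x, the multiplication must be
(v,g)(w,h) = (v + gw,\ gh)
```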

In fact, there’s nothing so special about Minkowski spacetime in this construction.  More generally, suppose I’ve got a vector space V and a group G of symmetries of V.   Then V acts on itself by translations, and we want to form a group that consists of these translations as well as elements of G.  It should act on V by this formula:

(v,g)\cdot x = v + g x

Just demanding that this give a group action is enough to determine the multiplication in this group, which we call V \rtimes G.   I won’t bother writing the formula down, but you can if you like.

In fact, there’s nothing so special about V being a vector space in this construction.  All I really need is an abelian group H with a group G of symmetries.  This gives us a group H\rtimes G, whose underlying set is H \times G, and whose multiplication is determined by demanding that

(h,g) \cdot x = h + gx

is an action.

In fact, there’s nothing so special about H being abelian.  Suppose I’ve got a group H with a group G of symmetries.  This gives us a group H\rtimes G, built on the set H \times G, and with multiplication  determined by demanding that

(h,g)\cdot x = h (g x)

give an action on H.  Here gx denotes the action of g\in G on x\in H, and h(gx) is the product of h and gx.
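As a concrete sanity check, here's a small Haskell sketch (my own, not from the post) of this construction for H = \mathbb{Z}_3 with G = \mathbb{Z}_2 acting by inversion, which yields the symmetric group on three letters:

```haskell
type Z3 = Int  -- residues mod 3
type Z2 = Int  -- residues mod 2

-- the action of Z2 on Z3: the nontrivial element acts by inversion
act :: Z2 -> Z3 -> Z3
act g h = if g == 0 then h else (3 - h) `mod` 3

-- multiplication in H ⋊ G, read off from demanding that (h,g)·x = h + act g x
-- be a group action: (h1,g1)(h2,g2) = (h1 + g1 h2, g1 g2)
mult :: (Z3, Z2) -> (Z3, Z2) -> (Z3, Z2)
mult (h1, g1) (h2, g2) = ((h1 + act g1 h2) `mod` 3, (g1 + g2) `mod` 2)

elements :: [(Z3, Z2)]
elements = [ (h, g) | h <- [0, 1, 2], g <- [0, 1] ]

-- associativity holds, and the group is nonabelian (as S3 should be)
assocOK, nonabelian :: Bool
assocOK = and [ mult (mult a b) c == mult a (mult b c)
              | a <- elements, b <- elements, c <- elements ]
nonabelian = any (\(a, b) -> mult a b /= mult b a)
                 [ (a, b) | a <- elements, b <- elements ]
```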

For example, if H is a group and G=\mathrm{Aut}(H) is the group of all automorphisms of H, then the group H\rtimes \mathrm{Aut}(H) is called the holomorph of H.

What I’m doing here is defining H \rtimes G as a concrete group: it’s not just some abstract group as might be defined in an algebra textbook, but rather a specific group of transformations of something, in this case transformations of H.  And, if you like Klein Geometry, then whenever you see a concrete group, you start wondering what kind of geometric structure gets preserved by that group.

So: what’s the geometric meaning of the concrete group H \rtimes G?  This really involves thinking of H in two different ways: as a group and as a right torsor of itself.  The action of G preserves the group structure by assumption: it acts by group automorphisms.  On the other hand, the action of H by left multiplication is by automorphisms of H as a right H space.  Thus,  H \rtimes G preserves a kind of geometry on H that combines the group and torsor structures.  We can think of these as a generalization of the “rotations” and “translations” in the Poincaré group.

But I promised to talk about the Heisenberg double of a Hopf algebra.

In fact, there’s nothing so special about groups in the above construction.  Suppose H is a Hopf algebra, or even just an algebra, and there’s some other Hopf algebra G that acts on H as algebra automorphisms.  In Hopf algebraists’ lingo, we say H is a “G module algebra”.  In categorists’ lingo, we say H is an algebra in the category of G modules.

Besides the Hopf algebra action, H also acts on itself by left multiplication.  This doesn’t preserve the algebra structure, but it does preserve the coalgebra structure: H is an H module coalgebra.

So, just like in the group case, we can form the semidirect product, sometimes also called a “smash product” in the Hopf algebra setting, H \rtimes G, and again the multiplication law in this is completely determined by its action on H. We think of this as a “concrete quantum group” acting as two different kinds of “quantum symmetries” on H—a “point-fixing” one preserving the algebra structure and a “translational” one preserving the coalgebra structure.

The Heisenberg double is a particularly beautiful example of this.   Any Hopf algebra H is an H^* module algebra, where H^* is the Hopf algebra dual to H.  The action of H^* on H is the “left coregular action” \rightharpoonup defined as the dual of right multiplication:

(h\rightharpoonup \alpha)(k) = \alpha(kh)

for all h,k\in H and all \alpha \in H^*.

One could use different conventions for defining the Heisenberg double, of course, but not as many as you might think.  Here’s an amazing fact:

H \rtimes H^* = H \ltimes H^*

So, while I often see \rtimes and \ltimes confused, this is the one case where you don’t need to remember the difference.

But wait a minute—what’s that “equals” sign up there?  I can hear my category theorist friends snickering.  Surely, they say, I must mean H \rtimes H^* is isomorphic to H \ltimes H^*.

But no.  I mean equals.

I defined H \rtimes H^* as the algebra structure on H \otimes H^* determined by its action on H, its “defining representation.”   But every natural construction with Hopf algebras has a dual.  I could have instead defined an algebra H \ltimes H^* as the algebra structure on H \otimes H^* determined by its action on H^*.  Namely, H^* acts on itself by right multiplication, and H acts on H^* by the right coregular action.  These are just the duals of the two left actions used to define H \rtimes H^*.

That’s really all I wanted to say here.  But just in case you want the actual formulas for the Heisenberg double and its defining representations, here they are in Sweedler notation:


Hopf algebroids and (quantum) groupoids (Part 2)

8 September 2014

Last time I defined weak Hopf algebras, and claimed that they have groupoid-like structure. Today I’ll make that claim more precise by defining the groupoid algebra of a finite groupoid and showing that it has a natural weak Hopf algebra structure. In fact, we’ll get a functor from finite groupoids to weak Hopf algebras.

First, recall how the group algebra works. If G is a group, its group algebra is simply the vector space spanned by elements of G, and with multiplication extended linearly from G to this vector space. It is an associative algebra and has a unit, the identity element of the group.

If G is a groupoid, we can similarly form the groupoid algebra. This may seem strange at first: you might expect to get only an algebroid of some sort. In particular, whereas for the group algebra we get the multiplication by linearly extending the group multiplication, a groupoid has only a partially defined “multiplication”—if the source of g differs from the target of h, then the composite gh is undefined.

However, upon “linearizing”, instead of saying gh is undefined, we can simply say it’s zero unless s(g)=t(h). This is essentially all there is to the groupoid algebra.  The groupoid algebra \mathbb{C}[G] of a groupoid G is the vector space with basis the morphisms of G, with multiplication given on this basis by composition whenever this is defined, zero when undefined, and extended linearly from there.

It’s easy to see that this gives an associative algebra: the multiplication is bilinear since we define it on a basis and extend linearly, and it’s associative since composition in the groupoid is. It is a unital algebra if and only if the groupoid has finitely many objects, and in this case the unit is the sum of all of the identity morphisms.
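Here's a small Haskell sketch of this definition (names and the example groupoid are my own). I use the pair groupoid on objects \{0,\dots,n-1\}, where (i,j) is the unique morphism j \to i and composition is (i,j)(j,k) = (i,k):

```haskell
import qualified Data.Map.Strict as M

-- morphisms of the pair groupoid: (i, j) is the unique morphism j -> i
type Mor = (Int, Int)

-- an element of the groupoid algebra: a finite linear combination of morphisms
type Elt = M.Map Mor Rational

-- on basis elements: composition when defined, zero when undefined
multMor :: Mor -> Mor -> [(Mor, Rational)]
multMor (i, j) (k, l) = if j == k then [((i, l), 1)] else []

-- bilinear extension of multMor
mult :: Elt -> Elt -> Elt
mult a b = M.fromListWith (+)
  [ (m, ca * cb * c)
  | (ma, ca) <- M.toList a, (mb, cb) <- M.toList b, (m, c) <- multMor ma mb ]

-- the unit: the sum of all identity morphisms 1_x = (x, x)
unit :: Int -> Elt
unit n = M.fromList [ ((x, x), 1) | x <- [0 .. n - 1] ]
```

For this particular groupoid, (i,j) multiplies exactly like the matrix unit E_{ij}, so the groupoid algebra of the pair groupoid on n objects is the matrix algebra M_n(\mathbb{C}) (rational coefficients here, just for exactness).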

Mainly to avoid saying “groupoids with finitely many morphisms”, I’ll just stick to finite groupoids from now on, where the sets of objects and morphisms are both finite.

If we have a groupoid homomorphism, then we get an algebra homomorphism between the corresponding groupoid algebras, also by linear extension. So we get a functor

\mathbb{C}[\cdot]\colon\mathbf{FinGpd} \to \mathbf{Alg}

from the category of finite groupoids to the category of unital algebras.

But in fact, this extends canonically to a functor

\mathbb{C}[\cdot]\colon\mathbf{FinGpd} \to \mathbf{WHopf}

from the category of finite groupoids to the category of weak Hopf algebras.

To see how this works, notice first that there’s a canonical functor

\mathbb{C}[\cdot]\colon\mathbf{Set} \to \mathbf{Coalg}

from the category of sets to the category of coalgebras:  Every set is a comonoid in a unique way, so we just linearly extend that comonoid structure to a coalgebra.

In case that’s not clear to you, here’s what I mean in detail.  Given a set X, there is a unique map \Delta\colon X \to X \times X that is coassociative, namely the diagonal map \Delta(x) = (x,x). This is easy to prove, so do it if you never have.  Also, there is a unique map to the one-element set \epsilon\colon X \to \{0\}, up to the choice of which one-element set to use.  Linearly extending \Delta and \epsilon, they become a coalgebra structure on the vector space with basis X. Moreover, any function between sets is a homomorphism of comonoids, and its linear extension to the free vector spaces on these sets is thus a homomorphism of coalgebras.  This gives us our functor from sets to coalgebras.
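In Haskell, the linearization step can be sketched like this (a minimal rendering with my own names), representing vectors as finitely supported coefficient maps on the basis set:

```haskell
import qualified Data.Map.Strict as M

-- the vector space with basis a set a: finitely supported coefficient maps
type Vec a = M.Map a Rational

-- linear extension of the diagonal Δ(x) = (x, x)
delta :: Ord a => Vec a -> Vec (a, a)
delta = M.mapKeys (\x -> (x, x))

-- linear extension of the unique map to a point, ε(x) = 1
counit :: Vec a -> Rational
counit = sum . M.elems
```

The counit axiom is visible here: applying ε to either factor of Δ(v) just hands back the coefficients of v.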

So, given a finite groupoid, the vector space spanned by its morphisms becomes both an algebra and a coalgebra.  An obvious question is: do the algebra and coalgebra structure obey some sort of compatibility relations?  The answer, as I already gave away at the beginning, is that they form a weak Hopf algebra.  The antipode is just the linear extension of the inversion map g \mapsto g^{-1}.

(More generally, for those who care, the category algebra \mathbb{C}[C] of a finite category C (or any category with finitely many objects) is a weak bialgebra, and we actually get a functor

\mathbb{C}[\cdot] \colon \mathbf{FinCat} \to \mathbf{WBialg}

from finite categories to weak bialgebras.  If C happens to be a groupoid, \mathbb{C}[C] is a weak Hopf algebra; if it happens to be a monoid, \mathbb{C}[C] is a bialgebra; and if it happens to be a group, \mathbb{C}[C] is a Hopf algebra. )

This is nice, but have we squashed out all of the lovely “oid”-iness from our groupoid when we form the groupoid algebra? In other words, having built a weak Hopf algebra on the vector space spanned by morphisms, is there any remnant of the original distinction between objects and morphisms? 

As I indicated last time, the key is in these two “loop” diagrams:


The left loop says to comultiply the identity, multiply the first tensor factor by an element g, and apply the counit. Let’s do this for a groupoid algebra, where 1 = \sum_x 1_x, the sum running over all objects x.  Since comultiplication duplicates basis elements, we get

\Delta(1) = \sum_x 1_x \otimes 1_x

We then get:

g\mapsto \sum_x \epsilon(1_x\cdot g)\, 1_x = 1_{t(g)}

using the definition of multiplication and the counit in the groupoid algebra.  Similarly, the loop going in the other direction gives g \mapsto 1_{s(g)} , as anticipated last time. 

So, we can see that the image of either of the two “loop” diagrams is the subspace spanned by the identity morphisms.  This is a commutative subalgebra of the groupoid algebra, and these maps are both idempotent algebra homomorphisms.  So, they give “projections” onto the “algebra of objects”.   
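As a sanity check on these loop computations, here's a self-contained Haskell sketch (my own names; the example is the pair groupoid on three objects, where (i,j) is the unique morphism j \to i):

```haskell
import qualified Data.Map.Strict as M

type Mor = (Int, Int)           -- (i, j): the unique morphism j -> i
type Elt = M.Map Mor Rational   -- linear combinations of morphisms

objects :: [Int]
objects = [0, 1, 2]

-- composition on basis elements, or zero when undefined
multMor :: Mor -> Mor -> Elt
multMor (i, j) (k, l) = if j == k then M.singleton (i, l) 1 else M.empty

-- linear extension of the counit ε (which is 1 on every basis morphism)
epsilonE :: Elt -> Rational
epsilonE = sum . M.elems

-- the two "loop" maps on a basis morphism g, dropping zero terms
leftLoop, rightLoop :: Mor -> Elt
leftLoop g  = M.filter (/= 0) $ M.fromListWith (+)
  [ ((x, x), epsilonE (multMor (x, x) g)) | x <- objects ]
rightLoop g = M.filter (/= 0) $ M.fromListWith (+)
  [ ((x, x), epsilonE (multMor g (x, x))) | x <- objects ]
```

For the morphism (1,2)\colon 2 \to 1, the left loop returns 1_1 = (1,1) and the right loop returns 1_2 = (2,2), matching g \mapsto 1_{t(g)} and g \mapsto 1_{s(g)}.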

In fact, something like this happens for a more general weak Hopf algebra.  The maps described by the “loop” diagrams are again idempotent homomorphisms, and we can think of them as analogs of the source and target maps.  But there are some differences, too.  For instance, their images need not be the same in general, though they are isomorphic.  The images also don’t need to be commutative.  This starts hinting at what Hopf algebroids are like.

But I’ll get into that later.


Hopf algebroids and (quantum) groupoids (Part 1)

1 September 2014

I’ve been thinking a lot about weak Hopf algebras and Hopf algebroids, especially in relation to work I’m doing with Catherine Meusburger on applications of them to gauge theory.  I don’t want to talk here yet about what we’re working on, but I do feel like explaining some basic ideas.  This is all known material, but it’s fun stuff and deserves to be better known.

First of all, as you might guess, the “-oid” in “Hopf algebroid” is supposed to be like the “-oid” in “groupoid”.  Groupoids are a modern way to study symmetry, and they do things that ordinary groups don’t do very well.  If you’re not already convinced groupoids are cool—or if you don’t know what they are—then one good place to start is with Alan Weinstein’s explanation of them here:

There are two equivalent definitions of groupoid, an algebraic definition and a categorical definition.  I’ll use mainly categorical language.  So for me, a groupoid is a small category in which all morphisms are invertible.  A group is then just a groupoid with exactly one object.

Once you’ve given up the attachment to the special case of groups and learned to love groupoids, it seems obvious you should also give up the attachment to Hopf algebras and learn to love Hopf algebroids.  That’s one thing I’ve been doing lately.

My main goal in these posts will be to explain what Hopf algebroids are, and how they’re analogous to groupoids.  I’ll build up to this slowly, though, without even giving the definition of Hopf algebroid at first.  Of course, if you’re eager to see it, you can always cheat and look up the definition here:

but I’ll warn you that the relationship to groupoids might not be obvious at first.  At least, it wasn’t to me.  In fact, going from Hopf algebras to Hopf algebroids took considerable work, and some time to settle on the correct definition. But the definition of Hopf algebroid given here in Böhm’s paper seems to be the one left standing after the dust settled.  This review article also includes a brief summary of the development of the subject. 

To work up to Hopf algebroids, I’ll start with something simpler: weak Hopf algebras. These are a rather mild generalization of Hopf algebras, and the definition doesn’t look immediately “oid”-ish. But in fact, they are a nice compromise between Hopf algebras and Hopf algebroids.  In particular, as we’ll see, just as a group has a Hopf algebra structure on its group algebra, a groupoid has a weak Hopf algebra structure on its groupoid algebra.

Better yet, any weak Hopf algebra can be turned into a Hopf algebroid, and Hopf algebroids built in this way are rich enough to see many of the features of general Hopf algebroids. So, I think this gives a pretty good way to understand Hopf algebroids, which might otherwise seem obscure at first. The strategy will be to start with weak Hopf algebras and consider what “groupoid-like” structure is already present. In fact, to emphasize how well they parallel ordinary groupoids, weak Hopf algebras are sometimes called quantum groupoids:

So, here we go…

What is a weak Hopf algebra?  This is quick to define using string diagrams.  First, let’s define a weak bialgebra.  Just like a bialgebra, a weak bialgebra is both an associative algebra with unit:

and a coassociative coalgebra with counit:

(If the meaning of these diagrams isn’t clear, you can learn about string diagrams in several places on the web, like here or here.)

Compatibility of multiplication and comultiplication is also just like in an ordinary bialgebra or Hopf algebra:


So the only place where the axioms of a weak bialgebra are “weak” is in the compatibility between unit and comultiplication and between counit and multiplication.  If we define these combinations:

then the remaining axioms of a weak bialgebra can be drawn like this:


The two middle pictures in these equations have not quite been defined yet, but I hope it’s clear what they mean. For example, the diagram in the middle on the top row means either of these:

since these are the same by associativity.

Just as a Hopf algebra is a bialgebra with an antipode, a weak Hopf algebra is a weak bialgebra with an antipode.  The antipode is a linear map S which I’ll draw like this:

and it satisfies these axioms:

Like in a Hopf algebra, having an antipode isn’t additional structure on a weak Hopf algebra, but just a property: a weak bialgebra either has an antipode or it doesn’t, and if it does, the antipode is unique.  The antipode also has most of the properties you would expect from Hopf algebra theory.

One thing to notice is that the equations defining a weak Hopf algebra are completely self-dual.  This is easy to see from the diagrammatic definition given here, where duality corresponds to a rotation of 180 degrees: rotate all of the defining diagrams and you get the same diagrams back.  Luckily, even the letter S is self-dual.

There’s plenty to say about weak Hopf algebras themselves, but here I want to concentrate on how they are related to groupoids, and ultimately how they give examples of Hopf algebroids.

To see the “groupoidiness” of weak Hopf algebras, it helps to start at the bottom: the antipode axioms.  In particular, look at this one:


The left side instructs us to duplicate an element, apply the antipode to the copy on the right, and then multiply the two copies together.  If we do this to an element of a group, where the antipode is the inversion map, we get the identity.  If we do it to a morphism in a groupoid, we get the identity on the target of that morphism. So, in the groupoid world, the left side of this equation is the same as applying the target map, and then turning this back into a morphism by using the map that sends any object to its identity morphism.  That is:

g \mapsto 1_{t(g)}

where t is the map sending each morphism to its target, and 1_x denotes the identity morphism on the object x.  

Likewise, consider the dual of the previous axiom:


In the groupoid world, the left hand side gives the map

g \mapsto 1_{s(g)}

where s denotes the map sending each morphism to its source. 

So… if weak Hopf algebras really are like groupoids, then these two loop diagrams:

must essentially be graphical representations of the target and source maps.

Of course, I only said if weak Hopf algebras are like groupoids, and I haven’t yet explained any precise sense in which they are.   But we’re getting there.  Next time, I’ll explain more, including how groupoid algebras give weak Hopf algebras.

Meanwhile, if you want some fun with string diagrams, think of other things that are true for groupoids, and see if you can prove weak Hopf algebra analogs of them using just diagrams.  For example, you can check that the diagrammatic analog of 1_{s(1_{s(g)})}=1_{s(g)} (“the source of the source is the source”) follows from the weak Hopf algebra axioms.  Some others hold only after a trivial rephrasing: while the obvious diagrammatic translation of 1_{t(S(g))} = 1_{s(g)} does not hold, if you draw it instead starting from the equation 1_{t(S(g))} = S(1_{s(g)}), then you get an equation that holds in any weak Hopf algebra.

Blog post on Observer Space by Jeffrey Morton

17 October 2012

DW: This is a very nice blog post by Jeffrey Morton about observer space! He wrote this based on my ILQGS talk and my papers with Steffen Gielen. (In fact, Jeff has written a lot of other nice summaries of papers and talks, as well as stuff about his own research, on his blog, Theoretical Atlas — check it out!)

Theoretical Atlas

This entry is a by-special-request blog, which Derek Wise invited me to write for the blog associated with the International Loop Quantum Gravity Seminar, and it will appear over there as well.  The ILQGS is a long-running regular seminar which runs as a teleconference, with people joining in from various countries, on various topics which are more or less closely related to Loop Quantum Gravity and the interests of people who work on it.  The custom is that when someone gives a talk, someone else writes up a description of the talk for the ILQGS blog, and Derek invited me to write up a description of his talk.  The audio file of the talk itself is available in .aiff and .wav formats, and the slides are here.

The talk that Derek gave was based on a project of his and Steffen Gielen’s, which has taken written form in a…

View original post 2,961 more words

Göttingen talk on Cartan geometrodynamics

28 February 2012

I’m in Göttingen now, at a meeting of the German Physical Society (DPG). Here are the slides to the talk I just gave this evening:

D. Wise, Cartan geometrodynamics, Göttingen, Feb. 2012.

My goal in this talk was simply to present the geometric ideas behind my recent work with Steffen Gielen, without going into the details of the resulting reformulation of general relativity.

Of course, if this piques your interest to that point that you want to learn those details, I recommend our paper. :-)