This experiment is intended to help us better understand the nature of Dark Energy, which drives the accelerated expansion of the universe.
Shown here is the Dark Energy Camera, a 570-megapixel digital camera the collaboration used to collect 50 TB of data on over 300 million distant galaxies during the last 6 years. It will be exciting to see what findings come from analyzing all those data!
Savitzky-Golay filters are like that. In digital signal processing, they give nice ways to “smooth” a signal, getting rid of small noisy fluctuations while retaining the signal’s overall shape. There are two ways to think about how they work—one based on convolution and another based on approximation by polynomials. It’s not obvious at first why these give the same result, but working out the correspondence is a fun and enlightening exercise.
From the name Savitzky-Golay filter, you might well expect one way to think about Savitzky-Golay smoothing is as a convolutional filter. Indeed, one general method of smoothing a function is to “smear” it with another function, the “filter,” effectively averaging the function’s value at each point with neighboring values. In this case, the filter function is given by “Savitzky-Golay coefficients.” You could look these up, or get them from your math software, but they may look rather mysterious until you work out where they came from. Why these particular coefficients?
The other way to think of Savitzky-Golay smoothing goes as follows. Suppose you have a digital signal, which for present purposes is just a stream of real numbers, one for each integer:

$$\ldots,\ f(-2),\ f(-1),\ f(0),\ f(1),\ f(2),\ \ldots$$

Now for each $n$, fit a polynomial to $f$ on the points between $n-k$ and $n+k$, and replace $f(n)$ by the value of this polynomial at $n$. This approach involves constructing a new polynomial fit at each point: always a polynomial of the same degree, but fit to a different set of points.
Again, these two descriptions are equivalent. The polynomial fit version is a lot more satisfying, in that I can visualize how the smoothing is working, and understand intuitively why this is a good smoothing method. On the other hand, you wouldn’t want to calculate it using this method, solving a least-squares polynomial fit problem at each step. For calculations, convolution is much faster.
Why are these equivalent?
The key point that makes doing separate least-squares problems at each point equivalent to doing a simple convolution is that we are dealing with digital signals, with values arriving at regularly spaced intervals. While the dependent variable of the signal could have any crazy behavior you might imagine, the regularity of the independent variable makes most of the least squares calculation the same at each step, and this is what makes the whole process equivalent to a convolution.
Here’s the correspondence in detail.
First, consider how we construct the smoothed value at $0$, which I’ll call $\hat{f}(0)$. This is based on a least squares fit to the points

$$(-k, f(-k)),\ (-k+1, f(-k+1)),\ \ldots,\ (k, f(k)).$$
Here’s a quick review of how this works. The idea is to find coefficients $a_0, a_1, \ldots, a_d$ such that this polynomial of degree $d$:

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_d x^d$$

gives the best possible predictions for the values of $f(x)$ in that range. If $d \geq 2k$, then there will be a polynomial whose graph passes perfectly through all $2k+1$ points, but in order to have a smoothing effect, we want $d < 2k$. The problem is then to minimize the total squared error:

$$\sum_{x=-k}^{k} \bigl(p(x) - f(x)\bigr)^2.$$
We can write this compactly using matrices. First, form the design matrix $A$, a $(2k+1) \times (d+1)$ matrix with rows given by

$$(1,\ x,\ x^2,\ \ldots,\ x^d)$$

for

$$x = -k,\ -k+1,\ \ldots,\ k.$$

For example, if $k = 4$ and $d = 3$, $A$ will have these entries:

$$A = \begin{pmatrix}
1 & -4 & 16 & -64 \\
1 & -3 & 9 & -27 \\
1 & -2 & 4 & -8 \\
1 & -1 & 1 & -1 \\
1 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 \\
1 & 2 & 4 & 8 \\
1 & 3 & 9 & 27 \\
1 & 4 & 16 & 64
\end{pmatrix}$$
Notice that the matrix product $A\mathbf{a}$, where $\mathbf{a} = (a_0, a_1, \ldots, a_d)^T$, then gives a column vector of values of the polynomial $p$:

$$A\mathbf{a} = \bigl(p(-k),\ p(-k+1),\ \ldots,\ p(k)\bigr)^T.$$

Those are the “predicted” values for $f$, so if we let $\mathbf{f} = \bigl(f(-k),\ f(-k+1),\ \ldots,\ f(k)\bigr)^T$ be the column vector of actual values, then our problem is to find a vector $\mathbf{a}$ that minimizes

$$\|A\mathbf{a} - \mathbf{f}\|^2.$$
The gradient

$$\nabla_{\mathbf{a}} \|A\mathbf{a} - \mathbf{f}\|^2 = 2A^T(A\mathbf{a} - \mathbf{f})$$

is zero if and only if the normal equation

$$A^T A\,\mathbf{a} = A^T \mathbf{f}$$

holds. When $A^T A$ is invertible, which is always the case if $d<2k$, this has solution

$$\mathbf{a} = (A^T A)^{-1} A^T \mathbf{f}.$$

This gives the coefficients $a_0, a_1, \ldots, a_d$. Now, what about the value of the smoothed signal at $0$?
Our smoothed version of $f$, which I’ll call $\hat{f}$, agrees at $0$ with our polynomial approximation: $\hat{f}(0) = p(0)$. Since we are evaluating the polynomial at zero, we get just the degree-zero term: $p(0) = a_0$. In other words, $\hat{f}(0)$ is just the topmost entry in the above solution. That is,

$$\hat{f}(0) = e_1^T (A^T A)^{-1} A^T \mathbf{f},$$

where $e_1 = (1, 0, \ldots, 0)^T$ is the first standard basis vector of $\mathbb{R}^{d+1}$.
We’re almost done! Notice that the previous equation just says the value of the smoothed function at $0$ is a certain linear combination of the values of $f$ in a window of length $2k+1$ around $0$.

More importantly, if we were to go through the same process to construct $\hat{f}(n)$ for other values of $n$, it is clear that, by a simple shift of coordinates, we would get a linear combination with the same coefficients as above.

And that’s convolution! The general formula for the smoothing is thus

$$\hat{f}(n) = \sum_{i=-k}^{k} c_i\, f(n+i), \qquad \text{where}\quad (c_{-k}, \ldots, c_k) = e_1^T (A^T A)^{-1} A^T.$$
Returning to the example I gave above, with $k = 4$ and $d = 3$, calculating $(A^T A)^{-1} A^T$ gives approximately

$$\begin{pmatrix}
-0.09091 & 0.06061 & 0.16883 & 0.23377 & 0.25541 & 0.23377 & 0.16883 & 0.06061 & -0.09091 \\
0.07239 & -0.11953 & -0.16246 & -0.10606 & 0.00000 & 0.10606 & 0.16246 & 0.11953 & -0.07239 \\
0.03030 & 0.00758 & -0.00866 & -0.01840 & -0.02165 & -0.01840 & -0.00866 & 0.00758 & 0.03030 \\
-0.01178 & 0.00589 & 0.01094 & 0.00758 & 0.00000 & -0.00758 & -0.01094 & -0.00589 & 0.01178
\end{pmatrix}$$
so the Savitzky-Golay filter in this case is just the first row:
$$(-0.09091,\ 0.06061,\ 0.16883,\ 0.23377,\ 0.25541,\ 0.23377,\ 0.16883,\ 0.06061,\ -0.09091)$$
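If you’d like to check this numerically, the coefficients are easy to reproduce. Here’s a short sketch using NumPy (the code and variable names are mine, just for illustration):

```python
import numpy as np

k, d = 4, 3  # window half-width and polynomial degree, as in the example above
x = np.arange(-k, k + 1)

# Design matrix A: one row (1, x, x^2, ..., x^d) for each x = -k, ..., k
A = np.vander(x, d + 1, increasing=True)

# The matrix (A^T A)^{-1} A^T; its first row is the Savitzky-Golay filter
pinv = np.linalg.inv(A.T @ A) @ A.T
coeffs = pinv[0]

print(np.round(coeffs, 5))
# approximately [-0.09091, 0.06061, 0.16883, 0.23377, 0.25541, 0.23377, 0.16883, 0.06061, -0.09091]
```

In exact arithmetic, these coefficients are $(-21, 14, 39, 54, 59, 54, 39, 14, -21)/231$, matching the classic Savitzky-Golay tables for a window of length 9.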
Convolving any signal with this filter has the effect of replacing the value at each point with the value, at that point, of the best cubic polynomial fit to the values at that point and its eight nearest neighbors.
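Since the whole point is that the two descriptions agree, it’s also fun to verify the equivalence numerically. This sketch (again my own illustration, with a made-up noisy test signal) smooths once by convolution and once by re-fitting a polynomial at every point:

```python
import numpy as np

k, d = 4, 3
x = np.arange(-k, k + 1)
A = np.vander(x, d + 1, increasing=True)
c = (np.linalg.inv(A.T @ A) @ A.T)[0]  # Savitzky-Golay filter coefficients

# A made-up noisy signal to smooth
rng = np.random.default_rng(0)
f = np.sin(np.linspace(0, 4, 200)) + 0.1 * rng.standard_normal(200)

# Method 1: convolution with the filter (it's symmetric, so no flip is needed)
by_convolution = np.convolve(f, c, mode="valid")

# Method 2: at each point, do a fresh degree-d least-squares fit to the
# surrounding 2k+1 values and take the constant term, i.e. the value at the center
by_fitting = np.array([
    np.polynomial.polynomial.polyfit(x, f[n - k : n + k + 1], d)[0]
    for n in range(k, len(f) - k)
])

print(np.allclose(by_convolution, by_fitting))  # True
```

As expected, the per-point least-squares fits agree with the convolution, up to floating-point error.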
In particular, we made a precise measurement of how often pairs of “up” quarks are produced, relative to how often “down” quarks are produced in the nucleon sea, and showed that there was a statistically significant difference which particle physics theory could not account for.
While I went on to more mathematical and theoretical work, I have colleagues who continued research along these lines, and it’s still nice to look in and see what’s going on in the area I started out in.
Fortunately for me, there’s a nice new review article on the subject of quark-antiquark pairs in the nucleon, written by two of my senior colleagues from E866, Don Geesaman and Paul Reimer:
D. F. Geesaman and P. E. Reimer, The Sea of Quarks and Antiquarks in the Nucleon: a Review, https://arxiv.org/abs/1812.10372
It’s nicely written, and explains the state of the art in sea quark physics from both the experimental and theoretical sides.
Recently, I’ve been learning AI, and really enjoying digging into both the mathematical aspects of intelligence and the concrete programming skills for building state-of-the-art systems. One big help for this has been a two-term Udacity “Nanodegree” that I just completed.
The whole program, which took from January through August 2018, consisted of three parts:
These courses covered a lot of stuff! For me, it was a good way to boost my skills quickly and help me catch up on developments and literature in AI. Some people who are also interested in AI have been asking me about this program, so here’s more about what I did in it, and some evaluation of the Udacity courses.
A good thing about the Udacity courses is that they’re project-focused. The required projects for my Nanodegree included:
I was only required to do one of the three possible capstone projects, and chose to do the one associated with NLP.
Most of the projects were well designed, and introduced a variety of tools. The programming was all in Python. Some projects were stand-alone Python code built in an IDE, while others were done using Jupyter notebooks. Some of the code was written for the student, leaving the key pieces for the student to do, and there were sometimes optional parts covering more advanced ideas. A few projects also involved written reports, explaining the design choices in the project and/or reviewing some relevant related literature.
My main complaints with the projects: (1) In a few of the projects, I felt like some of the interesting coding that should have been left to the student was pre-coded, and (2) I could sometimes see better methods than what was specified in the project, but had to do it their way to meet specifications and/or pass the automated grader. Still, overall, the projects helped me learn what I wanted to, and I could make my own private versions where I could do things my own way.
The course was also filled with other chances for practical experience: multiple optional (self-evaluated) projects, “mini-projects”, “labs”, and other coding exercises and guidance for implementing a range of AI tools.
The mathematics in the nanodegree was sometimes covered a bit superficially. That’s probably to be expected, since most of the people who take these courses come from a software engineering background, not from the mathematical sciences. Some mathematical friends have asked me whether I can recommend the program for people with more math background, so I’d like to say a bit about the mathematical level.
The first thing to realize is that AI is a rather mathematically intensive area of computer science. To do well in the course, you need at least a solid grasp of linear algebra, logic, and basic probability and statistics, or sufficient mathematical sophistication to learn these subjects quickly. These are the mathematical workhorses of machine learning and AI, and mathophiles will be glad to know Udacity’s courses don’t shy away from expecting students to know this stuff.
However, the more math you know, the more you can get out of the course. One can approach AI at a wide range of mathematical levels. Though the course is targeted at an undergraduate math level, this was fine: rather than teaching the more mathematical aspects, which I would surely have been too critical of anyway, the courses mostly pointed me to resources where I could learn what I wanted to, and gave enough of the mathematical ideas that I could fill in the details myself. I had a lot of fun seeing unexpected connections to areas of math I’ve worked on before.
If you’re in the mathematical sciences and interested in getting into AI, the Udacity Nanodegrees are, in my opinion, a reasonable way to do it. You can’t do exactly the program I did, because Udacity has done some restructuring of their Nanodegree programs since I started, and no longer offers the original AI Nanodegree. However, you can do these:
Since I had 4 degrees before (BS, BA, MS, and PhD), and since “nano” is the prefix for $10^{-9}$, I guess this new “Nanodegree” brings me up to 4.000000001 degrees. But, actually, I think it was worth more than that. It’s at least a Centidegree. :-)
Can I convert a gaming laptop into a deep learning machine?
The answer is Yes, and I’m writing this post in case it’s helpful to someone who wants to do something similar. Also, I’m writing it so I can remember what I did.
If you want to train neural networks, GPUs are an order of magnitude faster than CPUs. I needed a GPU system for my current work, but I also like being mobile, running computations on the go without having to rely on pay-by-the-hour cloud services like AWS. So I needed a laptop with a decent GPU.
But I also had some concerns. If I’d had the money, I would have gotten a System76 machine, or some other laptop designed for machine learning, but I had a budget of around $1000. I also suspected that the GPUs in many standard laptops are probably severely underclocked for heat reasons. A gaming laptop seemed like a potentially good solution, since gamers need performance.
Unfortunately, I had a hard time finding reliable information online about whether gaming laptops play nicely with Linux. The ones I looked at were mostly not on Ubuntu’s certified hardware list, and I had already been burned once: a recent bad experience with hardware that crashed frequently under my Ubuntu setup left me slightly nervous about trying to convert a gaming laptop into a deep learning machine.
Eventually, I came up with a solution I’m very happy with. My Dell G7 15 gaming laptop works great for what I needed:
I’ve now been using this machine for a couple of months, and it’s been great for my purposes! I’m happily and quickly training neural networks, even in places where I’ve got no connection to the cloud. True, it’s a bit bulky compared to most laptops, and the power usage means the battery typically holds only a couple of hours of charge, but those are really the only downsides.
Below, you can see the steps I used. I wrote these mostly so I can refer back to them easily, but I’d be happy if they’re helpful to anyone else who wants to do the same…
Here’s what I did. To figure out these steps, two articles I found helpful were ones by Sanyam Bhutani and Taylor Denouden.
- From Windows, restart into the UEFI firmware settings via Troubleshoot > Advanced Options > UEFI Firmware Settings, and Restart again.
- When booting the Ubuntu installer, edit the boot entry at the GRUB menu and add nouveau.modeset=0 to the end of the line starting with linux, then press F10 to boot.
- On the first boot after installing, again add nouveau.modeset=0 to the end of the line starting with linux.
- Run sudo apt-get update.
- Run ubuntu-drivers devices to show the recommended drivers for the hardware.
- Run sudo ubuntu-drivers autoinstall to get the recommended drivers.
- Run sudo reboot.
- (cd /usr/local/cuda-9.2/bin and run sudo ./uninstall_cuda_9.2.pl to uninstall an older CUDA version, if necessary.)
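After the reboot, it’s worth a quick sanity check that the driver and toolkit are actually visible. These are standard NVIDIA commands, not specific to the articles above, and of course they only produce useful output on a machine with NVIDIA hardware and the toolkit installed:

```shell
# Should list the GPU and the installed NVIDIA driver version
nvidia-smi

# Should report the CUDA compiler version matching the toolkit you installed
nvcc --version
```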
to uninstall if necessary)I may end up doing this again eventually, which is one reason I wanted to record all this while I still remember what I did. For one thing, it will be nice when NVIDIA adds Ubuntu 18 support. For another, in retrospect, I perhaps shouldn’t have done the dual boot thing — more than two months later, I’ve never booted into Windows since installing Ubuntu. But I could have anticipated that.
Here’s how it works. Start with an equilateral triangle:
Mark off each edge into thirds, and connect those points with lines using this pattern:
This divides up the triangle into seven smaller triangles, four equilateral and three isosceles. Remove the three isosceles triangles to get this:
But now, notice that the three triangles you removed can each be cut in half and then taped back together along an edge, so that you get three equilateral triangles. Do that, and place the new equilateral triangles so that they stick out from the sides of the original triangle, and you will get this:
That’s it for step 1!
Now, for step 2, repeat this whole process for each of the 7 equilateral triangles obtained in step 1. Step 3: Do the same for each of the 49 triangles obtained in step 2. And so on. My original picture, at the top of this post, is what you get after step 5.
Notice that each step is area-preserving, so in particular, the total area of all of the black triangles in my original picture is the same as the area of the triangle I started with.
Here’s an animation showing the first five steps in the sequence, and then those same steps backwards, going back to the original triangle:
The reason the picture seems to get darker in the latter steps is that the triangles are drawn with black edges, and eventually there are a lot of edges. Since there’s a limit to how thin the edges can be drawn, eventually, the picture is practically all edges.
The outline of the entire picture is clearly a Koch curve, so we have generated a Koch curve from a triangle. But, what I really love about this construction is that every triangle that occurs at any step in the recursive process also spawns a Koch curve! That’s a lot of Koch curves.
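By the way, if you want to experiment with Koch curves yourself, the standard edge-replacement construction is easy to code. Here’s a minimal Python sketch (my own, separate from the triangle construction above), which recursively replaces each segment with four smaller ones:

```python
import numpy as np

def koch(p, q, depth):
    """Return the starting point of each segment of the Koch curve from p to q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if depth == 0:
        return [p]
    a = p + (q - p) / 3           # one third of the way along
    b = p + 2 * (q - p) / 3       # two thirds of the way along
    # Apex of the bump: rotate (b - a) by 60 degrees about the point a
    rot = np.array([[0.5, -np.sqrt(3) / 2],
                    [np.sqrt(3) / 2, 0.5]])
    c = a + rot @ (b - a)
    points = []
    for u, v in [(p, a), (a, c), (c, b), (b, q)]:
        points += koch(u, v, depth - 1)
    return points

pts = koch([0, 0], [1, 0], 4) + [np.array([1.0, 0.0])]
print(len(pts))  # 4**4 = 256 segments, so 257 points
```

Joining consecutive points gives one side of the snowflake; three copies arranged around an equilateral triangle trace out the whole snowflake outline.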
To make this precise, we can assume that triangles at each step are closed subsets of the plane. Admittedly, the “cutting” analogy falls apart slightly here, since two pieces resulting from a “cut” each contain a copy of the edge the cut was made along, but that’s OK. With this closure assumption, each of the Koch curves, one for each triangle formed at any stage in the process, is a subset of the intersection over all steps.
I’ll avoid explaining this one for now, except to say that I generated it starting from a single triangle, and iteratively replacing each triangle by seven new triangles. This is the sixth generation. The construction differs only slightly from my previous Koch snowflake fractal, in which each triangle had six descendants. I really like this new version, because you can see Koch snowflakes showing up in even more (infinitely more!) places than before.
There are also analogs of this for squares and pentagons!
That’s his three-dimensional interpretation of the first few iterations of this design of mine:
What’s fun about Colton’s version is that each new layer of squares is printed a bit taller than the previous layer. I had really only imagined these as two-dimensional objects, so for me it’s really fun to have 3-dimensional models of them to hold and play with! Colton’s idea of adding some depth really adds another … er … dimension to the overall effect:
His work also gives a nice way to illustrate some of the features of these fractals. For example, visually proving that the “inside” and “outside” in my fractals converge to the same shape can be done by printing the same model at different scales. Here are three copies of the same fractal at different scales, each printed with the same number of iterations:
Not only do these nest tightly inside each other, the thickness is also scaled down by the same ratio, so that the upper surfaces of each layer are completely flush.
Colton has been doing this work partly because designing fractals is a great way to learn 3d printing, and he’s now getting some impressively accurate prints. But, I also like some of his earlier rough drafts. For example, in his first attempt with this fractal based on triangles:
there were small gaps between the triangles, which Colton hadn’t intended. But, this gave the piece a sort of rough, edgy look that I like, and it casts shadows like castle battlements:
Colton is still doing nice new work, and we’ll eventually post some more pictures here. But I couldn’t wait to post a preview of some of his stuff!
(Designs and photos © 2018 Colton Baumler and Derek Wise)
This workshop was a lot of fun! I learned a lot, had the chance to talk to people I’ve known for a long time, and meet others I hadn’t managed to connect with before. I was especially excited to find out about some lines of work in progress that build on my work with Catherine Meusburger on Hopf algebra gauge theory.
In fact, our work on this seems to have been an impetus for the workshop, and it was really gratifying to see how other people are beginning to apply our theory, and also work out some interesting examples of it for particular Hopf algebras! I’m anticipating some interesting work coming out in the near future.
Here’s the conference photo; I’m farthest right, and my coauthor, Catherine, is the 11th head from the left, peeking out from the second row:
I gave an introductory talk on the subject of Hopf algebra gauge theory, and you can download the slides from my talk, or even watch the video. Catherine’s talk followed mine, and she showed how Kitaev models are related to Hopf algebra gauge theory in the same way that Turaev-Viro TQFTs are related to Reshetikhin-Turaev TQFTs. Video of her talk is also available. Of course, for more detail on Hopf algebra gauge theory, you can also check out our paper: Hopf algebra gauge theory on a ribbon graph.
I can also recommend watching other talks from the conference, available from the webpage linked to above. This was just the kind of conference I like best, since it brought people from multiple research communities together, in this case including mathematicians and physicists of various sorts as well as mathematical computer scientists. Kitaev models have been a hot topic the past few years, and one reason I think they’re fun is precisely that people from several areas—quantum computation, Hopf algebras, category theory, quantum gravity, quantum foundations, topological quantum field theory, condensed matter physics, and more—are working together. Of course, this probably also helps explain the rather long conference title.
In earlier posts, I’ve shown (1) how letting triangles reproduce generates a bunch of copies of the Koch snowflake at different scales:
Similarly, I’ve shown (2) how letting squares reproduce like this:
generates a bunch of copies of a fractal related to the Koch snowflake, but with 8-fold symmetry:
So what about letting pentagons reproduce? For pentagons, an analog of the replication rules above is this:
Each of the 10 cute little pentagon children here is a factor of $\varphi^2 = \varphi + 1 \approx 2.618$ smaller than its parent, where $\varphi = \frac{1+\sqrt{5}}{2}$ is the golden ratio.
However, something interesting happens here that didn’t happen with the triangle and square rules. While triangles and squares overlap with their ancestors, pentagons overlap with both their ancestors and their cousins. The trouble is that certain siblings already share faces (I know, the accidental metaphors here are getting troublesome too!), and so siblings’ children have to fight over territory:
In this three-generation pentagon family portrait, you can see that each second generation pentagon has two children that overlap with a cousin.
As we carry this process further, we get additional collisions between second cousins, third cousins, and so on. At five generations of pentagons, we start seeing some interestingly complex behavior develop from these collisions:
There’s a lot of fascinating structure here, and much of it is directly analogous to the 6-fold and 8-fold cases above, but there are also some differences, stemming from the “cousin rivalry” that goes on in pentagon society.
Let’s zoom in to see some collisions near where the two ‘wreaths’ meet on the right side of the picture:
I find the complicated behavior at the collisions quite pretty, but the ordering issues (i.e. which members of a given generation to draw first when they overlap) annoy me somewhat, since they break the otherwise perfect decagonal symmetry of the picture.
If I were doing this for purely artistic purposes, I’d try resolving the drawing order issues to restore as much symmetry as possible. Of course, I could also cheat and restore symmetry completely by not filling in the pentagons, so that you can’t tell which ones I drew first:
It’s cool seeing all the layers at once in this way, and it shows just how complex the overlaps can start getting after a few generations.
Anyway, because of these collisions, we don’t seem to get a fractal tiling of the plane—at least, not like we got in the previous cases, where the plane simply keeps getting further subdivided into regions that converge to tiles of the same shape at different scales.
Actually, though, we still might get a fractal tiling of the plane, if the total area of overlap of nth generation pentagons shrinks to zero as n goes to infinity! That would be cool. But, I don’t know yet.
In any case, the picture generated by pentagons is in many ways very similar to the pictures generated by triangles and squares. Most importantly, all of the similar-looking decagonal flower-shaped regions we see in this picture, including the outer perimeter, the inner light-blue region, and tons of smaller ones:
really are converging to the same shape, my proposal for the 10-fold rotationally symmetric analog of the Koch snowflake:
How do we know that all of these shapes are converging to the same fractal, up to rescaling? We can get a nice visual proof by starting with two pentagons, one rotated and scaled down from the other, and then setting our replication algorithm loose on both of them:
Proof:
We see that the area between the two fractal curves in the middle shrinks closer to zero with each generation.
Puzzle for Golden Ratio Fans: What is the exact value of the scaling factor relating the two initial pentagons?
Next up in this infinite series of articles: hexagons! …
I’m joking! But, it’s fairly clear we can keep ascending this ladder to get analogs of the Koch snowflake generated by n-gons, with (2n)-fold rotational symmetry. More nice features might be sacrificed as we go up; in the case generated by hexagons, we’d have collisions not only between cousins, but already between siblings.