This page contains supplementary material for the paper "Visualizing neuron activations of neural networks with the grand tour".

The math of the grand tour

Suppose every data point has p entries. The grand tour aims to show us every perspective of the data points, little by little, as it slowly and continuously rotates them in the p-dimensional Euclidean space \mathbb{R}^p. It does so by defining a smooth curve in the space of all orthogonal projections on \mathbb{R}^p, so that as we traverse the curve, the data rotate smoothly and our 2D view shows different perspectives of the data.

Mathematically, let V_{2,p} denote the set of all orthonormal 2-frames in \mathbb{R}^p, i.e. ordered pairs of orthonormal vectors (v_1, v_2). Since we want our animated view to be comprehensive and reveal every perspective of the data, we wish to find a curve f: \mathbb{R} \to V_{2,p} whose image is dense in V_{2,p}. We implemented the torus method from the grand tour paper to define such a curve. The torus method takes a time variable t \in \mathbb{R} and constructs an N-dimensional vector v \in \mathbb{R}^N, with N = \frac{1}{2}(p^2 - p), by

v = \alpha (t) = (\lambda_1 t, \lambda_2 t, \cdots, \lambda_N t)

where each \lambda_i is a random number drawn from [0, 2\pi]. For the curve to be dense in V_{2,p}, the \lambda_i should be real numbers that are linearly independent over the integers. Since this would require them to be irrational, and floating point representations in computer systems are discrete and rational anyway, for visualization purposes we relaxed this 'integer independence' in our implementation and used a set of random numbers instead. They are predetermined and fixed for all t in the domain. Note that v has N entries, and N = \frac{1}{2}(p^2 - p) equals the number of entries in the upper triangular part of a p-by-p matrix, excluding the diagonal.

We can then map this N-vector v to an element M of SO(p), the special orthogonal group in dimension p, i.e. the set of all p-by-p orthogonal matrices with determinant equal to 1, by:

M = f(v) = R_{1,2}(v_1) \circ R_{1,3}(v_2) \circ \cdots \circ R_{p-1,p}(v_N)

where R_{i,j}(\theta) denotes the element of SO(p) that rotates the i^{th} standard basis vector e_i toward the j^{th} standard basis vector e_j by angle \theta, while fixing the orthogonal complement of span(\{e_i, e_j\}). Moreover, for any such matrix M there is a natural projection \pi onto the set of 2-frames V_{2,p}: simply retain the first two columns of M, i.e.:

M_2 = \pi(M) = (M e_1, M e_2)
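
As a concrete example (not taken from the paper, just instantiating the definitions above), take p = 3. Rotating e_1 toward e_2 by \theta while fixing the orthogonal complement of span(\{e_1, e_2\}), namely the e_3 axis, gives

R_{1,2}(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}

and \pi(R_{1,2}(\theta)) keeps its first two columns as the 2-frame.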

Composing these transformations yields a curve from \mathbb{R} to V_{2,p} that is dense in V_{2,p}. In summary, for any given time t, we rotate and then project every data point by the matrix

M_2^{(t)} := (\pi \circ f \circ \alpha)(t)

to get an xy-coordinate for each data point. This allows us to create an animated view of different perspectives of the data points.
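
To make the whole pipeline concrete, here is a minimal NumPy sketch of M_2^{(t)} = (\pi \circ f \circ \alpha)(t) as described above. It is an illustration under the assumptions stated in its comments, not the paper's actual implementation; the function name, the fixed seed, and the toy data are all hypothetical.

import numpy as np

def grand_tour_projection(t, p, seed=0):
    # Sketch of the torus method: returns the p-by-2 matrix
    # M_2^(t) = pi(f(alpha(t))). The fixed seed keeps the lambda_i
    # predetermined and identical for every t, as in the text.
    N = p * (p - 1) // 2  # one rotation plane per upper-triangular entry

    # alpha(t): fixed random angular speeds lambda_i in [0, 2*pi),
    # standing in for reals that are independent over the integers.
    lam = np.random.default_rng(seed).uniform(0.0, 2.0 * np.pi, size=N)
    v = lam * t

    # f(v): compose the plane rotations R_{1,2}(v_1), R_{1,3}(v_2), ...,
    # R_{p-1,p}(v_N), each a Givens rotation of e_i toward e_j.
    M = np.eye(p)
    k = 0
    for i in range(p - 1):
        for j in range(i + 1, p):
            R = np.eye(p)
            c, s = np.cos(v[k]), np.sin(v[k])
            R[i, i], R[i, j] = c, -s
            R[j, i], R[j, j] = s, c
            M = M @ R
            k += 1

    # pi(M): retain the first two columns, i.e. (M e_1, M e_2).
    return M[:, :2]

# Usage: project a toy dataset X (100 points with p = 5 entries each)
# at time t = 0.3; xy holds one 2D screen coordinate per data point.
X = np.random.randn(100, 5)
xy = X @ grand_tour_projection(0.3, 5)

Varying t in small increments and redrawing the projected points produces the rotation animation.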

More comparisons with other dimensionality reduction techniques