Recently I wrote a couple of posts about eigenvectors and eigenvalues. I thought it would be cool to build on that slightly more theoretical material and show something super useful in which eigenvectors and eigenvalues are an integral part. So today, I'm going to talk about Principal Component Analysis (PCA). This algorithm is used everywhere, notably in neuroscience and computer graphics. The idea is that you have a dataset with a ton of features and you want to reduce it to its core components. With high dimensionality, not only is the curse of dimensionality a problem, but you also just can't visualize the data, which prevents a lot of basic insights. So we want to reduce the dimensionality without losing vital information. This is where PCA comes in: we go from a large number of features to a small number of components, which still capture a sufficient proportion of the information.

Before we talk about the algorithm itself, there are a few important math concepts you must be familiar with in order to proceed. The first is eigenvalues and eigenvectors. You can read about them here. The second is the covariance matrix.

### Covariance Matrix

Covariance is a measure of how two different variables change together. The covariance between two variables, \(X\) and \(Y\), is given by the following formula:

$$cov(X, Y) = \frac{\sum\limits_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$

Now, if we want to look at all the pairwise covariances in a dataset, we can compute the covariance matrix, which has this form:

$$C = \left( \begin{array}{ccc} cov(x,x) & cov(x,y) & cov(x,z) \\ cov(y,x) & cov(y,y) & cov(y,z) \\ cov(z,x) & cov(z,y) & cov(z,z) \end{array} \right)$$

Notice that this matrix is symmetric \((C = C^T)\), because \(cov(x, y) = cov(y, x)\), and its diagonal contains just variances, because \(cov(x, x)\) is the same thing as the variance of \(x\). If you understand the covariance matrix and eigenvalues/eigenvectors, you're ready to learn about PCA.
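To make the covariance matrix concrete, here is a quick sketch in NumPy (my own illustration with made-up data, not from the post). `np.cov` uses the same \(n-1\) denominator as the formula above:

```python
import numpy as np

# Five samples of three features (x, y, z); rows are samples.
data = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.8],
    [1.9, 2.2, 1.1],
    [3.1, 3.0, 0.2],
])

# np.cov expects variables in rows, so transpose; it divides by (n - 1) by default.
C = np.cov(data.T)

print(C)                    # 3x3 covariance matrix
print(np.allclose(C, C.T))  # True: the covariance matrix is symmetric
```

Note that the diagonal entry `C[0, 0]` matches the sample variance of the first feature, as the text describes.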
### Principal Component Analysis

Here is the central idea of PCA: the components are the eigenvectors of the covariance matrix. Moreover, the amount of variance each component captures is given by the magnitude of its corresponding eigenvalue. Read that multiple times. So basically, PCA can be represented in 4 pretty simple steps.

1. Normalize the data. We're dealing with covariance, so it's a good idea to have features on the same scale.
2. Calculate the covariance matrix.
3. Find the eigenvectors of the covariance matrix.
4. Translate the data to…
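The steps above can be sketched in a few lines of NumPy. This is my own minimal illustration, not code from the post, and it assumes the final step means projecting the data onto the top-k eigenvectors:

```python
import numpy as np

def pca(data, k):
    """Minimal PCA sketch: project data onto its top-k principal components."""
    # 1. Normalize: center each feature and scale to unit variance.
    normalized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

    # 2. Calculate the covariance matrix of the features.
    C = np.cov(normalized.T)

    # 3. Find the eigenvectors (eigh is appropriate since C is symmetric).
    eigenvalues, eigenvectors = np.linalg.eigh(C)

    # Keep the k eigenvectors with the largest eigenvalues
    # (the eigenvalue measures the variance that component captures).
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:k]]

    # 4. Translate (project) the data onto those components.
    return normalized @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
reduced = pca(X, 2)
print(reduced.shape)  # (100, 2): five features reduced to two components
```

In practice you would reach for a library implementation such as scikit-learn's `PCA`, but the sketch shows how directly the algorithm follows from the four steps.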

## Eigenvectors

This is the second post on eigenvectors and eigenvalues. Here is the first. Before I start though, I want to emphasize something I think a lot of people struggle to visualize: the operation a matrix performs on a vector.

### Multiplying a matrix by a vector

When you multiply a vector \(x\) by a matrix \(A\), a few different things can happen. The first of these is a change of dimension. If \(A\) is an \(m \times n\) matrix, and \(x\) is an \(n \times 1\) vector (notice how the \(n\)-dimensions have to line up), then the vector \(b\) given by \(Ax = b\) will be \(m \times 1\). So unless \(A\) is a square \(n \times n\) matrix, the resulting vector won't be in the same dimension as \(x\) (this hints that only square matrices have eigenvalues…why?). Here's a concrete example:

$$\begin{bmatrix} 1 & 2 \\ 3 & 7 \\ 4 & 9 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} (1 \cdot 1) + (2 \cdot 3) \\ (3 \cdot 1) + (7 \cdot 3) \\ (4 \cdot 1) + (9 \cdot 3) \end{bmatrix} = \begin{bmatrix} 7 \\ 24 \\ 31 \end{bmatrix}$$

Recall that when we multiply matrices, we take the dot products of the rows of the first with the columns of the second. The vector \(x\) lies in \(\mathbb{R}^2\), but the result of \(Ax\) lies in \(\mathbb{R}^3\). So, applying a matrix \(A\) to a vector \(x\) can do a number of things: change its dimension, rotate it, etc. We also established that if the vector \(x\) is an eigenvector of \(A\), then \(A\) simply scales it (not changing the direction or dimension, only the length of the vector). You can refer back to the previous article for a more precise definition. I discussed in the previous article how to find the eigenvalues of a matrix \(A\), but not the eigenvectors. That's what I want to explore here. Furthermore, you'll see that we need the eigenvalues in order to find the eigenvectors.
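The dimension change in that example is easy to verify in NumPy (my own quick check, not from the post):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 7],
              [4, 9]])   # a 3x2 matrix
x = np.array([1, 3])     # a vector in R^2

b = A @ x
print(b)        # [ 7 24 31]
print(b.shape)  # (3,): the result lives in R^3, not R^2
```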
### How to find the eigenvectors of a matrix

Recall that we can manipulate the definition of an eigenvector (and eigenvalue) in order to arrive at \((A-\lambda I)x = 0\):

$$\Large{\begin{align} &1.\ Ax = \lambda x \\ &2.\ Ax = \lambda Ix \\ &3.\ Ax - \lambda Ix = 0 \\ &4.\ (A - \lambda I)x = 0 \end{align}}$$

From equation 4, it's obvious that the matrix \((A - \lambda I)\) sends the vector \(x\) to \(0\).…
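As a sketch of where this leads (my own example matrix, not one from the post): once an eigenvalue \(\lambda\) is known, an eigenvector is any nonzero solution of \((A - \lambda I)x = 0\). NumPy's `eig` returns both at once, and we can check the two equivalent conditions directly:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

for i, lam in enumerate(eigenvalues):
    x = eigenvectors[:, i]  # the eigenvector paired with eigenvalue lam
    # A only scales x: Ax equals lambda * x...
    print(np.allclose(A @ x, lam * x))                 # True
    # ...equivalently, (A - lambda I) sends x to the zero vector.
    print(np.allclose((A - lam * np.eye(2)) @ x, 0))   # True
```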

## Eigenvalues

I've been making an effort to make more time for my blog lately, but it's difficult in the midst of the school year. So, I decided that, in order to keep my posts relatively frequent, I'll post some lighter articles in between the more detailed ones, more about ideas than the full intuition and mathematical development of an algorithm. I'll still work towards the larger, more developed articles as well, but there is no way I can keep them coming out weekly or so (neural networks is next!). So in that spirit, I want to talk about eigenvalues and eigenvectors. They're used a lot in machine learning, specifically in something called Principal Component Analysis, a data reduction method. I mentioned them briefly in a post I did a while back about linear algebra, but I left the math out. It's time to bring it back up. Note: if you feel you don't have the basics to understand this article, read my intro to linear algebra article!

### Intuition

When a vector is multiplied by a matrix, the matrix performs some geometric transformation on it. Maybe it rotates it by 30 degrees, or changes its dimension. We can look at this mathematically as \(Ax = b\), where \(x\) is the vector we started with, and \(b\) is the vector we now have. There is a special case of this procedure, for certain vectors \(x\), where \(Ax\) yields simply a stretched or shrunk version of \(x\): geometrically, \(b\) lies on the same line through the origin as \(x\). In this case, \(x\) is called an eigenvector.

### Math

Now, how can we represent this mathematically? The vector \(b\) is simply \(x\), but scaled:

$$\begin{align} & Ax = b \\ & Ax = \lambda x,\ \text{where } \lambda \in \mathbb{R} \end{align}$$

Note that \(x\) must be nonzero. Here we call \(x\) an eigenvector of the matrix \(A\), and \(\lambda\) an eigenvalue of \(A\). Now that we have a basic understanding, we have to answer a harder question: how do we find the eigenvalues and eigenvectors of \(A\)? Let's start with the eigenvalues.
This gets a bit math heavy.

### Finding the eigenvalues

$$\Large{\begin{align} &1.\ Ax = \lambda x \\ &2.\ Ax = \lambda I x \\ &3.\ Ax - \lambda I x = 0 \\ &4.\ (A - \lambda I) x = 0 \end{align}}$$

We can start with the above operations to make things clearer. Step 1 – We have…
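As a sketch of where this derivation leads (my own illustration, not from the post): equation 4 has a nonzero solution \(x\) only when \(A - \lambda I\) is singular, i.e. when \(\det(A - \lambda I) = 0\). The roots of that characteristic polynomial are the eigenvalues:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# A nonzero x solving (A - lambda*I)x = 0 exists only when
# det(A - lambda*I) = 0.  For this A, that determinant is
# (2 - lam)^2 - 1, whose roots are lam = 1 and lam = 3.
eigenvalues = np.linalg.eigvals(A)
print(sorted(eigenvalues.real))  # approximately [1.0, 3.0]

# Sanity check: A - lambda*I really is singular at each eigenvalue.
for lam in eigenvalues:
    print(np.isclose(np.linalg.det(A - lam * np.eye(2)), 0))  # True
```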
