I’ve been making an effort to make more time for my blog lately, but it’s difficult in the midst of the school year. So, I decided that, in order to keep my posts relatively frequent, I’ll post some lighter articles in between the more detailed ones, more about ideas than the full intuition and mathematical development of an algorithm. I’ll still work towards the larger, more developed articles as well, but there is no way I can keep them coming out weekly or so (neural networks is next!).

So in that spirit, I want to talk about eigenvalues and eigenvectors. They’re used a lot in machine learning, specifically in something called Principal Component Analysis, a data reduction method. I mentioned them briefly in a post I did a while back about Linear Algebra, but I left the math out. It’s time to bring it back up.

Note – If you feel you don’t have the basics to understand this article, read my intro to linear algebra article!

## Intuition

When a matrix multiplies a vector, it performs some geometric transformation. Maybe it shifts the vector, rotates it by 30 degrees, or changes its dimensions. We can look at this mathematically as \(Ax = b\), where \(x\) is the vector we started with, and \(b\) is the vector we now have. (\(x\) is blue, \(b\) is red).

There is a special case of this procedure: for certain vectors \(x\), \(Ax\) yields simply a stretched or shrunk version of \(x\). Geometrically, we can picture it like this.

In this case, \(x\) is called an eigenvector.

## Math

Now, how can we represent this mathematically? The vector \(b\) is simply \(x\), but scaled.

$$\begin{align} & Ax = b \\ & Ax = \lambda x,\ \text{where } \lambda \in \mathbb{R} \end{align}$$

Note that \(x\) must be nonzero.

Here we call \(x\) an eigenvector of the matrix \(A\), and \(\lambda\) an eigenvalue of \(A\). Now that we have a basic understanding, we have to answer a harder question. How do we find the eigenvalues and eigenvectors of \(A\)?
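The defining equation \(Ax = \lambda x\) is easy to check numerically. Here's a minimal sketch using a diagonal matrix I made up for illustration; the axis-aligned unit vectors are its eigenvectors, with the diagonal entries as eigenvalues.

```python
import numpy as np

# Hypothetical example matrix: diagonal, so it just stretches each axis.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
x = np.array([1.0, 0.0])  # candidate eigenvector

b = A @ x  # compute Ax

# b is exactly 2 * x, so x is an eigenvector of A with eigenvalue 2.
print(b)
print(np.allclose(b, 2 * x))
```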

Let’s start with the eigenvalues. This gets a bit math heavy.

## Finding the eigenvalues

$$\Large{\begin{align} &1.\ Ax = \lambda x \\ &2.\ Ax = \lambda I x \\ &3.\ Ax - \lambda I x = 0 \\ &4.\ (A - \lambda I) x = 0 \end{align}}$$

Let’s walk through the steps above to make things clearer.

Step 1 – We have this from the definition of an eigenvalue and eigenvector.

Step 2 – Not too crazy; all we do is introduce an identity matrix to multiply \(x\). This is fair because \(x = Ix\), where \(I\) is the identity matrix.

Step 3 – We subtract the right-hand term from both sides, leaving the right side equal to 0.

Step 4 – We factor out an \(x\). Ok, get ready for a bunch of math.

Here is the most difficult part of this article. The whole argument is based on the invertible matrix theorem. We have an expression to work with, equation 4. We know \(x\) is a nonzero vector, so \((A-\lambda I)x = 0\) has a non-trivial solution, which means the matrix \(A - \lambda I\) is not invertible. And because \(A - \lambda I\) is not invertible, its determinant is 0! This is how we will solve for the eigenvalues. The equation below is called **the characteristic equation.**

$$\Large{p(\lambda) = \det(A - \lambda I) = 0}$$
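For a 2×2 matrix, the characteristic polynomial expands to \(\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)\), so its roots can be found directly. Here's a rough sketch of that idea in NumPy; `eigenvalues_2x2` is a hypothetical helper, not a standard function.

```python
import numpy as np

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial.

    For 2x2 A: det(A - lambda*I) = lambda^2 - tr(A)*lambda + det(A).
    """
    trace = A[0, 0] + A[1, 1]
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    # np.roots takes coefficients from highest to lowest degree
    return np.roots([1.0, -trace, det])

A = np.array([[2.0, 3.0],
              [3.0, -6.0]])
print(np.sort(eigenvalues_2x2(A)))  # eigenvalues -7 and 3
```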

Let’s do a simple 2×2 example to show that the process isn’t as hard as it sounds.

## Example

$$\text{Find the eigenvalues of the matrix } A =\begin{bmatrix} 2 & 3 \\ 3 & -6 \end{bmatrix}$$

It’s extremely helpful to write out the matrix \(A - \lambda I\).

$$A - \lambda I = \begin{bmatrix} 2 & 3 \\ 3 & -6 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} = \begin{bmatrix} 2-\lambda & 3 \\ 3 & -6-\lambda \end{bmatrix}$$

And recall that the determinant of a 2×2 matrix is

$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad-bc$$

So the determinant of \(A - \lambda I\) is

$$\begin{align} \det(A - \lambda I) &= (2-\lambda)(-6-\lambda) - (3)(3) \\ &= \lambda^2 + 4\lambda - 21 \\ 0 &= (\lambda - 3)(\lambda + 7) \end{align}$$

Therefore, the eigenvalues of \(A\) are \(\lambda = \{3, -7\}\). If you’ve gotten this far, props to you, you math-loving nerd :).
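As a sanity check, the hand calculation above can be verified with NumPy's built-in eigenvalue routine:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [3.0, -6.0]])

# np.linalg.eigvals solves the characteristic equation numerically
vals = np.linalg.eigvals(A)
print(np.sort(vals))  # eigenvalues -7 and 3, matching the hand calculation
```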

I wanted to keep this light, so I think I’ll stop here. I’ll follow this up with a quick post on eigenvectors. Please leave me some feedback, especially on what parts were confusing for you! I really would like to help make some of these math concepts better known within fields heavily influenced by computer science.

Thanks for reading!!!