PageRank Algorithm
Let us understand the PageRank algorithm through an example. We use a simplified version that only considers connected graphs; in real life, the web could contain disjoint sets of websites. Our focus is the algorithm's linear algebra perspective.
Suppose we have a small version of the internet consisting of websites A, B, C, D, and E, linked as described in the table below.
| Website | Links to Website(s) |
|---|---|
| A | B, C, D, E |
| B | C, D |
| C | A, E |
| D | A, C, E |
| E | (none) |
We can visualize the relationships between websites as a directed graph with the websites as nodes, an edge from vertex $ u $ to vertex $ v $ if website $ u $ links to website $ v $, and edge weights representing the "importance" that website $ u $ transfers to website $ v $, split equally amongst all of $ u $'s links.
For example, as we can see in the graph, since website A links to all four other pages, it has an outgoing edge to every other vertex, each with weight $ \frac{1}{4} $ (= 0.25).
Step 1: Create a transition matrix.
Create a transition matrix of the graph, where entry $ (i, j) $ is the importance that website $ j $ transfers to website $ i $. The transition matrix $ A $ for this graph is:
$ \begin{bmatrix} 0 & 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{4} & 0 & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & 0 & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & 0 & \frac{1}{2} & \frac{1}{3} & 0 \end{bmatrix} $
If a website has no outgoing edges (like E), we distribute its importance equally among all the other websites. This prevents a column of zeros, which would leak importance out of the system at every step and could drive the resulting PageRank vector to 0.
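To make Step 1 concrete, here is a short Python sketch that builds this transition matrix from the link table above (the dictionary encoding and variable names are our own choices, not part of the original algorithm):

```python
import numpy as np

# Encoding of the link table above; names are our own.
links = {
    "A": ["B", "C", "D", "E"],
    "B": ["C", "D"],
    "C": ["A", "E"],
    "D": ["A", "C", "E"],
    "E": [],  # no outgoing links (a "dangling" website)
}
sites = sorted(links)
n = len(sites)
index = {site: i for i, site in enumerate(sites)}

A = np.zeros((n, n))
for j, site in enumerate(sites):
    targets = links[site]
    if targets:
        # Split the site's importance equally among the sites it links to.
        for t in targets:
            A[index[t], j] = 1 / len(targets)
    else:
        # Dangling site: distribute its importance equally to every other site.
        A[:, j] = 1 / (n - 1)
        A[j, j] = 0

print(A)  # matches the transition matrix shown above
```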
Step 2: Calculate the PageRank vector.
There are two ways to do this:
Way 1: Iterative version
The PageRank algorithm is iterative, as described in the original published paper.
Start with an initial rank vector $ v $ with all entries equal to $ 1/n $, where $ n $ is the number of websites. Each incoming link should increase the importance of a website, so we update its rank by adding up the importance carried by its incoming links. This is the same as multiplying matrix $ A $ by $ v $.
This is because multiplying a matrix by a vector takes the dot product of each row with the vector. In the context of this problem, for example, row 1 pairs the importance each website gives to website A with that website's current rank, and the dot product sums these contributions to give A's new rank, the first entry of the new vector.
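For instance, in iteration 1 every entry of $ v $ is $ 0.2 $, so the first entry of $ Av $ (website A's new rank) is the dot product of row 1 of $ A $ with $ v $:

$ 0 \cdot 0.2 + 0 \cdot 0.2 + \frac{1}{2} \cdot 0.2 + \frac{1}{3} \cdot 0.2 + \frac{1}{4} \cdot 0.2 = \frac{13}{60} = 0.21666... $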
Update the rank vector by repeatedly multiplying it by matrix $ A $ until it converges: we compute $ Av $ in iteration 1, $ A(Av) = A^2 v $ in iteration 2, and in general $ A \cdot A^{n-1} v = A^n v $ in iteration $ n $. The converged rank vector is called the PageRank vector.
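Here is a minimal Python sketch of this loop, assuming the matrix `A` built in Step 1 (the function name and convergence tolerance are our own choices):

```python
import numpy as np

def pagerank_iterative(A, tol=1e-4):
    """Repeatedly multiply the rank vector by A until it stops changing."""
    n = A.shape[0]
    v = np.full(n, 1 / n)       # initial rank vector: every entry is 1/n
    while True:
        v_next = A @ v          # each entry adds up its incoming importance
        if np.abs(v_next - v).max() < tol:
            return v_next       # converged: this is the PageRank vector
        v = v_next

print(pagerank_iterative(A))    # ≈ [0.2353, 0.1176, 0.2353, 0.1765, 0.2353]
```

The first few iterations of this process are worked out by hand below.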
Iteration 1:
$ \begin{bmatrix} 0.21666... \\ 0.1 \\ 0.26666... \\ 0.2 \\ 0.21666... \end{bmatrix} $
Iteration 2:
$ \begin{bmatrix} 0.25416... \\ 0.10833... \\ 0.225 \\ 0.15833... \\ 0.25416... \end{bmatrix} $
Iteration 3:
$ \begin{bmatrix} 0.22881... \\ 0.12708... \\ 0.23402... \\ 0.18125 \\ 0.22881... \end{bmatrix} $
Iteration 4:
$ \begin{bmatrix} 0.23463... \\ 0.11440... \\ 0.23836... \\ 0.17795... \\ 0.23463... \end{bmatrix} $
Iteration 5:
$ \begin{bmatrix} 0.23716... \\ 0.11731... \\ 0.23383... \\ 0.17452... \\ 0.23716... \end{bmatrix} $
Iteration 6:
$ \begin{bmatrix} 0.23438... \\ 0.11858... \\ 0.23541... \\ 0.17362... \\ 0.23438... \end{bmatrix} $
Iteration 7:
$ \begin{bmatrix} 0.23538... \\ 0.11719... \\ 0.23556... \\ 0.17648... \\ 0.23538... \end{bmatrix} $
Iteration 8:
$ \begin{bmatrix} 0.23545... \\ 0.11769... \\ 0.23511... \\ 0.17628... \\ 0.23545... \end{bmatrix} $
As we can see, after 8 iterations the entries have converged to the thousandths place.
The PageRank vector is:
$ \begin{bmatrix} 0.235 \\ 0.117 \\ 0.235 \\ 0.176 \\ 0.235 \end{bmatrix} $
This is an eigenvector of A with eigenvalue 1, and it always works out this way. We will compute it directly in Way 2, and later explain why an eigenvalue of 1 is guaranteed.
Way 2: Non-iterative equivalent
Let $ x_1, \dots, x_5 $ denote the importance of websites A through E respectively. Then we get the following system of equations:
$ x_1 = \frac{1}{2} \cdot x_3 + \frac{1}{3} \cdot x_4 + \frac{1}{4} \cdot x_5 \\ x_2 = \frac{1}{4} \cdot x_1 + \frac{1}{4} \cdot x_5 \\ x_3 = \frac{1}{4} \cdot x_1 + \frac{1}{2} \cdot x_2 + \frac{1}{3} \cdot x_4 + \frac{1}{4} \cdot x_5 \\ x_4 = \frac{1}{4} \cdot x_1 + \frac{1}{2} \cdot x_2 + \frac{1}{4} \cdot x_5 \\ x_5 = \frac{1}{4} \cdot x_1 + \frac{1}{2} \cdot x_3 + \frac{1}{3} \cdot x_4 $
Each equation expresses the idea that the importance of a website is the sum of the importance transferred by its incoming links.
This system of equations is equivalent to asking for the solution to:
$ \begin{bmatrix} 0 & 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{4} & 0 & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & 0 & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & 0 & \frac{1}{2} & \frac{1}{3} & 0 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} $
Clearly, we want to find an eigenvector corresponding to the eigenvalue 1. Solving for it, the eigenvectors have the following form:
$ c \cdot \begin{bmatrix} 1 \\ \frac{1}{2} \\ 1 \\ \frac{3}{4} \\ 1 \end{bmatrix} $
The PageRank vector denotes the relative importance of the websites, so we want all its entries to sum to 1. The eigenvectors for eigenvalue 1 here are all scalar multiples of one another (the eigenspace is one-dimensional), so we can normalize by dividing each entry by $ \frac{17}{4} (= 1 + \frac{1}{2} + 1 + \frac{3}{4} + 1) $. This gives us the PageRank vector:
$ \frac{4}{17} \cdot \begin{bmatrix} 1 \\ \frac{1}{2} \\ 1 \\ \frac{3}{4} \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{4}{17} \\ \frac{2}{17} \\ \frac{4}{17} \\ \frac{3}{17} \\ \frac{4}{17} \end{bmatrix} = \begin{bmatrix} 0.23529... \\ 0.11764... \\ 0.23529... \\ 0.17647... \\ 0.23529... \end{bmatrix} \approx \begin{bmatrix} 0.235 \\ 0.117 \\ 0.235 \\ 0.176 \\ 0.235 \end{bmatrix} $
As we can see, this is the same vector calculated by Way 1.
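Equivalently, we can ask a linear algebra library for this eigenvector directly. A minimal NumPy sketch, again assuming the matrix `A` from Step 1 (we pick the eigenvalue closest to 1 and rescale the eigenvector so its entries sum to 1):

```python
import numpy as np

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmin(np.abs(eigvals - 1))   # index of the eigenvalue closest to 1
v = np.real(eigvecs[:, k])
v = v / v.sum()                      # normalize so the entries sum to 1

print(v)  # ≈ [4/17, 2/17, 4/17, 3/17, 4/17]
```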
Now that we have seen the non-iterative equivalent, we can explain why we always arrive at an eigenvector with eigenvalue 1.
A stochastic matrix is a square matrix with non-negative entries whose rows or columns each sum to 1. When each column sums to 1, it is called a column (or left) stochastic matrix. Clearly, the transition matrix A is column stochastic.
A property of a stochastic matrix is that it always has an eigenvalue of 1.
Proof:
Let A be a column stochastic matrix:
$ \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nn} \\ \end{bmatrix} $
Then, matrix A - I is
$ \begin{bmatrix} A_{11}-1 & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nn}-1 \\ \end{bmatrix} $
Each column of $ A - I $ must now sum to 0: each column of A sums to 1, and subtracting I subtracts exactly 1 from one entry of each column.
This means the all-ones vector is in the null space of $ (A-I)^T $, since each row of $ (A-I)^T $ is a column of $ A-I $ and therefore sums to 0:

$ (A-I)^T \cdot \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = 0 \cdot \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} $
This means that $ (A-I)^T $ has an eigenvalue of 0, and since a matrix and its transpose have the same eigenvalues, so does $ A-I $. Hence $ A-I $ is singular, i.e. it does not have an inverse and its rows/columns are linearly dependent, so there is a non-zero vector v in its null space, which is an eigenvector of $ A-I $ with eigenvalue 0:
$ (A-I) \cdot v = 0 \cdot v \implies A \cdot v - I \cdot v = 0 \implies A \cdot v = I \cdot v \implies A \cdot v = v \implies A \cdot v = 1 \cdot v $
Therefore, matrix A must always have an eigenvalue of 1 with eigenvector v.
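As a quick numerical sanity check of this property, the following sketch (with our own variable names, not part of the original algorithm) builds a random column stochastic matrix and confirms that each column of $ M - I $ sums to 0 and that 1 appears among the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((5, 5))
M = M / M.sum(axis=0)              # make every column sum to 1

print((M - np.eye(5)).sum(axis=0))                # each column of M - I sums to ~0
print(np.isclose(np.linalg.eigvals(M), 1).any())  # True: 1 is an eigenvalue
```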
This is the first version of the algorithm, as described in the original paper. Since then, the algorithm has gone through many changes, which are not publicly available.