Matrix Review

This series will make extensive use of matrix notations. For a deeper and more insightful presentation, be sure to see the Jack Crenshaw article ^[1].

As far as the mathematics goes, this series will stay as close as possible to introductory level. The point is doing useful things, not building the mathematics. Pretty much any introductory textbook on linear algebra and matrix theory will cover all of these topics properly. However:

If you have been through the "linear algebra" mill and studied matrix theory in the past, but it is a little foggy today, this refresher is probably enough of what you need.
If it all sounds unfamiliar to you, a formal course of study is the best idea, but maybe there is enough information here to get you through.
If this kind of stuff is your daily bread, you should not find anything of interest here. Save yourself some time and skip this review.

Matrix Notation

Models typically require keeping track of multiple variables that are changing at the same time. For linear models of the sort that this series will be using, the equations have a form something like the following, with more or fewer variables, with more or fewer equations that all must be satisfied at the same time.

a w + b x + c y + d z = t

e w + f x + g y + h z = u

i w + j x + k y + l z = v

In these equations, one can draw a distinction between model parameters and model variables. In the example as shown above:

the parameters, represented by a, b, c, ..., l, determine the behaviors of the model
the input variables, changing at each instant, are represented by w, x, y, z
the output variables that result are represented by t, u, v

The model parameters can be considered constant, while the input and output variables will change dynamically over time. We would like a notation that represents this input-to-output relationship compactly, and that clearly distinguishes the roles of the variables and parameters.

To prepare, collect terms into a 2-dimensional table, as shown in Table 1. The calculations can then be performed in a relatively conventional "spreadsheet-like" manner.

Table 1
	_ · w	_ · x	_ · y	_ · z		result
input data	w	x	y	z
eqn 1	a	b	c	d	=	t
eqn 2	e	f	g	h	=	u
eqn 3	i	j	k	l	=	v

Each equation in the system is represented as a row. Inputs are placed in a designated special input data row near the top. Outputs go into a separate column. For evaluating the system of equations, values from each coefficient row are implicitly multiplied by a copy of the values from the special input row. These pairwise products are then added across the row to produce the output variable for the final column. Anybody familiar with spreadsheets would find this easy to implement.

For pure mathematical purposes, however, it is only necessary to represent the numerical values. A matrix is just an abstract layout containing the numbers in rows and columns. An abstract matrix multiply operation implicitly does the same calculations as in the spreadsheet table above.

The tabular layout can be nicely separated into three matrices:

matrix M for the consistent model parameters a through l
column vector X for the list of input values w through z
column vector Y for the list of output values t through v

[\begin{matrix} m_{11} m_{12} m_{13} m_{14} \\ m_{21} m_{22} m_{23} m_{24} \\ m_{31} m_{32} m_{33} m_{34} \end{matrix}] [\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \end{matrix}] = [\begin{matrix} y_{1} \\ y_{2} \\ y_{3} \end{matrix}]

Special case matrices with a single row or single column are called vectors. Just as a single variable can be given a name, an entire matrix or vector can be given a name.

With these conventions, a very compact and equivalent "matrix product" expression describes these same relationships.

M \cdot X = Y

As things stand so far, there is an awkward inconsistency. In the table/spreadsheet form, the input variables are organized as a row, while in a matrix notation the input vector is a column. If you think of the matrix operation as being a kind of "transformation" that takes one list of numbers as an input and produces a new list as an output, the table version has an awkward way of converting an input row into an output column. In order to do a sequence of operations, it is necessary to do a lot of extra "flopping" to counteract the "flipping" that the table operation produces.

The matrix convention for a matrix multiply operation is that the input is taken in column vector form, and the output is produced in column vector form, avoiding most of the excess thrashing. When thinking about the details of how the matrix multiply operation works, you can picture it as as "flipping the column vector on its side" and then performing the operation the way the spreadsheet tabulation does it.

For discussing terms of a matrix, a subscript notation is conventionally used. The subscripts are given in row then column order. For example, M_1,3 could be used to select the term in the first row and third column of matrix M. For the case of a row or column vector, only one subscript is used, and you can deduce from context whether the subscript is for a row or column.

The dimensions of the matrix are the numbers of rows and the numbers of columns in the matrix, hence, the maximum meaningful numbers for indexing the positions within the matrix.

For column vectors X and Y of equal length, or row vectors X and Y of equal length, their dot product denoted X · Y is obtained by multiplying the terms from each vector together pairwise, and then summing all the intermediate product terms. If a dot product turns out to yield a value of zero, the vectors are called orthogonal. If a dot product of a vector with itself turns out to yield a value of one, the vector is called normalized.

What you can and can't do

You can consider a matrix to be a collection of row matrices in a column, or a collection of column matrices in a row, whenever one of these interpretations is useful.

You can add matrices. They must have identical dimensions, and then add them term by term. You can commute the positions of the two matrices in the sum and the result is the same.

For the matrix to multiply a column vector, that "input" vector must have enough row terms to match up with the number of columns in the matrix. If it does, the matrices are compatible. If matrices are incompatible, the matrix product operation is not defined. Sometimes a chain of matrix product operations is applied, but to do this, the matrices must be compatible after each step of the cascaded operation.

For purposes of performing a multiplication operation, a matrix with multiple columns is considered compatible with another matrix if each of its columns has the correct number of rows to perform a matrix-vector multiply operation. In this case, a product of a matrix with another compatible matrix is the same thing as performing the matrix operation using each of the column vectors of the second matrix, in sequence, and then collecting the "output" vectors as columns to form the result matrix.

A peculiar special case (that occurs quite a lot) is the case of a one-column matrix multiplied by a compatible one-row matrix. Verify from the definition that this is allowable! The result is a matrix having the number of rows from the first vector and the number of columns from the second vector.

Matrix multiplication operations in general do not commute. For one thing, this can violate the compatibility requirement. But even when that doesn't happen, in general the results are different.

The multiplication operations are associative. If you have a cascaded sequence of matrix multiply operations, the operations can be performed in any sequence, as long as the order of the terms is not changed.

You can multiply a scalar times a matrix. This special scaling operation is defined as multiplying every term of the matrix by that scalar value. In general, this is not the same as multiplying by a one-row-one-column matrix, so don't be confused.

Rank and Linear Independence

If you take an equation from the original equation system and multiply it on both sides by a non-zero scalar value, you get a different equation — but one that you know has the same solution and is therefore mathematically equivalent. Clearly, you could construct a large number of equivalent equation systems in this way, but the additional equations are redundant, and embody no new information.

If you take two or more of the equations in the original equation system and multiply each one by its own non-zero scalar value on both sides, and then combine the terms by adding on the left side and on the right side, this constructs a new equation that might look very different, but you know that it is redundant, presenting no new information.

A set of equations for which none of the equations can be derived in this fashion from a combination of other equations in the set is called linearly independent. When in matrix form, rows (or columns) of the matrix are called linearly independent when it is not possible to produce the row (or column) as a scaled sum of other rows (or columns) in the matrix.

The maximum number of linearly independent rows or columns you can have in a system of equations, or its equivalent matrix form, is the lesser of the row dimension and the column dimension. The actual number of linearly independent rows/columns in your equation system or matrix is called the rank of the system. If your equation system or matrix has a maximum number of linearly independent rows/columns, it is said to be of full rank. Otherwise, it is of reduced rank.

Some special matrices

A row vector is a special case of a matrix with only one row.
A column vector is a special case of a matrix with only one column.
A square matrix is a matrix having the same number rows and columns. If it is full rank, it is said to be nonsingular, and it is not possible to multiply the matrix and a nonzero vector to get a zero vector as a result.
A transpose matrix of a matrix M is a matrix M^T that is constructed by taking each term m_i,j from the original M matrix and placing that value as term mt_j,i in the transposed matrix. In other words, the rows become columns, the columns become rows. A fact that is sometimes useful is that the transpose of a product is equal to the product of transposed matrices in reverse order.
(M N)^T ↔ N^T M^T
A zero matrix is a matrix having only zero terms. It has theoretical properties similar to "adding zero" or "multiplying by zero" in scalar arithmetic. Computationally, however, it is very useful, because it reserves storage for a matrix, and then you can go back and adjust individual terms later.
A diagonal matrix is like a zero matrix except for having nonzero terms along the main diagonal — the matrix positions where the row index and column index are equal. All other matrix terms are zero.
An identity matrix is a diagonal matrix having 1.0 for every term along the main diagonal. When an identity matrix multiplies a compatible matrix, the output vector is the same as the input vector, analogous to a "multiply by 1" operation in scalar arithmetic.
A positive definite square matrix M is a matrix such that for any nonzero vector x the product x^T · M · x is always positive. Informally, this says that the matrix operation M does not "reverse the direction" of vector x.
An inverse matrix M^-1 of a square matrix M is a matrix such that the product of the two matrices in either order yields an identity matrix. You can think of it as a transformation that "undoes" what the other transformation did. It only exists when the M matrix is full rank.

Simple norms

The idea of a norm is to characterize the "size" of terms in a matrix by a single scalar value. It isn't immediately obvious how to do this, since a matrix consists in general of multiple terms that can vary widely in value. However, we will occasionally use the 2-norm (also called the Euclidean norm in honor of Euclid's famous theorem about the lengths of sides in a right triangle) to describe the "size" of a vector. It is calculated by squaring each vector term, summing these squared values, and then taking the square root of that sum. If the value of the norm turns out to equal 1.0, the vector is called normalized. This norm can be calculated easily using a dot product operation.

Other unpleasantness

There are plenty of other topics associated with matrix theory and numerical computation that this series will avoid. Many. Advanced topics that we can't avoid will be discussed in context.

Too much?

Don't panic if you can't see the point of all of this yet. It will make more sense when these things appear again in context, in the installments to come.

Footnotes:

[1] Yeah, I know that you saw this link in the main article, but you're seeing it again. Jack Crenshaw's article Who Needs Matrices?. Don't be a wimp, read it.

Sidebar: Matrix Review