Linear Algebra: Matrices

Concept

A matrix is a rectangular array of numbers that represents a linear transformation between vector spaces — it maps vectors to other vectors.

Why This Matters

Every weight in a neural network is stored as a matrix. The forward pass through a dense layer is a matrix multiplication followed by a bias addition and a nonlinearity. In a transformer, attention scores are computed by multiplying a query matrix with a key matrix, and the output is a value matrix multiplied by the attention weights. Understanding matrix multiplication is understanding how information flows through a neural network.


Mathematical Notation

A matrix is written as an uppercase bold or capital letter:

$$ \mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} $$

The dimensions are written as $m \times n$ (read "m by n"): $m$ rows and $n$ columns. We say $\mathbf{A} \in \mathbb{R}^{m \times n}$ — the set of $m \times n$ real matrices (notation introduced in ch01).

The entry in row $i$, column $j$ is written $a_{ij}$ or $\mathbf{A}_{ij}$. In math notation, indices are 1-based; in code they become 0-based.

Matrix Multiplication

For two matrices $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{B} \in \mathbb{R}^{n \times p}$, their product $\mathbf{C} = \mathbf{A}\mathbf{B}$ is an $m \times p$ matrix where each entry is:

$$ \mathbf{C}_{ij} = \sum_{k=1}^{n} \mathbf{A}_{ik} \mathbf{B}_{kj} $$

The inner dimension ($n$) must match — the number of columns in $\mathbf{A}$ must equal the number of rows in $\mathbf{B}$. This is called the compatibility condition for matrix multiplication.

Each entry $\mathbf{C}_{ij}$ is the dot product (introduced in ch02) of row $i$ of $\mathbf{A}$ with column $j$ of $\mathbf{B}$.

Transpose

The transpose of a matrix flips rows and columns. If $\mathbf{A} \in \mathbb{R}^{m \times n}$, then $\mathbf{A}^T \in \mathbb{R}^{n \times m}$:

$$ (\mathbf{A}^T)_{ij} = \mathbf{A}_{ji} $$

Visually:

$$ \mathbf{A} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \quad\longrightarrow\quad \mathbf{A}^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix} $$

Identity Matrix

The identity matrix $\mathbf{I}_n$ is an $n \times n$ matrix with 1s on the diagonal and 0s elsewhere. It acts like the number 1 in multiplication: $\mathbf{A}\mathbf{I} = \mathbf{A}$ and $\mathbf{I}\mathbf{A} = \mathbf{A}$ for any compatible $\mathbf{A}$.

$$ \mathbf{I}_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$

The identity is written with the Kronecker delta (δ, delta — introduced in ch01):

$$ (\mathbf{I}_n)_{ij} = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} $$


Intuition

As a transformation: A matrix multiplies a vector to produce another vector. If $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^n$, then $\mathbf{y} = \mathbf{A}\mathbf{x} \in \mathbb{R}^m$. The matrix rotates, scales, and shears the input space into the output space.

As a collection of rows: An $m \times n$ matrix is $m$ row vectors, each of dimension $n$. Multiplying $\mathbf{A}$ by a vector $\mathbf{x}$ means taking the dot product of $\mathbf{x}$ with each row — each row says "how much does $\mathbf{x}$ align with me?"

As a collection of columns: Equivalently, an $m \times n$ matrix is $n$ column vectors, each of dimension $m$. The product $\mathbf{A}\mathbf{x}$ is a linear combination of the columns, weighted by the entries of $\mathbf{x}$:

$$ \mathbf{A}\mathbf{x} = x_1 \mathbf{a}_{(:1)} + x_2 \mathbf{a}_{(:2)} + \dots + x_n \mathbf{a}_{(:n)} $$

where $\mathbf{a}_{(:j)}$ is the $j$-th column of $\mathbf{A}$.

Why naive multiplication is slow: Computing each of the $m \times p$ entries requires summing $n$ products — that's $m \times n \times p$ scalar multiplications. For square matrices of size $n$, this is $n^3$ operations. There are faster algorithms (Strassen, Cooley-Tukey for convolution), but the naive algorithm is what you should understand before optimizing.


Rust Implementation

Create a new crate as a sibling to ch02 (it is self-contained and does not depend on earlier chapter code):

cargo new --name ch03 --lib ch03
cd ch03

Replace src/lib.rs with:

/// A matrix stored in row-major order as a flat Vec<f64>.
///
/// Row-major means the first row occupies indices 0..cols,
/// the second row occupies indices cols..2*cols, and so on.
/// The element at (row, col) is at index row * cols + col.
#[derive(Debug, Clone, PartialEq)]
pub struct Matrix {
    data: Vec<f64>,
    rows: usize,
    cols: usize,
}

impl Matrix {
    /// Create a new matrix from a flat Vec<f64> in row-major order.
    ///
    /// # Panics
    /// Panics if data.len() != rows * cols.
    pub fn new(data: Vec<f64>, rows: usize, cols: usize) -> Self {
        assert_eq!(
            data.len(),
            rows * cols,
            "Data length {} does not match dimensions {}×{}",
            data.len(),
            rows,
            cols
        );
        Matrix { data, rows, cols }
    }

    /// Number of rows.
    pub fn rows(&self) -> usize {
        self.rows
    }

    /// Number of columns.
    pub fn cols(&self) -> usize {
        self.cols
    }

    /// Shape as (rows, cols).
    pub fn shape(&self) -> (usize, usize) {
        (self.rows, self.cols)
    }

    /// Access the element at (row, col) — 0-indexed.
    ///
    /// # Panics
    /// Panics if row or col is out of bounds.
    pub fn get(&self, row: usize, col: usize) -> f64 {
        assert!(row < self.rows, "Row {} out of bounds (rows={})", row, self.rows);
        assert!(col < self.cols, "Col {} out of bounds (cols={})", col, self.cols);
        self.data[row * self.cols + col]
    }

    /// Matrix multiplication: self * other.
    ///
    /// For A ∈ ℝ^{m×n} and B ∈ ℝ^{n×p}, the result C ∈ ℝ^{m×p}.
    /// C_{ij} = Σ_{k=0}^{n-1} A_{ik} * B_{kj}
    ///
    /// # Panics
    /// Panics if self.cols != other.rows (incompatible dimensions).
    pub fn multiply(&self, other: &Matrix) -> Matrix {
        assert_eq!(
            self.cols, other.rows,
            "Incompatible dimensions: {}×{} vs {}×{}",
            self.rows, self.cols, other.rows, other.cols
        );

        let m = self.rows;
        let n = self.cols;
        let p = other.cols;

        let mut data = Vec::with_capacity(m * p);

        for i in 0..m {
            for j in 0..p {
                let mut sum = 0.0;
                for k in 0..n {
                    sum += self.get(i, k) * other.get(k, j);
                }
                data.push(sum);
            }
        }

        Matrix { data, rows: m, cols: p }
    }

    /// Transpose the matrix: flip rows and columns.
    ///
    /// If A ∈ ℝ^{m×n}, then A^T ∈ ℝ^{n×m} with (A^T)_{ij} = A_{ji}.
    pub fn transpose(&self) -> Matrix {
        let mut data = Vec::with_capacity(self.rows * self.cols);

        for j in 0..self.cols {
            for i in 0..self.rows {
                data.push(self.get(i, j));
            }
        }

        Matrix {
            data,
            rows: self.cols,
            cols: self.rows,
        }
    }

    /// Create an n×n identity matrix.
    pub fn identity(n: usize) -> Matrix {
        let mut data = vec![0.0; n * n];
        for i in 0..n {
            data[i * n + i] = 1.0;
        }
        Matrix {
            data,
            rows: n,
            cols: n,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_create_and_get() {
        let m = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
        assert_eq!(m.shape(), (2, 3));
        assert!((m.get(0, 0) - 1.0).abs() < 1e-10);
        assert!((m.get(0, 2) - 3.0).abs() < 1e-10);
        assert!((m.get(1, 1) - 5.0).abs() < 1e-10);
    }

    #[test]
    fn test_multiply_2x3_by_3x2() {
        // A = [[1, 2, 3], [4, 5, 6]]
        // B = [[7, 8], [9, 10], [11, 12]]
        // C = A * B should be:
        //   C[0][0] = 1*7 + 2*9 + 3*11 = 7 + 18 + 33 = 58
        //   C[0][1] = 1*8 + 2*10 + 3*12 = 8 + 20 + 36 = 64
        //   C[1][0] = 4*7 + 5*9 + 6*11 = 28 + 45 + 66 = 139
        //   C[1][1] = 4*8 + 5*10 + 6*12 = 32 + 50 + 72 = 154
        let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
        let b = Matrix::new(vec![7.0, 8.0, 9.0, 10.0, 11.0, 12.0], 3, 2);
        let c = a.multiply(&b);

        assert_eq!(c.shape(), (2, 2));
        assert!((c.get(0, 0) - 58.0).abs() < 1e-10);
        assert!((c.get(0, 1) - 64.0).abs() < 1e-10);
        assert!((c.get(1, 0) - 139.0).abs() < 1e-10);
        assert!((c.get(1, 1) - 154.0).abs() < 1e-10);
    }

    #[test]
    fn test_multiply_square() {
        // A = [[1, 2], [3, 4]]
        // B = [[5, 6], [7, 8]]
        // C = [[1*5+2*7, 1*6+2*8], [3*5+4*7, 3*6+4*8]]
        //   = [[5+14, 6+16], [15+28, 18+32]]
        //   = [[19, 22], [43, 50]]
        let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0], 2, 2);
        let b = Matrix::new(vec![5.0, 6.0, 7.0, 8.0], 2, 2);
        let c = a.multiply(&b);

        assert!((c.get(0, 0) - 19.0).abs() < 1e-10);
        assert!((c.get(0, 1) - 22.0).abs() < 1e-10);
        assert!((c.get(1, 0) - 43.0).abs() < 1e-10);
        assert!((c.get(1, 1) - 50.0).abs() < 1e-10);
    }

    #[test]
    fn test_multiply_by_identity() {
        let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
        let i = Matrix::identity(3);
        let result = a.multiply(&i);
        // A * I = A
        for row in 0..2 {
            for col in 0..3 {
                assert!(
                    (result.get(row, col) - a.get(row, col)).abs() < 1e-10,
                    "Mismatch at ({}, {}): expected {}, got {}",
                    row,
                    col,
                    a.get(row, col),
                    result.get(row, col)
                );
            }
        }
    }

    #[test]
    fn test_transpose() {
        // A = [[1, 2, 3], [4, 5, 6]]
        // A^T = [[1, 4], [2, 5], [3, 6]]
        let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
        let t = a.transpose();

        assert_eq!(t.shape(), (3, 2));
        assert!((t.get(0, 0) - 1.0).abs() < 1e-10);
        assert!((t.get(0, 1) - 4.0).abs() < 1e-10);
        assert!((t.get(1, 0) - 2.0).abs() < 1e-10);
        assert!((t.get(1, 1) - 5.0).abs() < 1e-10);
        assert!((t.get(2, 0) - 3.0).abs() < 1e-10);
        assert!((t.get(2, 1) - 6.0).abs() < 1e-10);
    }

    #[test]
    fn test_transpose_twice_is_identity() {
        let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
        let t = a.transpose();
        let back = t.transpose();
        // (A^T)^T = A
        assert_eq!(a.shape(), back.shape());
        for row in 0..2 {
            for col in 0..3 {
                assert!(
                    (back.get(row, col) - a.get(row, col)).abs() < 1e-10,
                    "Mismatch at ({}, {}): expected {}, got {}",
                    row,
                    col,
                    a.get(row, col),
                    back.get(row, col)
                );
            }
        }
    }

    #[test]
    #[should_panic(expected = "Incompatible dimensions")]
    fn test_multiply_incompatible_dimensions() {
        let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0], 2, 2);
        let b = Matrix::new(vec![1.0, 2.0, 3.0], 3, 1); // a.cols=2 != b.rows=3
        a.multiply(&b);
    }
}

Run the tests:

cargo test

You should see:

running 7 tests
test tests::test_create_and_get ... ok
test tests::test_multiply_2x3_by_3x2 ... ok
test tests::test_multiply_by_identity ... ok
test tests::test_multiply_incompatible_dimensions ... ok
test tests::test_multiply_square ... ok
test tests::test_transpose ... ok
test tests::test_transpose_twice_is_identity ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered

Walkthrough


Verification

The tests above verify:

TestWhat it checksMathematical invariant
test_create_and_getElement access and shapeStorage layout correctness
test_multiply_2x3_by_3x2$C = AB$ with known resultMatrix multiplication formula
test_multiply_square$2\times2$ multiplication$C_{ij} = \sum_k A_{ik}B_{kj}$
test_multiply_by_identity$AI = A$Identity matrix property
test_transposeKnown transpose result$(A^T)_{ij} = A_{ji}$
test_transpose_twice_is_identity$(A^T)^T = A$Involution property of transpose
test_multiply_incompatible_dimensionsProper panic on mismatchDimension compatibility

Key Takeaways