Linear Algebra: Matrices
Concept
A matrix is a rectangular array of numbers that represents a linear transformation between vector spaces — it maps vectors to other vectors.
Why This Matters
Every weight in a neural network is stored as a matrix. The forward pass through a dense layer is a matrix multiplication followed by a bias addition and a nonlinearity. In a transformer, attention scores are computed by multiplying a query matrix with a key matrix, and the output is a value matrix multiplied by the attention weights. Understanding matrix multiplication is understanding how information flows through a neural network.
Mathematical Notation
A matrix is written as an uppercase bold or capital letter:
$$ \mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} $$
The dimensions are written as $m \times n$ (read "m by n"): $m$ rows and $n$ columns. We say $\mathbf{A} \in \mathbb{R}^{m \times n}$ — the set of $m \times n$ real matrices (notation introduced in ch01).
The entry in row $i$, column $j$ is written $a_{ij}$ or $\mathbf{A}_{ij}$. In math notation, indices are 1-based; in code they become 0-based.
Matrix Multiplication
For two matrices $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{B} \in \mathbb{R}^{n \times p}$, their product $\mathbf{C} = \mathbf{A}\mathbf{B}$ is an $m \times p$ matrix where each entry is:
$$ \mathbf{C}_{ij} = \sum_{k=1}^{n} \mathbf{A}_{ik} \mathbf{B}_{kj} $$
The inner dimension ($n$) must match — the number of columns in $\mathbf{A}$ must equal the number of rows in $\mathbf{B}$. This is called the compatibility condition for matrix multiplication.
Each entry $\mathbf{C}_{ij}$ is the dot product (introduced in ch02) of row $i$ of $\mathbf{A}$ with column $j$ of $\mathbf{B}$.
Transpose
The transpose of a matrix flips rows and columns. If $\mathbf{A} \in \mathbb{R}^{m \times n}$, then $\mathbf{A}^T \in \mathbb{R}^{n \times m}$:
$$ (\mathbf{A}^T)_{ij} = \mathbf{A}_{ji} $$
Visually:
$$ \mathbf{A} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \quad\longrightarrow\quad \mathbf{A}^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix} $$
Identity Matrix
The identity matrix $\mathbf{I}_n$ is an $n \times n$ matrix with 1s on the diagonal and 0s elsewhere. It acts like the number 1 in multiplication: $\mathbf{A}\mathbf{I} = \mathbf{A}$ and $\mathbf{I}\mathbf{A} = \mathbf{A}$ for any compatible $\mathbf{A}$.
$$ \mathbf{I}_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$
The identity is written with the Kronecker delta (δ, delta — introduced in ch01):
$$ (\mathbf{I}_n)_{ij} = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} $$
Intuition
As a transformation: A matrix multiplies a vector to produce another vector. If $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^n$, then $\mathbf{y} = \mathbf{A}\mathbf{x} \in \mathbb{R}^m$. The matrix rotates, scales, and shears the input space into the output space.
As a collection of rows: An $m \times n$ matrix is $m$ row vectors, each of dimension $n$. Multiplying $\mathbf{A}$ by a vector $\mathbf{x}$ means taking the dot product of $\mathbf{x}$ with each row — each row says "how much does $\mathbf{x}$ align with me?"
As a collection of columns: Equivalently, an $m \times n$ matrix is $n$ column vectors, each of dimension $m$. The product $\mathbf{A}\mathbf{x}$ is a linear combination of the columns, weighted by the entries of $\mathbf{x}$:
$$ \mathbf{A}\mathbf{x} = x_1 \mathbf{a}_{(:1)} + x_2 \mathbf{a}_{(:2)} + \dots + x_n \mathbf{a}_{(:n)} $$
where $\mathbf{a}_{(:j)}$ is the $j$-th column of $\mathbf{A}$.
Why naive multiplication is slow: Computing each of the $m \times p$ entries requires summing $n$ products — that's $m \times n \times p$ scalar multiplications. For square matrices of size $n$, this is $n^3$ operations. There are faster algorithms (Strassen, Cooley-Tukey for convolution), but the naive algorithm is what you should understand before optimizing.
Rust Implementation
Create a new crate as a sibling to ch02 (it is self-contained and does not depend on earlier chapter code):
cargo new --name ch03 --lib ch03
cd ch03
Replace src/lib.rs with:
/// A matrix stored in row-major order as a flat Vec<f64>.
///
/// Row-major means the first row occupies indices 0..cols,
/// the second row occupies indices cols..2*cols, and so on.
/// The element at (row, col) is at index row * cols + col.
#[derive(Debug, Clone, PartialEq)]
pub struct Matrix {
data: Vec<f64>,
rows: usize,
cols: usize,
}
impl Matrix {
/// Create a new matrix from a flat Vec<f64> in row-major order.
///
/// # Panics
/// Panics if data.len() != rows * cols.
pub fn new(data: Vec<f64>, rows: usize, cols: usize) -> Self {
assert_eq!(
data.len(),
rows * cols,
"Data length {} does not match dimensions {}×{}",
data.len(),
rows,
cols
);
Matrix { data, rows, cols }
}
/// Number of rows.
pub fn rows(&self) -> usize {
self.rows
}
/// Number of columns.
pub fn cols(&self) -> usize {
self.cols
}
/// Shape as (rows, cols).
pub fn shape(&self) -> (usize, usize) {
(self.rows, self.cols)
}
/// Access the element at (row, col) — 0-indexed.
///
/// # Panics
/// Panics if row or col is out of bounds.
pub fn get(&self, row: usize, col: usize) -> f64 {
assert!(row < self.rows, "Row {} out of bounds (rows={})", row, self.rows);
assert!(col < self.cols, "Col {} out of bounds (cols={})", col, self.cols);
self.data[row * self.cols + col]
}
/// Matrix multiplication: self * other.
///
/// For A ∈ ℝ^{m×n} and B ∈ ℝ^{n×p}, the result C ∈ ℝ^{m×p}.
/// C_{ij} = Σ_{k=0}^{n-1} A_{ik} * B_{kj}
///
/// # Panics
/// Panics if self.cols != other.rows (incompatible dimensions).
pub fn multiply(&self, other: &Matrix) -> Matrix {
assert_eq!(
self.cols, other.rows,
"Incompatible dimensions: {}×{} vs {}×{}",
self.rows, self.cols, other.rows, other.cols
);
let m = self.rows;
let n = self.cols;
let p = other.cols;
let mut data = Vec::with_capacity(m * p);
for i in 0..m {
for j in 0..p {
let mut sum = 0.0;
for k in 0..n {
sum += self.get(i, k) * other.get(k, j);
}
data.push(sum);
}
}
Matrix { data, rows: m, cols: p }
}
/// Transpose the matrix: flip rows and columns.
///
/// If A ∈ ℝ^{m×n}, then A^T ∈ ℝ^{n×m} with (A^T)_{ij} = A_{ji}.
pub fn transpose(&self) -> Matrix {
let mut data = Vec::with_capacity(self.rows * self.cols);
for j in 0..self.cols {
for i in 0..self.rows {
data.push(self.get(i, j));
}
}
Matrix {
data,
rows: self.cols,
cols: self.rows,
}
}
/// Create an n×n identity matrix.
pub fn identity(n: usize) -> Matrix {
let mut data = vec![0.0; n * n];
for i in 0..n {
data[i * n + i] = 1.0;
}
Matrix {
data,
rows: n,
cols: n,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_create_and_get() {
let m = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
assert_eq!(m.shape(), (2, 3));
assert!((m.get(0, 0) - 1.0).abs() < 1e-10);
assert!((m.get(0, 2) - 3.0).abs() < 1e-10);
assert!((m.get(1, 1) - 5.0).abs() < 1e-10);
}
#[test]
fn test_multiply_2x3_by_3x2() {
// A = [[1, 2, 3], [4, 5, 6]]
// B = [[7, 8], [9, 10], [11, 12]]
// C = A * B should be:
// C[0][0] = 1*7 + 2*9 + 3*11 = 7 + 18 + 33 = 58
// C[0][1] = 1*8 + 2*10 + 3*12 = 8 + 20 + 36 = 64
// C[1][0] = 4*7 + 5*9 + 6*11 = 28 + 45 + 66 = 139
// C[1][1] = 4*8 + 5*10 + 6*12 = 32 + 50 + 72 = 154
let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
let b = Matrix::new(vec![7.0, 8.0, 9.0, 10.0, 11.0, 12.0], 3, 2);
let c = a.multiply(&b);
assert_eq!(c.shape(), (2, 2));
assert!((c.get(0, 0) - 58.0).abs() < 1e-10);
assert!((c.get(0, 1) - 64.0).abs() < 1e-10);
assert!((c.get(1, 0) - 139.0).abs() < 1e-10);
assert!((c.get(1, 1) - 154.0).abs() < 1e-10);
}
#[test]
fn test_multiply_square() {
// A = [[1, 2], [3, 4]]
// B = [[5, 6], [7, 8]]
// C = [[1*5+2*7, 1*6+2*8], [3*5+4*7, 3*6+4*8]]
// = [[5+14, 6+16], [15+28, 18+32]]
// = [[19, 22], [43, 50]]
let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0], 2, 2);
let b = Matrix::new(vec![5.0, 6.0, 7.0, 8.0], 2, 2);
let c = a.multiply(&b);
assert!((c.get(0, 0) - 19.0).abs() < 1e-10);
assert!((c.get(0, 1) - 22.0).abs() < 1e-10);
assert!((c.get(1, 0) - 43.0).abs() < 1e-10);
assert!((c.get(1, 1) - 50.0).abs() < 1e-10);
}
#[test]
fn test_multiply_by_identity() {
let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
let i = Matrix::identity(3);
let result = a.multiply(&i);
// A * I = A
for row in 0..2 {
for col in 0..3 {
assert!(
(result.get(row, col) - a.get(row, col)).abs() < 1e-10,
"Mismatch at ({}, {}): expected {}, got {}",
row,
col,
a.get(row, col),
result.get(row, col)
);
}
}
}
#[test]
fn test_transpose() {
// A = [[1, 2, 3], [4, 5, 6]]
// A^T = [[1, 4], [2, 5], [3, 6]]
let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
let t = a.transpose();
assert_eq!(t.shape(), (3, 2));
assert!((t.get(0, 0) - 1.0).abs() < 1e-10);
assert!((t.get(0, 1) - 4.0).abs() < 1e-10);
assert!((t.get(1, 0) - 2.0).abs() < 1e-10);
assert!((t.get(1, 1) - 5.0).abs() < 1e-10);
assert!((t.get(2, 0) - 3.0).abs() < 1e-10);
assert!((t.get(2, 1) - 6.0).abs() < 1e-10);
}
#[test]
fn test_transpose_twice_is_identity() {
let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3);
let t = a.transpose();
let back = t.transpose();
// (A^T)^T = A
assert_eq!(a.shape(), back.shape());
for row in 0..2 {
for col in 0..3 {
assert!(
(back.get(row, col) - a.get(row, col)).abs() < 1e-10,
"Mismatch at ({}, {}): expected {}, got {}",
row,
col,
a.get(row, col),
back.get(row, col)
);
}
}
}
#[test]
#[should_panic(expected = "Incompatible dimensions")]
fn test_multiply_incompatible_dimensions() {
let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0], 2, 2);
let b = Matrix::new(vec![1.0, 2.0, 3.0], 3, 1); // a.cols=2 != b.rows=3
a.multiply(&b);
}
}
Run the tests:
cargo test
You should see:
running 7 tests
test tests::test_create_and_get ... ok
test tests::test_multiply_2x3_by_3x2 ... ok
test tests::test_multiply_by_identity ... ok
test tests::test_multiply_incompatible_dimensions ... ok
test tests::test_multiply_square ... ok
test tests::test_transpose ... ok
test tests::test_transpose_twice_is_identity ... ok
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered
Walkthrough
Matrix { data, rows, cols }— We store the matrix as a flatVec<f64>with the shape stored alongside. Row-major means element $(i, j)$ lives at index $i \times \text{cols} + j$. This is the standard memory layout used by BLAS, numpy (row-major by default), and most ML frameworks. It also means a singleVecallocation holds all the data, which is cache-friendly.get(row, col)— Indexing into the flat array. We add bounds checks with descriptive panic messages, following the course style guide.multiply— The naive triple-nested loop. The outer two loops iterate over every entry of the result matrix, and the inner loop sums the products. This is a direct translation of $C_{ij} = \sum_k A_{ik} B_{kj}$.transpose— We iterate over columns first (outer loop), then rows (inner loop), pushing each element $A_{ij}$ into the new matrix's row-major order. After transposing, the matrix'srowsandcolsare swapped.identity— Allocate a zero-filled vector, then set the diagonal elements to 1. This is a simple and efficient construction.assert_eq!for dimensions — The constructor checks that the data length matchesrows * cols, andmultiplychecks the compatibility condition. Both give descriptive error messages.- Transpose twice returns the original —
test_transpose_twice_is_identityverifies the invariant $(\mathbf{A}^T)^T = \mathbf{A}$ by checking every element. This is a property-based test that catches bugs in bothtransposeand the indexing logic simultaneously.
Verification
The tests above verify:
| Test | What it checks | Mathematical invariant |
|---|---|---|
test_create_and_get | Element access and shape | Storage layout correctness |
test_multiply_2x3_by_3x2 | $C = AB$ with known result | Matrix multiplication formula |
test_multiply_square | $2\times2$ multiplication | $C_{ij} = \sum_k A_{ik}B_{kj}$ |
test_multiply_by_identity | $AI = A$ | Identity matrix property |
test_transpose | Known transpose result | $(A^T)_{ij} = A_{ji}$ |
test_transpose_twice_is_identity | $(A^T)^T = A$ | Involution property of transpose |
test_multiply_incompatible_dimensions | Proper panic on mismatch | Dimension compatibility |
Key Takeaways
- A matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is a rectangular array with $m$ rows and $n$ columns — the core data structure for linear transformations.
- Matrix multiplication $\mathbf{C} = \mathbf{A}\mathbf{B}$ requires $\mathbf{A}$'s columns to match $\mathbf{B}$'s rows. Each entry $C_{ij}$ is the dot product of row $i$ of $\mathbf{A}$ with column $j$ of $\mathbf{B}$.
- The transpose $\mathbf{A}^T$ swaps rows and columns; transposing twice recovers the original.
- The identity matrix $\mathbf{I}$ is the multiplicative identity — it leaves any compatible matrix unchanged under multiplication.
- Row-major flat storage (
data[row * cols + col]) is the standard memory layout for matrices — cache-friendly and used by most numerical libraries. - Naive matrix multiplication is $O(n^3)$ — understanding this baseline is essential before exploring optimized algorithms.