Linear Transformations
Concept
A matrix is a linear transformation: it maps vectors to other vectors in a way that keeps the origin fixed and keeps straight lines straight. Every linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be represented as an $m \times n$ matrix, and every $m \times n$ matrix represents such a transformation.
Why This Matters
A fully connected neural network layer is a linear transformation (the weight matrix), plus a bias, followed by a nonlinearity. In attention, queries and keys are linear transformations of the input, and the attention scores are dot products between them. Convolutions are linear transformations applied to image patches. When you hear "representation learning," much of what a model learns is a sequence of linear transformations, interleaved with nonlinearities, that reshape the data space into one where the task becomes easy. Understanding what a matrix does, not just how to multiply it, is the bridge from algebra to intuition.
Mathematical Notation
Linear Transformation Definition
A function $T: \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if it satisfies two properties:
- Additivity: $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$
- Homogeneity: $T(\alpha \mathbf{v}) = \alpha T(\mathbf{v})$
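Together these give linearity: for all vectors $\mathbf{u}, \mathbf{v}$ and scalars $\alpha, \beta$,

$$T(\alpha \mathbf{u} + \beta \mathbf{v}) = \alpha T(\mathbf{u}) + \beta T(\mathbf{v})$$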
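A quick numerical way to see the definition at work is to test the combined property on sample vectors. This is a sketch for illustration, not a proof (passing on a few samples does not establish linearity); the helper `satisfies_linearity` and the two example maps are hypothetical. The matrix map passes, while a translation $T(\mathbf{v}) = \mathbf{v} + (1, 0)$ fails because it moves the origin.

```rust
/// Hypothetical helper: checks T(a*u + b*v) == a*T(u) + b*T(v) for one sample.
/// Passing for a few samples is evidence of linearity, not a proof.
fn satisfies_linearity<F>(t: F, u: [f64; 2], v: [f64; 2], a: f64, b: f64) -> bool
where
    F: Fn([f64; 2]) -> [f64; 2],
{
    // Left side: transform the combination.
    let combo = [a * u[0] + b * v[0], a * u[1] + b * v[1]];
    let lhs = t(combo);
    // Right side: combine the transformed vectors.
    let (tu, tv) = (t(u), t(v));
    let rhs = [a * tu[0] + b * tv[0], a * tu[1] + b * tv[1]];
    (lhs[0] - rhs[0]).abs() < 1e-10 && (lhs[1] - rhs[1]).abs() < 1e-10
}

/// A matrix map: the matrix [[2, 1], [0, 3]] applied to v.
fn matrix_map(v: [f64; 2]) -> [f64; 2] {
    [2.0 * v[0] + 1.0 * v[1], 3.0 * v[1]]
}

/// A translation: shifts every point right by 1. Not linear, since T(0) != 0.
fn translate(v: [f64; 2]) -> [f64; 2] {
    [v[0] + 1.0, v[1]]
}
```

With $\alpha = 2, \beta = -3$ the translation fails the check. Pairs with $\alpha + \beta = 1$ would let it slip through, which is one more reason a single sample is not a proof.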
Matrix of a Transformation
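Every linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is represented by an $m \times n$ matrix $\mathbf{A}$ whose columns are the images of the standard basis vectors:

$$\mathbf{A} = \begin{bmatrix} | & | & & | \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \\ | & | & & | \end{bmatrix}$$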
where $\mathbf{e}_i$ is the $i$-th standard basis vector (1 in position $i$, 0 elsewhere).
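The recipe "column $i$ is $T(\mathbf{e}_i)$" can be run directly: feed each standard basis vector through a linear function and store the results as columns. A minimal sketch, assuming the map is given as a Rust closure over slices; `matrix_from_fn` is a hypothetical helper that returns the matrix as a `Vec` of rows.

```rust
/// Hypothetical helper: build the m×n matrix of a linear map T: R^n -> R^m
/// by applying T to each standard basis vector e_1..e_n.
/// Returns the matrix as a Vec of rows.
fn matrix_from_fn<F>(t: F, n: usize, m: usize) -> Vec<Vec<f64>>
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    let mut a = vec![vec![0.0; n]; m];
    for j in 0..n {
        // e_j: 1 in position j, 0 elsewhere.
        let mut e = vec![0.0; n];
        e[j] = 1.0;
        // T(e_j) becomes column j of the matrix.
        let col = t(&e[..]);
        assert_eq!(col.len(), m, "T must return m components");
        for i in 0..m {
            a[i][j] = col[i];
        }
    }
    a
}
```

For example, $T(x, y, z) = (x + 2y, 3z)$ yields the $2 \times 3$ matrix with rows $(1, 2, 0)$ and $(0, 0, 3)$.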
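Applying the transformation to a vector $\mathbf{x} = (x_1, \dots, x_n)$ is matrix-vector multiplication:

$$T(\mathbf{x}) = \mathbf{A}\mathbf{x} = x_1 T(\mathbf{e}_1) + x_2 T(\mathbf{e}_2) + \cdots + x_n T(\mathbf{e}_n)$$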
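This is a linear combination of the columns of $\mathbf{A}$, with the components of $\mathbf{x}$ as the weights. It follows directly from linearity, since $\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n$.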
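The two ways of reading $\mathbf{A}\mathbf{x}$ give the same numbers: row by row as dot products, or column by column as a weighted sum. A small sketch, assuming the matrix is stored as a `Vec` of rows (the function names are illustrative, not the chapter's API):

```rust
/// Row view: component i is the dot product of row i with x.
fn mat_vec_rows(a: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    a.iter()
        .map(|row| row.iter().zip(x).map(|(r, xi)| r * xi).sum::<f64>())
        .collect()
}

/// Column view: start at zero, then add x[j] times column j.
fn mat_vec_cols(a: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    let m = a.len();
    let mut out = vec![0.0; m];
    for (j, &xj) in x.iter().enumerate() {
        for i in 0..m {
            out[i] += xj * a[i][j];
        }
    }
    out
}
```

With rows $(1, 2, 3)$ and $(4, 5, 6)$ and $\mathbf{x} = (2, 1, -1)$, both give $(1, 7)$. The chapter's `transform` method uses the row view; the column view is the one that matches the linear-combination reading.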
Composition
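If $S: \mathbb{R}^n \to \mathbb{R}^m$ has matrix $\mathbf{A}$ and $T: \mathbb{R}^m \to \mathbb{R}^p$ has matrix $\mathbf{B}$, then the composition $T \circ S: \mathbb{R}^n \to \mathbb{R}^p$ has matrix $\mathbf{B}\mathbf{A}$:

$$(T \circ S)(\mathbf{x}) = T(S(\mathbf{x})) = \mathbf{B}(\mathbf{A}\mathbf{x}) = (\mathbf{B}\mathbf{A})\mathbf{x}$$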
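Composition is matrix multiplication. In fact, this is why matrix multiplication is defined the way it is: it is the rule that makes $\mathbf{B}(\mathbf{A}\mathbf{x}) = (\mathbf{B}\mathbf{A})\mathbf{x}$ hold for every $\mathbf{x}$.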
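Because composition is matrix multiplication, the order of the factors matters. The sketch below uses plain `[[f64; 2]; 2]` arrays instead of the chapter's `Matrix` type (an assumption to keep it self-contained) and composes a scale by $(2, 1)$ with a 90° rotation in both orders.

```rust
type M2 = [[f64; 2]; 2];

/// 2×2 matrix product B*A: apply A first, then B.
fn mul(b: M2, a: M2) -> M2 {
    let mut c = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                c[i][j] += b[i][k] * a[k][j];
            }
        }
    }
    c
}

/// Scale x by 2, leave y alone.
const SCALE: M2 = [[2.0, 0.0], [0.0, 1.0]];
/// Rotate 90° counter-clockwise (cos 90° = 0, sin 90° = 1, written exactly).
const ROT90: M2 = [[0.0, -1.0], [1.0, 0.0]];
```

Scaling first sends $\mathbf{e}_1$ to $(2, 0)$ and then rotates it to $(0, 2)$; rotating first sends it to $(0, 1)$, which the scale leaves alone. The two products therefore differ in their first column.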
Intuition
Basis vectors as anchors
The columns of a matrix tell you exactly what the transformation does. If you know where the basis vectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$ go, you know where every vector goes. Everything else is just a linear combination.
Think of the basis vectors as the grid lines of the coordinate system. The transformation moves the grid (stretching, rotating, or flipping it), and every other point goes along for the ride.
Common transformations (2D)
Scaling stretches or shrinks along each axis:

$$\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$$
The basis vector $\mathbf{e}_1 = (1, 0)$ goes to $(s_x, 0)$, and $\mathbf{e}_2 = (0, 1)$ goes to $(0, s_y)$. The grid gets stretched.
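Rotation turns vectors counter-clockwise by angle $\theta$ (theta, introduced in ch01):

$$\mathbf{R}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$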
The first column $(\cos\theta, \sin\theta)$ is where $\mathbf{e}_1$ lands: it is the unit vector at angle $\theta$. The second column $(-\sin\theta, \cos\theta)$ is $\mathbf{e}_2$ rotated by $\theta$.
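Multiplying out gives the rotation formula for a general point $(x, y)$. As a worked instance, rotating $(3, 4)$ by $\theta = 90^\circ$ (where $\cos\theta = 0$ and $\sin\theta = 1$):

$$
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{bmatrix},
\qquad
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 3 \\ 4 \end{bmatrix}
= \begin{bmatrix} -4 \\ 3 \end{bmatrix}
$$

The length $\sqrt{3^2 + 4^2} = 5$ is unchanged, as expected for a rotation.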
Reflection flips across the $x$-axis:

$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$
The $x$-axis stays put (first column is $\mathbf{e}_1$), but the $y$-axis flips (second column is $-\mathbf{e}_2$).
Shear shifts each point horizontally in proportion to its $y$-coordinate, $(x, y) \to (x + ay, y)$:

$$\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}$$
The $x$-axis stays put ($\mathbf{e}_1$ is unchanged), but the $y$-axis tilts: $\mathbf{e}_2$ lands at $(a, 1)$. The parameter $a$ controls how much.
Visual summary
Each transformation is fully described by where it sends the basis vectors (these are the columns of the matrix):
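| Basis vector | Scale(2, 1) | Rotate(45°) | Reflect x | Shear x(1) |
|---|---|---|---|---|
| $\mathbf{e}_1 = (1, 0) \to$ | (2, 0) | (0.707, 0.707) | (1, 0) | (1, 0) |
| $\mathbf{e}_2 = (0, 1) \to$ | (0, 1) | (-0.707, 0.707) | (0, -1) | (1, 1) |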
The unit square with corners (0,0), (1,0), (1,1), (0,1) goes to the parallelogram spanned by the two image vectors.
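To see the parallelogram concretely, push the four corners of the unit square through one of these matrices. A minimal sketch using the horizontal shear with $a = 1$, with plain arrays standing in for the chapter's `Matrix` type; `apply` and `image_of_unit_square` are illustrative names.

```rust
/// Apply a 2×2 matrix (stored as rows) to a point.
fn apply(m: [[f64; 2]; 2], p: [f64; 2]) -> [f64; 2] {
    [
        m[0][0] * p[0] + m[0][1] * p[1],
        m[1][0] * p[0] + m[1][1] * p[1],
    ]
}

/// Map the unit square's corners (0,0), (1,0), (1,1), (0,1) through `m`.
fn image_of_unit_square(m: [[f64; 2]; 2]) -> [[f64; 2]; 4] {
    let corners = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]];
    corners.map(|p| apply(m, p))
}
```

For the shear, the corners land at $(0,0)$, $(1,0)$, $(2,1)$, $(1,1)$. The images of $(1,0)$ and $(0,1)$ are the two spanning vectors, and the image of $(1,1)$ is their sum, which is why the result is always a parallelogram (degenerate when the two columns are parallel).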
Key insight: Reading down each column tells you the entire transformation. There is no need to visualize the grid; just look at where the basis vectors land. This is exactly what `transform(&[1.0, 0.0])` and `transform(&[0.0, 1.0])` compute in code.
Rust Implementation
Create a new crate that builds on the Matrix type from ch03:
```bash
cargo new --name ch05 --lib ch05
cd ch05
```
Copy the `Matrix` struct and its `impl` block from ch03 into `src/lib.rs` (or set up a path dependency in `Cargo.toml`):
```toml
# Cargo.toml (if using a path dependency)
[dependencies]
ch03 = { path = "../ch03" }
```
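For simplicity, this chapter adds methods directly to the `Matrix` struct, and `transform` takes the input vector as a plain slice rather than converting it to a column matrix first. Rust only allows an inherent `impl Matrix` block in the crate that defines `Matrix`, so this approach assumes you copied the struct; with the path dependency, `transform` would have to become a free function or a trait method.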
Add to src/lib.rs:
```rust
// ── Linear transformations ──

impl Matrix {
    /// Apply this transformation to a vector (treated as a column matrix).
    ///
    /// If this matrix is m×n, the input vector must have n components,
    /// and the result has m components.
    ///
    /// Mathematically: T(v) = A * v where A is this matrix.
    pub fn transform(&self, v: &[f64]) -> Vec<f64> {
        assert_eq!(
            self.cols,
            v.len(),
            "Matrix cols {} doesn't match vector length {}",
            self.cols,
            v.len()
        );
        let mut result = vec![0.0; self.rows];
        for i in 0..self.rows {
            // Component i is the dot product of row i with v.
            let mut sum = 0.0;
            for j in 0..self.cols {
                sum += self.get(i, j) * v[j];
            }
            result[i] = sum;
        }
        result
    }
}

// ── 2D transformation factory functions ──

/// Create a 2D scaling matrix: Scale(sx, sy).
///
/// [ sx  0 ]
/// [  0 sy ]
pub fn scale_2d(sx: f64, sy: f64) -> Matrix {
    Matrix::new(vec![sx, 0.0, 0.0, sy], 2, 2)
}

/// Create a 2D rotation matrix (counter-clockwise by `angle` radians).
///
/// [ cosθ -sinθ ]
/// [ sinθ  cosθ ]
pub fn rotate_2d(angle: f64) -> Matrix {
    let c = angle.cos();
    let s = angle.sin();
    Matrix::new(vec![c, -s, s, c], 2, 2)
}

/// Create a 2D reflection matrix across the x-axis.
///
/// [ 1  0 ]
/// [ 0 -1 ]
pub fn reflect_x_2d() -> Matrix {
    Matrix::new(vec![1.0, 0.0, 0.0, -1.0], 2, 2)
}

/// Create a 2D reflection matrix across the y-axis.
///
/// [ -1 0 ]
/// [  0 1 ]
pub fn reflect_y_2d() -> Matrix {
    Matrix::new(vec![-1.0, 0.0, 0.0, 1.0], 2, 2)
}

/// Create a 2D shear matrix (horizontal shear).
///
/// [ 1 shx ]
/// [ 0   1 ]
pub fn shear_x_2d(shx: f64) -> Matrix {
    Matrix::new(vec![1.0, shx, 0.0, 1.0], 2, 2)
}

/// Create a 2D shear matrix (vertical shear).
///
/// [   1 0 ]
/// [ shy 1 ]
pub fn shear_y_2d(shy: f64) -> Matrix {
    Matrix::new(vec![1.0, 0.0, shy, 1.0], 2, 2)
}

#[cfg(test)]
mod tests {
    use super::*;

    fn approx_eq(a: f64, b: f64) -> bool {
        (a - b).abs() < 1e-10
    }

    #[test]
    fn test_scale_basis_vectors() {
        let s = scale_2d(2.0, 3.0);
        // e1 = (1, 0) → (2, 0)
        let r1 = s.transform(&[1.0, 0.0]);
        assert!(approx_eq(r1[0], 2.0) && approx_eq(r1[1], 0.0));
        // e2 = (0, 1) → (0, 3)
        let r2 = s.transform(&[0.0, 1.0]);
        assert!(approx_eq(r2[0], 0.0) && approx_eq(r2[1], 3.0));
    }

    #[test]
    fn test_scale_point() {
        let s = scale_2d(2.0, 3.0);
        let r = s.transform(&[4.0, 5.0]);
        // (4, 5) → (8, 15)
        assert!(approx_eq(r[0], 8.0) && approx_eq(r[1], 15.0));
    }

    #[test]
    fn test_rotation_90_degrees() {
        // Rotating (1, 0) by 90° CCW should give (0, 1)
        let r = rotate_2d(std::f64::consts::FRAC_PI_2);
        let v = r.transform(&[1.0, 0.0]);
        assert!(approx_eq(v[0], 0.0) && approx_eq(v[1], 1.0));
    }

    #[test]
    fn test_rotation_180_degrees() {
        // Rotating (1, 0) by 180° should give (-1, 0)
        let r = rotate_2d(std::f64::consts::PI);
        let v = r.transform(&[1.0, 0.0]);
        assert!(approx_eq(v[0], -1.0) && approx_eq(v[1], 0.0));
    }

    #[test]
    fn test_reflect_x() {
        let r = reflect_x_2d();
        let v = r.transform(&[3.0, 4.0]);
        // (3, 4) → (3, -4)
        assert!(approx_eq(v[0], 3.0) && approx_eq(v[1], -4.0));
    }

    #[test]
    fn test_reflect_y() {
        let r = reflect_y_2d();
        let v = r.transform(&[3.0, 4.0]);
        // (3, 4) → (-3, 4)
        assert!(approx_eq(v[0], -3.0) && approx_eq(v[1], 4.0));
    }

    #[test]
    fn test_composition_rotate_then_reflect() {
        let rotate = rotate_2d(std::f64::consts::FRAC_PI_2); // 90° CCW
        let reflect = reflect_x_2d();
        // Compose: reflect ∘ rotate (rotate first, then reflect)
        // Composition matrix = reflect * rotate
        let composed = reflect.multiply(&rotate);
        // Apply to (1, 0):
        // rotate(1, 0) = (0, 1)
        // reflect(0, 1) = (0, -1)
        let v = composed.transform(&[1.0, 0.0]);
        assert!(approx_eq(v[0], 0.0) && approx_eq(v[1], -1.0));
        // Apply sequentially to verify
        let step1 = rotate.transform(&[1.0, 0.0]);
        let step2 = reflect.transform(&step1);
        assert!(approx_eq(step2[0], 0.0) && approx_eq(step2[1], -1.0));
    }

    #[test]
    fn test_composition_scale_then_rotate() {
        // Scale by (2, 1), then rotate 45°
        let s = scale_2d(2.0, 1.0);
        let r = rotate_2d(std::f64::consts::FRAC_PI_4);
        let composed = r.multiply(&s);
        // Apply to (1, 0):
        // scale(1, 0) = (2, 0)
        // rotate(2, 0) = (2cos45°, 2sin45°) = (√2, √2) ≈ (1.4142, 1.4142)
        let v = composed.transform(&[1.0, 0.0]);
        let sqrt2 = std::f64::consts::SQRT_2;
        assert!(approx_eq(v[0], sqrt2) && approx_eq(v[1], sqrt2));
    }

    #[test]
    fn test_shear_x() {
        let sh = shear_x_2d(2.0);
        let v = sh.transform(&[3.0, 4.0]);
        // (3, 4) → (3 + 2*4, 4) = (11, 4)
        assert!(approx_eq(v[0], 11.0) && approx_eq(v[1], 4.0));
    }

    #[test]
    fn test_identity_transformation() {
        let i = Matrix::identity(3);
        let v = i.transform(&[5.0, 6.0, 7.0]);
        assert!(approx_eq(v[0], 5.0) && approx_eq(v[1], 6.0) && approx_eq(v[2], 7.0));
    }

    #[test]
    fn test_double_reflection_is_identity() {
        // Reflect across the x-axis twice → back to the original
        let r = reflect_x_2d();
        let double = r.multiply(&r); // (reflect ∘ reflect) = identity
        let v = double.transform(&[123.0, -456.0]);
        assert!(approx_eq(v[0], 123.0) && approx_eq(v[1], -456.0));
    }

    #[test]
    fn test_rotation_then_inverse() {
        // Rotating by θ then by -θ gives the identity
        let theta = 0.7;
        let r = rotate_2d(theta);
        let r_inv = rotate_2d(-theta);
        // Compose: r_inv ∘ r = identity
        // (since rotation matrices satisfy R(-θ) = R(θ)^T = R(θ)^{-1})
        let composed = r_inv.multiply(&r);
        let v = composed.transform(&[3.0, 4.0]);
        assert!(approx_eq(v[0], 3.0) && approx_eq(v[1], 4.0));
    }
}
```
Run the tests:
```bash
cargo test
```
You should see:
```text
running 12 tests
test tests::test_composition_rotate_then_reflect ... ok
test tests::test_composition_scale_then_rotate ... ok
test tests::test_double_reflection_is_identity ... ok
test tests::test_identity_transformation ... ok
test tests::test_reflect_x ... ok
test tests::test_reflect_y ... ok
test tests::test_rotation_180_degrees ... ok
test tests::test_rotation_90_degrees ... ok
test tests::test_rotation_then_inverse ... ok
test tests::test_scale_basis_vectors ... ok
test tests::test_scale_point ... ok
test tests::test_shear_x ... ok

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```
Walkthrough
- `transform(&self, v: &[f64])`: matrix-vector multiplication. This is the core operation of a linear transformation: for each row of the matrix, compute the dot product with the input vector. The result is a new vector.
- Factory functions: `scale_2d`, `rotate_2d`, `reflect_x_2d`, `reflect_y_2d`, `shear_x_2d`, and `shear_y_2d` are free functions (not methods) that construct 2×2 matrices. They're defined outside `impl Matrix` because they don't operate on an existing matrix.
- Composition is multiplication: applying transformation $\mathbf{B}$ after $\mathbf{A}$ means computing $\mathbf{B}(\mathbf{A}\mathbf{x})$, which is $(\mathbf{B}\mathbf{A})\mathbf{x}$. The matrix for the composed transformation is $\mathbf{B}\mathbf{A}$, which is matrix multiplication in action. This is tested in `test_composition_rotate_then_reflect` and `test_composition_scale_then_rotate`.
- Column interpretation: the first column of a transformation matrix is where $\mathbf{e}_1 = (1, 0)$ goes. For `scale_2d(2.0, 3.0)`, the first column is $(2, 0)$, so $\mathbf{e}_1$ is scaled by 2 along x. The second column is $(0, 3)$, so $\mathbf{e}_2$ is scaled by 3 along y. This is verified in `test_scale_basis_vectors`.
- Inverse transformations: some transformations can be undone. Rotating by $-\theta$ undoes a rotation by $\theta$, and reflecting twice gives back the original. These are tested in `test_double_reflection_is_identity` and `test_rotation_then_inverse`.
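The nested loops in `transform` can also be written with iterators. This is a sketch of an alternative, not the chapter's code: it assumes a row-major flat buffer with a known column count (the same layout the factory functions pass to `Matrix::new`), and `transform_rows` is a hypothetical free function.

```rust
/// Hypothetical free-function version of `transform` for a row-major
/// matrix stored as a flat slice with `cols` columns.
fn transform_rows(data: &[f64], cols: usize, v: &[f64]) -> Vec<f64> {
    assert!(cols > 0, "matrix must have at least one column");
    assert_eq!(cols, v.len(), "matrix cols {} doesn't match vector length {}", cols, v.len());
    assert_eq!(data.len() % cols, 0, "data length must be a multiple of cols");
    data.chunks_exact(cols)
        // Each chunk is one row; its dot product with v is one output component.
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}
```

`chunks_exact` splits the flat buffer into rows without index arithmetic. The explicit loops in the chapter map more directly onto the formula, so either style is a reasonable choice.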
Verification
| Test | What it checks | Invariant |
|---|---|---|
| `test_scale_basis_vectors` | Columns of scale matrix = scaled basis vectors | $\mathbf{A}\mathbf{e}_i = i$-th column of $\mathbf{A}$ |
| `test_scale_point` | Scale transforms a point correctly | $(x, y) \to (s_x x, s_y y)$ |
| `test_rotation_90_degrees` | 90° rotation of (1, 0) → (0, 1) | $\mathbf{e}_1 \to \mathbf{e}_2$ |
| `test_rotation_180_degrees` | 180° rotation of (1, 0) → (-1, 0) | $\mathbf{e}_1 \to -\mathbf{e}_1$ |
| `test_reflect_x` | Reflection across x-axis flips y | $(x, y) \to (x, -y)$ |
| `test_composition_rotate_then_reflect` | Composed matrix = sequential application | $(\mathbf{B}\mathbf{A})\mathbf{x} = \mathbf{B}(\mathbf{A}\mathbf{x})$ |
| `test_double_reflection_is_identity` | Reflect twice = do nothing | $R_x \circ R_x = I$ |
| `test_rotation_then_inverse` | Rotate by θ then -θ = identity | $R_{-\theta} R_{\theta} = I$ |
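The comment in `test_rotation_then_inverse` relies on $R_{-\theta} = R_\theta^\top = R_\theta^{-1}$. A small sketch that checks both equalities numerically for one angle, using plain arrays; `rot`, `transpose`, `mul`, and `approx_eq` here are illustrative helpers, not the chapter's API.

```rust
type M2 = [[f64; 2]; 2];

/// Counter-clockwise rotation by `angle` radians, stored as rows.
fn rot(angle: f64) -> M2 {
    let (s, c) = angle.sin_cos();
    [[c, -s], [s, c]]
}

/// Swap rows and columns.
fn transpose(m: M2) -> M2 {
    [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]
}

/// 2×2 matrix product B*A.
fn mul(b: M2, a: M2) -> M2 {
    let mut out = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            out[i][j] = b[i][0] * a[0][j] + b[i][1] * a[1][j];
        }
    }
    out
}

/// True if every entry of `a` is within 1e-10 of the matching entry of `b`.
fn approx_eq(a: M2, b: M2) -> bool {
    (0..2).all(|i| (0..2).all(|j| (a[i][j] - b[i][j]).abs() < 1e-10))
}
```

The product $R_\theta^\top R_\theta$ has diagonal entries $\cos^2\theta + \sin^2\theta = 1$ and off-diagonal entries that cancel, which is why the transpose serves as the inverse.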
Key Takeaways
- A linear transformation is any function $T$ satisfying $T(\alpha\mathbf{u} + \beta\mathbf{v}) = \alpha T(\mathbf{u}) + \beta T(\mathbf{v})$; it keeps the origin fixed and keeps straight lines straight.
- Every linear transformation is represented by a matrix whose columns are where the basis vectors land.
- Applying a transformation is matrix-vector multiplication, which forms a linear combination of the columns.
- Composing transformations is matrix multiplication, and the order matters: the rightmost matrix is applied first.
- Common 2D transformations (scale, rotate, reflect, shear) are simple 2×2 matrices with clear geometric meaning.
- The identity matrix does nothing; an inverse transformation undoes a transformation.