Linear Transformations

Concept

A matrix is a linear transformation: it maps vectors to other vectors in a way that preserves straight lines and the origin. Every linear transformation can be represented as a matrix, and every matrix represents a linear transformation.

Why This Matters

A typical neural network layer is a linear transformation (the weight matrix), usually plus a bias, followed by a nonlinearity. Attention scores are computed by transforming queries and keys. Convolutions are linear transformations on image patches. When you hear "representation learning," what a model learns is a sequence of linear transformations, interleaved with nonlinearities, that reshape the data space into one where the task becomes easy. Understanding what a matrix does, not just how to multiply it, is the bridge from algebra to intuition.


Mathematical Notation

Linear Transformation Definition

A function $T: \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if, for all vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ and every scalar $\alpha$, it satisfies two properties:

  1. Additivity: $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$
  2. Homogeneity: $T(\alpha \mathbf{v}) = \alpha T(\mathbf{v})$

Together these give linearity:

$$ T(\alpha \mathbf{u} + \beta \mathbf{v}) = \alpha T(\mathbf{u}) + \beta T(\mathbf{v}) $$
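The combined condition can be spot-checked numerically. The sketch below is only an illustration: the map `t`, the translation `shift`, and the helper names are hypothetical, and passing for one choice of inputs does not prove linearity, although a single failure does disprove it.

```rust
/// Hypothetical linear map for illustration: T(x, y) = (2x + y, x - 3y).
fn t(v: [f64; 2]) -> [f64; 2] {
    [2.0 * v[0] + v[1], v[0] - 3.0 * v[1]]
}

/// A non-linear map for contrast: translating by (1, 0) moves the origin.
fn shift(v: [f64; 2]) -> [f64; 2] {
    [v[0] + 1.0, v[1]]
}

fn add(u: [f64; 2], v: [f64; 2]) -> [f64; 2] {
    [u[0] + v[0], u[1] + v[1]]
}

fn scale(a: f64, v: [f64; 2]) -> [f64; 2] {
    [a * v[0], a * v[1]]
}

fn close(u: [f64; 2], v: [f64; 2]) -> bool {
    (u[0] - v[0]).abs() < 1e-10 && (u[1] - v[1]).abs() < 1e-10
}

/// Checks f(a*u + b*v) == a*f(u) + b*f(v) for one choice of inputs.
fn is_linear_on(
    f: fn([f64; 2]) -> [f64; 2],
    a: f64,
    u: [f64; 2],
    b: f64,
    v: [f64; 2],
) -> bool {
    let lhs = f(add(scale(a, u), scale(b, v)));
    let rhs = add(scale(a, f(u)), scale(b, f(v)));
    close(lhs, rhs)
}

fn main() {
    let (u, v) = ([1.0, 2.0], [-3.0, 0.5]);
    println!("t passes:     {}", is_linear_on(t, 2.0, u, -1.5, v));
    println!("shift passes: {}", is_linear_on(shift, 2.0, u, -1.5, v));
}
```

The translation fails because it does not keep the origin fixed, which is the geometric reading of the two properties.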

Matrix of a Transformation

Every linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is represented by an $m \times n$ matrix $\mathbf{A}$ where the columns are the images of the standard basis vectors:

$$ \mathbf{A} = \begin{pmatrix} | & | & & | \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) & \dots & T(\mathbf{e}_n) \\ | & | & & | \end{pmatrix} $$

where $\mathbf{e}_i$ is the $i$-th standard basis vector (1 in position $i$, 0 elsewhere).
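To make the column rule concrete, here is a minimal sketch that recovers the matrix of a 2D map by evaluating it on $\mathbf{e}_1$ and $\mathbf{e}_2$. The map `example_map` and the helper `matrix_of` are hypothetical names chosen for this illustration, and the matrix is stored as an array of rows rather than the Matrix type used later.

```rust
/// Hypothetical linear map for illustration: T(x, y) = (2x + y, x - 3y).
fn example_map(v: [f64; 2]) -> [f64; 2] {
    [2.0 * v[0] + v[1], v[0] - 3.0 * v[1]]
}

/// Builds the 2x2 matrix of `t` (as an array of rows) by evaluating `t`
/// on the standard basis vectors and placing the results in the columns.
fn matrix_of(t: fn([f64; 2]) -> [f64; 2]) -> [[f64; 2]; 2] {
    let col1 = t([1.0, 0.0]); // image of e1 becomes the first column
    let col2 = t([0.0, 1.0]); // image of e2 becomes the second column
    [[col1[0], col2[0]], [col1[1], col2[1]]]
}

fn main() {
    let a = matrix_of(example_map);
    println!("{:?}", a); // [[2.0, 1.0], [1.0, -3.0]]
}
```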

Applying the transformation to a vector $\mathbf{x}$ is matrix-vector multiplication:

$$ T(\mathbf{x}) = \mathbf{A}\mathbf{x} = x_1 T(\mathbf{e}_1) + x_2 T(\mathbf{e}_2) + \dots + x_n T(\mathbf{e}_n) $$

This is a linear combination of the columns: the vector $\mathbf{x}$ provides the weights.
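The two readings of $\mathbf{A}\mathbf{x}$ (row-by-row dot products versus a weighted sum of columns) give the same numbers. The sketch below compares them on a small example; the function names and the sample matrix are assumptions made for illustration only.

```rust
/// Row view: entry i of the result is the dot product of row i with x.
fn mat_vec(a: [[f64; 2]; 2], x: [f64; 2]) -> [f64; 2] {
    [
        a[0][0] * x[0] + a[0][1] * x[1],
        a[1][0] * x[0] + a[1][1] * x[1],
    ]
}

/// Column view: x[0] * (first column) + x[1] * (second column).
fn column_combination(a: [[f64; 2]; 2], x: [f64; 2]) -> [f64; 2] {
    let col0 = [a[0][0], a[1][0]];
    let col1 = [a[0][1], a[1][1]];
    [
        x[0] * col0[0] + x[1] * col1[0],
        x[0] * col0[1] + x[1] * col1[1],
    ]
}

fn main() {
    let a = [[2.0, -1.0], [0.5, 3.0]];
    let x = [4.0, 5.0];
    // Both views agree: 4 * (2, 0.5) + 5 * (-1, 3) = (3, 17).
    println!("{:?}", mat_vec(a, x));
    println!("{:?}", column_combination(a, x));
}
```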

Composition

If $S: \mathbb{R}^n \to \mathbb{R}^m$ has matrix $\mathbf{A}$ and $T: \mathbb{R}^m \to \mathbb{R}^p$ has matrix $\mathbf{B}$, then the composition $T \circ S: \mathbb{R}^n \to \mathbb{R}^p$ has matrix $\mathbf{B}\mathbf{A}$:

$$ (T \circ S)(\mathbf{x}) = T(S(\mathbf{x})) = \mathbf{B}(\mathbf{A}\mathbf{x}) = (\mathbf{B}\mathbf{A})\mathbf{x} $$

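Composition is matrix multiplication, which is why matrix multiplication is defined the way it is. Note the order: the transformation applied first sits on the right, so $\mathbf{B}\mathbf{A}$ means "apply $\mathbf{A}$, then $\mathbf{B}$."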
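A small sketch of this identity, using plain 2×2 arrays instead of the Matrix type introduced later. The names `Mat2`, `Vec2`, `mat_vec`, and `mat_mul` are assumptions for this illustration. It checks that applying $\mathbf{A}$ then $\mathbf{B}$ matches applying $\mathbf{B}\mathbf{A}$ once, and that swapping the order gives a different matrix.

```rust
type Mat2 = [[f64; 2]; 2];
type Vec2 = [f64; 2];

fn mat_vec(a: Mat2, x: Vec2) -> Vec2 {
    [
        a[0][0] * x[0] + a[0][1] * x[1],
        a[1][0] * x[0] + a[1][1] * x[1],
    ]
}

/// Returns the product b * a, i.e. "apply a first, then b".
fn mat_mul(b: Mat2, a: Mat2) -> Mat2 {
    let mut out = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                out[i][j] += b[i][k] * a[k][j];
            }
        }
    }
    out
}

fn main() {
    let a: Mat2 = [[2.0, 0.0], [0.0, 1.0]]; // S: scale x by 2
    let b: Mat2 = [[0.0, -1.0], [1.0, 0.0]]; // T: rotate 90° counter-clockwise
    let x: Vec2 = [1.0, 1.0];

    let step_by_step = mat_vec(b, mat_vec(a, x)); // T(S(x))
    let composed = mat_vec(mat_mul(b, a), x); // (BA)x
    assert_eq!(step_by_step, composed);

    // Order matters: BA and AB are different transformations here.
    assert_ne!(mat_mul(b, a), mat_mul(a, b));
    println!("T(S(x)) = {:?}", composed);
}
```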


Intuition

Basis vectors as anchors

The columns of a matrix tell you exactly what the transformation does. If you know where the basis vectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$ go, you know where every vector goes. Everything else is just a linear combination.

Think of the basis vectors as the grid lines of the coordinate system. The transformation moves the grid (stretches it, rotates it, flips it) and every other point goes along for the ride.

Common transformations (2D)

Scaling stretches or shrinks along each axis:

$$ \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} s_x x \\ s_y y \end{pmatrix} $$

The basis vector $\mathbf{e}_1 = (1, 0)$ goes to $(s_x, 0)$, and $\mathbf{e}_2 = (0, 1)$ goes to $(0, s_y)$. The grid gets stretched.

Rotation turns every vector counter-clockwise by angle $\theta$ (theta, introduced in ch01):

$$ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix} $$

The first column $(\cos\theta, \sin\theta)$ is where $\mathbf{e}_1$ lands: it's the unit vector at angle $\theta$. The second column $(-\sin\theta, \cos\theta)$ is $\mathbf{e}_2$ rotated by $\theta$.

Reflection flips across the $x$-axis:

$$ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ -y \end{pmatrix} $$

The $x$-axis stays put (first column is $\mathbf{e}_1$), but the $y$-axis flips (second column is $-\mathbf{e}_2$).

Shear shifts each point horizontally in proportion to its $y$-coordinate:

$$ \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + a y \\ y \end{pmatrix} $$

The $x$-axis stays put (first column is $\mathbf{e}_1$), but the $y$-axis tilts: $\mathbf{e}_2$ goes to $(a, 1)$. The parameter $a$ controls how much.

Visual summary

Each transformation is fully described by where it sends the basis vectors (these are the columns of the matrix):

                Scale(2,1)    Rotate(45°)       Reflect x    Shear x(1)
e1 = (1,0)  ->  (2, 0)        (0.707, 0.707)    (1, 0)       (1, 0)
e2 = (0,1)  ->  (0, 1)        (-0.707, 0.707)   (0, -1)      (1, 1)

The unit square with corners (0,0), (1,0), (1,1), (0,1) goes to the parallelogram spanned by the two image vectors.
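As a quick check of the parallelogram picture, the following sketch pushes the four corners of the unit square through a matrix. The helper names are hypothetical, and the shear with $a = 1$ is just one example from the table above.

```rust
fn transform(a: [[f64; 2]; 2], v: [f64; 2]) -> [f64; 2] {
    [
        a[0][0] * v[0] + a[0][1] * v[1],
        a[1][0] * v[0] + a[1][1] * v[1],
    ]
}

/// Maps the corners (0,0), (1,0), (1,1), (0,1) of the unit square through `a`.
fn image_of_unit_square(a: [[f64; 2]; 2]) -> [[f64; 2]; 4] {
    let corners = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]];
    corners.map(|c| transform(a, c))
}

fn main() {
    let shear = [[1.0, 1.0], [0.0, 1.0]]; // Shear x(1)
    let image = image_of_unit_square(shear);
    // The far corner (1,1) lands on T(e1) + T(e2) = (1,0) + (1,1) = (2,1).
    println!("{:?}", image);
}
```

The origin stays fixed, the two corners adjacent to it land on the matrix columns, and the opposite corner lands on their sum.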

Key insight: Reading down each column of the matrix tells you the entire transformation. No need to visualize the grid: just look at where the basis vectors land. This is what transform(&[1.0, 0.0]) and transform(&[0.0, 1.0]) compute in code.


Rust Implementation

Create a new crate that builds on the Matrix type from ch03:

cargo new --name ch05 --lib ch05
cd ch05

Copy the Matrix struct from ch03 into src/lib.rs (or set up a path dependency in Cargo.toml):

# Cargo.toml (if using a path dependency)
[dependencies]
ch03 = { path = "../ch03" }

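For simplicity, this chapter adds methods directly to the Matrix struct and includes a helper to convert a vector to a column matrix. Note that an inherent impl block only compiles when Matrix is defined in the same crate, so with the path dependency you would need to wrap these methods in a trait or free functions instead.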
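If you do not have the ch03 code at hand, the sketch below is a minimal stand-in that is enough to compile the code in this chapter. It is an assumption, not the ch03 implementation: the row-major layout and the signatures of `new`, `identity`, `get`, and `multiply` are inferred from how they are called here, and the `column` helper is a hypothetical name for the vector-to-column-matrix conversion.

```rust
/// Hypothetical stand-in for the ch03 Matrix: row-major storage.
#[derive(Debug, Clone, PartialEq)]
pub struct Matrix {
    pub data: Vec<f64>,
    pub rows: usize,
    pub cols: usize,
}

impl Matrix {
    pub fn new(data: Vec<f64>, rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { data, rows, cols }
    }

    pub fn identity(n: usize) -> Self {
        let mut data = vec![0.0; n * n];
        for i in 0..n {
            data[i * n + i] = 1.0;
        }
        Matrix::new(data, n, n)
    }

    /// Entry in row i, column j (zero-based).
    pub fn get(&self, i: usize, j: usize) -> f64 {
        self.data[i * self.cols + j]
    }

    /// Matrix product self * other.
    pub fn multiply(&self, other: &Matrix) -> Matrix {
        assert_eq!(self.cols, other.rows, "inner dimensions must match");
        let mut data = vec![0.0; self.rows * other.cols];
        for i in 0..self.rows {
            for j in 0..other.cols {
                for k in 0..self.cols {
                    data[i * other.cols + j] += self.get(i, k) * other.get(k, j);
                }
            }
        }
        Matrix::new(data, self.rows, other.cols)
    }
}

/// Hypothetical helper: wrap a slice as an n×1 column matrix.
pub fn column(v: &[f64]) -> Matrix {
    Matrix::new(v.to_vec(), v.len(), 1)
}

fn main() {
    let a = Matrix::new(vec![1.0, 2.0, 3.0, 4.0], 2, 2);
    assert_eq!(a.multiply(&Matrix::identity(2)), a);
    println!("{:?}", a.multiply(&column(&[5.0, 6.0])).data); // [17.0, 39.0]
}
```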

Add to src/lib.rs:

// ── Linear transformations ──

impl Matrix {
    /// Apply this transformation to a vector (as a column matrix).
    ///
    /// If this matrix is m×n, the input vector must have n components,
    /// and the result has m components.
    ///
    /// Mathematically: T(v) = A * v  where A is this matrix.
    pub fn transform(&self, v: &[f64]) -> Vec<f64> {
        assert_eq!(
            self.cols,
            v.len(),
            "Matrix cols {} doesn't match vector length {}",
            self.cols,
            v.len()
        );

        let mut result = vec![0.0; self.rows];
        for i in 0..self.rows {
            let mut sum = 0.0;
            for j in 0..self.cols {
                sum += self.get(i, j) * v[j];
            }
            result[i] = sum;
        }
        result
    }
}

// ── 2D transformation factory functions ──

/// Create a 2D scaling matrix: Scale(sx, sy).
///
/// [ sx   0 ]
/// [ 0   sy ]
pub fn scale_2d(sx: f64, sy: f64) -> Matrix {
    Matrix::new(vec![sx, 0.0, 0.0, sy], 2, 2)
}

/// Create a 2D rotation matrix (counter-clockwise by `angle` radians).
///
/// [ cosθ  -sinθ ]
/// [ sinθ   cosθ ]
pub fn rotate_2d(angle: f64) -> Matrix {
    let c = angle.cos();
    let s = angle.sin();
    Matrix::new(vec![c, -s, s, c], 2, 2)
}

/// Create a 2D reflection matrix across the x-axis.
///
/// [ 1   0 ]
/// [ 0  -1 ]
pub fn reflect_x_2d() -> Matrix {
    Matrix::new(vec![1.0, 0.0, 0.0, -1.0], 2, 2)
}

/// Create a 2D reflection matrix across the y-axis.
///
/// [ -1   0 ]
/// [  0   1 ]
pub fn reflect_y_2d() -> Matrix {
    Matrix::new(vec![-1.0, 0.0, 0.0, 1.0], 2, 2)
}

/// Create a 2D shear matrix (horizontal shear).
///
/// [ 1   shx ]
/// [ 0    1  ]
pub fn shear_x_2d(shx: f64) -> Matrix {
    Matrix::new(vec![1.0, shx, 0.0, 1.0], 2, 2)
}

/// Create a 2D shear matrix (vertical shear).
///
/// [ 1    0  ]
/// [ shy  1  ]
pub fn shear_y_2d(shy: f64) -> Matrix {
    Matrix::new(vec![1.0, 0.0, shy, 1.0], 2, 2)
}

#[cfg(test)]
mod tests {
    use super::*;

    fn approx_eq(a: f64, b: f64) -> bool {
        (a - b).abs() < 1e-10
    }

    #[test]
    fn test_scale_basis_vectors() {
        let s = scale_2d(2.0, 3.0);
        // e1 = (1, 0) → (2, 0)
        let r1 = s.transform(&[1.0, 0.0]);
        assert!(approx_eq(r1[0], 2.0) && approx_eq(r1[1], 0.0));
        // e2 = (0, 1) → (0, 3)
        let r2 = s.transform(&[0.0, 1.0]);
        assert!(approx_eq(r2[0], 0.0) && approx_eq(r2[1], 3.0));
    }

    #[test]
    fn test_scale_point() {
        let s = scale_2d(2.0, 3.0);
        let r = s.transform(&[4.0, 5.0]);
        // (4, 5) → (8, 15)
        assert!(approx_eq(r[0], 8.0) && approx_eq(r[1], 15.0));
    }

    #[test]
    fn test_rotation_90_degrees() {
        // Rotating (1, 0) by 90° CCW should give (0, 1)
        let r = rotate_2d(std::f64::consts::FRAC_PI_2);
        let v = r.transform(&[1.0, 0.0]);
        assert!(approx_eq(v[0], 0.0) && approx_eq(v[1], 1.0));
    }

    #[test]
    fn test_rotation_180_degrees() {
        // Rotating (1, 0) by 180° should give (-1, 0)
        let r = rotate_2d(std::f64::consts::PI);
        let v = r.transform(&[1.0, 0.0]);
        assert!(approx_eq(v[0], -1.0) && approx_eq(v[1], 0.0));
    }

    #[test]
    fn test_reflect_x() {
        let r = reflect_x_2d();
        let v = r.transform(&[3.0, 4.0]);
        // (3, 4) → (3, -4)
        assert!(approx_eq(v[0], 3.0) && approx_eq(v[1], -4.0));
    }

    #[test]
    fn test_reflect_y() {
        let r = reflect_y_2d();
        let v = r.transform(&[3.0, 4.0]);
        // (3, 4) → (-3, 4)
        assert!(approx_eq(v[0], -3.0) && approx_eq(v[1], 4.0));
    }

    #[test]
    fn test_composition_rotate_then_reflect() {
        let rotate = rotate_2d(std::f64::consts::FRAC_PI_2); // 90° CCW
        let reflect = reflect_x_2d();

        // Compose: reflect ∘ rotate  (rotate first, then reflect)
        // Composition matrix = reflect * rotate
        let composed = reflect.multiply(&rotate);

        // Apply to (1, 0):
        //   rotate(1, 0) = (0, 1)
        //   reflect(0, 1) = (0, -1)
        let v = composed.transform(&[1.0, 0.0]);
        assert!(approx_eq(v[0], 0.0) && approx_eq(v[1], -1.0));

        // Apply sequentially to verify
        let step1 = rotate.transform(&[1.0, 0.0]);
        let step2 = reflect.transform(&step1);
        assert!(approx_eq(step2[0], 0.0) && approx_eq(step2[1], -1.0));
    }

    #[test]
    fn test_composition_scale_then_rotate() {
        // Scale by (2, 1) then rotate 45°
        let s = scale_2d(2.0, 1.0);
        let r = rotate_2d(std::f64::consts::FRAC_PI_4);

        let composed = r.multiply(&s);

        // Apply to (1, 0):
        //   scale(1, 0) = (2, 0)
        //   rotate(2, 0) = (2cos45°, 2sin45°) = (√2, √2) ≈ (1.4142, 1.4142)
        let v = composed.transform(&[1.0, 0.0]);
        let sqrt2 = std::f64::consts::FRAC_1_SQRT_2 * 2.0; // = 2/√2 = √2
        assert!(approx_eq(v[0], sqrt2) && approx_eq(v[1], sqrt2));
    }

    #[test]
    fn test_shear_x() {
        let sh = shear_x_2d(2.0);
        let v = sh.transform(&[3.0, 4.0]);
        // (3, 4) → (3 + 2*4, 4) = (11, 4)
        assert!(approx_eq(v[0], 11.0) && approx_eq(v[1], 4.0));
    }

    #[test]
    fn test_identity_transformation() {
        let i = Matrix::identity(3);
        let v = i.transform(&[5.0, 6.0, 7.0]);
        assert!(approx_eq(v[0], 5.0) && approx_eq(v[1], 6.0) && approx_eq(v[2], 7.0));
    }

    #[test]
    fn test_double_reflection_is_identity() {
        // Reflect across x-axis twice → back to original
        let r = reflect_x_2d();
        let double = r.multiply(&r); // (reflect ∘ reflect) = identity
        let v = double.transform(&[123.0, -456.0]);
        assert!(approx_eq(v[0], 123.0) && approx_eq(v[1], -456.0));
    }

    #[test]
    fn test_rotation_then_inverse() {
        // Rotating by θ then by -θ gives identity
        let theta = 0.7;
        let r = rotate_2d(theta);
        let r_inv = rotate_2d(-theta);
        // Compose: r_inv ∘ r = identity
        // (since rotation matrices satisfy R(-θ) = R(θ)^T = R(θ)^{-1})
        let composed = r_inv.multiply(&r);
        let v = composed.transform(&[3.0, 4.0]);
        assert!(approx_eq(v[0], 3.0) && approx_eq(v[1], 4.0));
    }
}

Run the tests:

cargo test

You should see:

running 12 tests
test tests::test_composition_rotate_then_reflect ... ok
test tests::test_composition_scale_then_rotate ... ok
test tests::test_double_reflection_is_identity ... ok
test tests::test_identity_transformation ... ok
test tests::test_reflect_x ... ok
test tests::test_reflect_y ... ok
test tests::test_rotation_180_degrees ... ok
test tests::test_rotation_90_degrees ... ok
test tests::test_rotation_then_inverse ... ok
test tests::test_scale_basis_vectors ... ok
test tests::test_scale_point ... ok
test tests::test_shear_x ... ok

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Walkthrough


Verification

| Test | What it checks | Invariant |
|---|---|---|
| test_scale_basis_vectors | Columns of scale matrix = scaled basis vectors | $\mathbf{A}\mathbf{e}_i = i\text{th column}$ |
| test_scale_point | Scale transforms a point correctly | $(x, y) \to (s_x x, s_y y)$ |
| test_rotation_90_degrees | 90° rotation of (1, 0) → (0, 1) | $\mathbf{e}_1 \to \mathbf{e}_2$ |
| test_rotation_180_degrees | 180° rotation of (1, 0) → (-1, 0) | $\mathbf{e}_1 \to -\mathbf{e}_1$ |
| test_reflect_x | Reflection across x-axis flips y | $(x, y) \to (x, -y)$ |
| test_composition_rotate_then_reflect | Composed matrix = sequential application | $(\mathbf{B}\mathbf{A})\mathbf{x} = \mathbf{B}(\mathbf{A}\mathbf{x})$ |
| test_double_reflection_is_identity | Reflect twice = do nothing | $R_x \circ R_x = I$ |
| test_rotation_then_inverse | Rotate by θ then -θ = identity | $R_{-\theta} R_{\theta} = I$ |
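The rotation test relies on the identity $R_{-\theta} = R_{\theta}^T = R_{\theta}^{-1}$ mentioned in its comment. A self-contained sketch of that check, using plain 2×2 arrays and hypothetical helper names rather than the chapter's Matrix type, might look like this:

```rust
type Mat2 = [[f64; 2]; 2];

fn rotation(theta: f64) -> Mat2 {
    let (s, c) = theta.sin_cos();
    [[c, -s], [s, c]]
}

fn transpose(m: Mat2) -> Mat2 {
    [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]
}

fn mat_mul(b: Mat2, a: Mat2) -> Mat2 {
    let mut out = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                out[i][j] += b[i][k] * a[k][j];
            }
        }
    }
    out
}

/// Entry-wise comparison with the same tolerance the chapter's tests use.
fn approx_eq(a: Mat2, b: Mat2) -> bool {
    (0..2).all(|i| (0..2).all(|j| (a[i][j] - b[i][j]).abs() < 1e-10))
}

fn main() {
    let theta = 0.7;
    let r = rotation(theta);
    let identity = [[1.0, 0.0], [0.0, 1.0]];
    // Rotating backwards is the same matrix as the transpose...
    println!("{}", approx_eq(rotation(-theta), transpose(r)));
    // ...and the transpose undoes the rotation.
    println!("{}", approx_eq(mat_mul(transpose(r), r), identity));
}
```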

Key Takeaways