Linear Algebra: Vectors

Concept

A vector is an ordered list of numbers that represents a point or direction in space. Vectors are the fundamental data structure of machine learning — every input, every weight, every hidden state is a vector.

Why This Matters

Every ML model operates on vectors. An image is a vector of pixel values. A sentence is a vector of word embeddings. A layer's output is a vector. The operations you learn here — addition, scaling, dot products, norms — are the building blocks of every neural network, from a single neuron to a multi-modal transformer.

Mathematical Notation

A vector is written as a bold lowercase letter or with an arrow:

\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \quad\text{or}\quad \vec{v}

The entries $v_i$ are called components. The number of components is the dimension. We write $v_i$ to refer to the $i$-th component (1-indexed in math, 0-indexed in code).

A vector lives in $\mathbb{R}^n$ — the set of $n$-dimensional real vectors (introduced in ch01).

Vector Addition

Add component-wise:

\mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{pmatrix}

Scalar Multiplication

Multiply every component by a scalar $\alpha$ (alpha — introduced in ch01):

\alpha \mathbf{v} = \begin{pmatrix} \alpha v_1 \\ \alpha v_2 \\ \vdots \\ \alpha v_n \end{pmatrix}

Dot Product (Inner Product)

The dot product takes two vectors and returns a single scalar. It is written with angle brackets or a dot:

\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i

This is the inner product (notation introduced in ch01). It measures how much one vector points in the direction of another. Dot products are the core computation in neural network layers, attention mechanisms, and similarity search — every time a model compares two vectors, it's computing a dot product.

Norms

The Euclidean norm (L2 norm) is the length of a vector:

\|\mathbf{v}\|_2 = \sqrt{\sum_{i=1}^{n} v_i^2}

The L1 norm (Manhattan norm) sums absolute values:

\|\mathbf{v}\|_1 = \sum_{i=1}^{n} |v_i|

When the subscript is omitted, $\|\mathbf{v}\|$ defaults to the Euclidean norm.

Which norm to use? L2 is the default for measuring distance and is used in mean-squared error loss and weight decay (L2 regularization). L1 is more robust to outliers (it doesn't square large values) and, when used as a penalty during optimization, encourages sparse solutions — used in Lasso regression.

Unit Vectors

A unit vector has length 1. To normalize any vector, divide by its norm:

\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

The hat ( $\hat{\mathbf{v}}$ ) indicates a unit vector. Normalizing strips away magnitude — useful when you only care about direction (cosine similarity, stable training, orthogonalization methods like Gram-Schmidt).

Orthogonality

Two vectors are orthogonal (perpendicular) when their dot product is zero:

\mathbf{u} \perp \mathbf{v} \quad \iff \quad \langle \mathbf{u}, \mathbf{v} \rangle = 0

The symbol $\perp$ means "is orthogonal to." Orthogonal vectors are independent — they carry no redundant information, which is why orthogonal bases (e.g., principal components, Fourier basis) are so useful in data analysis.

Intuition

Geometric view: A vector is an arrow from the origin to a point. Adding vectors means placing them tip-to-tail. Scaling changes the arrow's length. The dot product tells you the angle: positive means they point in a similar direction, zero means perpendicular, negative means opposite.

Algebraic view: A vector is just a list of numbers. All operations happen component-by-component. This is the view that translates directly to code.

Higher dimensions: A 3-4-5 triangle shows that $\|[3,4]\| = 5$ in 2D. For a 3D vector like $(1,2,3)$, the length is $\sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}$ — same formula, just one more term. For 100-dimensional vectors, the formula doesn't change either: keep adding squares, then take the square root. The geometry is impossible to visualize, but the algebra is identical — and your code doesn't care about the dimension.

Why dot products matter in ML: A neural network layer computes $f(\mathbf{W}\mathbf{x} + \mathbf{b})$. Each row of $\mathbf{W}$ is a weight vector; the dot product with the input $\mathbf{x}$ measures how well the input matches that row. This is the core computation of every dense layer, every attention head, every linear transformation.

Rust Implementation

Add a new crate to your workspace:

cd code && cargo new --lib --edition 2024 ch02-linear-algebra-vectors

Open code/ch02-linear-algebra-vectors/src/lib.rs and replace its contents with:

/// A vector of f64 values.
/// We use a simple newtype wrapper around Vec<f64>.
#[derive(Debug, Clone, PartialEq)]
pub struct Vector(Vec<f64>);

impl Vector {
    /// Create a new vector from a list of components.
    pub fn new(components: Vec<f64>) -> Self {
        Vector(components)
    }

    /// The dimension (number of components) of this vector.
    pub fn dim(&self) -> usize {
        self.0.len()
    }

    /// Access the i-th component (0-indexed).
    pub fn get(&self, i: usize) -> f64 {
        self.0[i]
    }

    /// Component-wise vector addition: self + other.
    /// Panics if dimensions differ.
    pub fn add(&self, other: &Vector) -> Vector {
        assert_eq!(self.dim(), other.dim(), "Vectors must have the same dimension");
        let result: Vec<f64> = self.0.iter().zip(&other.0).map(|(a, b)| a + b).collect();
        Vector(result)
    }

    /// Multiply by a scalar (scale).
    pub fn scale(&self, alpha: f64) -> Vector {
        Vector(self.0.iter().map(|&x| alpha * x).collect())
    }

    /// Dot product with another vector.
    /// Corresponds to ⟨u, v⟩ = ∑ u_i v_i.
    pub fn dot(&self, other: &Vector) -> f64 {
        assert_eq!(self.dim(), other.dim(), "Vectors must have the same dimension");
        self.0.iter().zip(&other.0).map(|(a, b)| a * b).sum()
    }

    /// Euclidean norm (L2): ‖v‖₂ = √(∑ v_i²).
    pub fn norm_l2(&self) -> f64 {
        self.dot(self).sqrt()
    }

    /// L1 norm: ‖v‖₁ = ∑ |v_i|.
    pub fn norm_l1(&self) -> f64 {
        self.0.iter().map(|&x| x.abs()).sum()
    }

    /// Return a unit vector (normalized).
    /// Panics if the vector has zero length.
    pub fn normalize(&self) -> Vector {
        let norm = self.norm_l2();
        assert!(norm > 0.0, "Cannot normalize a zero vector");
        self.scale(1.0 / norm)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        let u = Vector::new(vec![1.0, 2.0, 3.0]);
        let v = Vector::new(vec![4.0, 5.0, 6.0]);
        let result = u.add(&v);
        assert_eq!(result, Vector::new(vec![5.0, 7.0, 9.0]));
    }

    #[test]
    fn test_scale() {
        let v = Vector::new(vec![1.0, 2.0, 3.0]);
        let result = v.scale(2.0);
        assert_eq!(result, Vector::new(vec![2.0, 4.0, 6.0]));
    }

    #[test]
    fn test_dot() {
        let u = Vector::new(vec![1.0, 2.0, 3.0]);
        let v = Vector::new(vec![4.0, 5.0, 6.0]);
        // 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
        assert_eq!(u.dot(&v), 32.0);
    }

    #[test]
    fn test_norm_l2() {
        // 3-4-5 triangle: ‖[3, 4]‖ = 5
        let v = Vector::new(vec![3.0, 4.0]);
        assert_eq!(v.norm_l2(), 5.0);
    }

    #[test]
    fn test_norm_l1() {
        let v = Vector::new(vec![-1.0, 2.0, -3.0]);
        // |−1| + |2| + |−3| = 1 + 2 + 3 = 6
        assert_eq!(v.norm_l1(), 6.0);
    }

    #[test]
    fn test_normalize() {
        let v = Vector::new(vec![3.0, 4.0]);
        let unit = v.normalize();
        // Unit vector of [3, 4] is [3/5, 4/5] = [0.6, 0.8]
        // Note: we compare with epsilon (1e-10) instead of == because
        // floating-point division introduces tiny rounding errors.
        assert!((unit.get(0) - 0.6).abs() < 1e-10);
        assert!((unit.get(1) - 0.8).abs() < 1e-10);
        // Its length should be 1
        assert!((unit.norm_l2() - 1.0).abs() < 1e-10);
    }

    #[test]
    fn test_orthogonal() {
        // [1, 0] and [0, 1] are orthogonal (dot = 0)
        let u = Vector::new(vec![1.0, 0.0]);
        let v = Vector::new(vec![0.0, 1.0]);
        assert_eq!(u.dot(&v), 0.0);
    }
}

Run the tests:

cargo test -p ch02-linear-algebra-vectors

running 7 tests
test tests::test_add ... ok
test tests::test_dot ... ok
test tests::test_norm_l1 ... ok
test tests::test_norm_l2 ... ok
test tests::test_normalize ... ok
test tests::test_orthogonal ... ok
test tests::test_scale ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered

Walkthrough

Vector(Vec<f64>) — We use a newtype pattern: a tuple struct wrapping Vec<f64>. This gives us a distinct type with its own methods, while delegating storage to Rust's standard dynamic array.
dim() — Returns the number of components, which Rust calls len() on the inner Vec.
zip and map — Vector operations use Rust's iterator combinators. self.0.iter().zip(&other.0) pairs up corresponding components, then map applies the operation. This is the code equivalent of "component-wise."
dot uses .sum() — After pairing and multiplying, we sum everything. This is $\sum_i u_i v_i$ expressed in one line.
norm_l2 calls dot(self, self) — The Euclidean norm is the square root of the dot product of a vector with itself. Reusing dot keeps the code DRY and reinforces the relationship between the two operations.
normalize calls scale — Normalization is just scaling by the reciprocal of the norm. Again, reuse.
Edge cases tested: The empty vector case — sum() on an empty iterator returns 0.0, which is correct (the empty dot product is 0, the empty norm is 0). But normalize on a zero vector will panic, which is the correct behavior (you cannot normalize zero).

Verification

The tests above verify:

Test	What it checks	Mathematical invariant
`test_add`	$[1,2,3] + [4,5,6] = [5,7,9]$	Component-wise addition
`test_scale`	$2 \cdot [1,2,3] = [2,4,6]$	Scalar multiplication
`test_dot`	$\langle [1,2,3], [4,5,6] \rangle = 32$	Dot product (sum of products)
`test_norm_l2`	$\|[3,4]\|_2 = 5$	Euclidean norm (Pythagorean triple)
`test_norm_l1`	$\|[-1,2,-3]\|_1 = 6$	L1 norm (sum of absolute values)
`test_normalize`	Unit vector has length 1	$\|\hat{\mathbf{v}}\| = 1$
`test_orthogonal`	$\langle [1,0], [0,1] \rangle = 0$	Orthogonality

Key Takeaways

A vector is an ordered list of numbers — the fundamental data structure in ML.
Vector addition and scalar multiplication are component-wise operations.
The dot product $\langle \mathbf{u}, \mathbf{v} \rangle$ measures alignment between vectors; it is the core operation in neural networks.
The Euclidean norm $\|\mathbf{v}\|_2$ is the length of a vector; dividing by it produces a unit vector.
Orthogonal vectors have a dot product of zero — they point in perpendicular directions.
Rust's iterator combinators (zip, map, sum) map naturally onto component-wise vector math.