Numerical Literacy

Concept

Read mathematical notation fluently — Greek letters, summation, products, sets, and common operators.

Why This Matters

Every ML paper is written in the language of mathematics. If the notation is a barrier, the paper is opaque. This chapter removes that barrier by teaching you to read notation the same way you read code: one symbol at a time.

Full reference: A complete Greek alphabet table and mathematical notation cheat sheet lives at greek-glossary.html. This chapter covers the subset you'll see in most papers.

The Greek Alphabet — Essential Five

These five letters appear in almost every ML paper. Learn them first.

Letter	Name	Pronounced	Used for
$\alpha$	alpha	al-fah	Learning rate
$\beta$	beta	bay-tah	Momentum, regularization coefficient
$\theta$	theta	thay-tah	Model parameters / weights
$\sigma$, $\Sigma$	sigma	sig-mah	Standard deviation ($\sigma$), summation ($\Sigma$)
$\lambda$	lambda	lam-dah	Regularization strength, eigenvalues

Common Eight — You'll See Them Soon

Letter	Name	Pronounced	Used for
$\gamma$	gamma	gam-ah	Learning rate schedule, discount factor
$\delta$, $\Delta$	delta	del-tah	Small change ($\delta$), finite change or difference ($\Delta$)
$\epsilon$	epsilon	ep-sih-lon	Small constant (avoiding division by zero)
$\mu$	mu	myoo	Mean of a distribution
$\nu$	nu	nyoo	Degrees of freedom
$\rho$	rho	roe	Correlation coefficient
$\phi$	phi	fye / fee	Activation function, feature map
$\omega$	omega	oh-may-gah	Angular frequency

Rare but Memorable

Letter	Name	Pronounced	Used for
$\zeta$	zeta	zay-tah	Riemann zeta function (rare in ML, notable in theory)
$\xi$	xi	ksee	Random noise variable, latent variable
$\psi$	psi	sigh / psigh	Wavefunction, state representation

Note: $\Sigma$ (uppercase sigma, "sig-mah") and $\Delta$ (uppercase delta, "del-tah") are the two letters in these tables that you will commonly see in both cases with different meanings. Lowercase $\sigma$ = standard deviation; uppercase $\Sigma$ = summation. Lowercase $\delta$ = small change; uppercase $\Delta$ = finite change or difference. Other Greek letters also change meaning with case: uppercase pi, for example, is the product symbol $\prod$.

Summation Notation

The notation $\sum_{i=1}^{n} x_i$ means "sum all $x_i$ from $i=1$ to $i=n$":

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$$

Read it aloud as: "sum from i equals 1 to n of x-sub-i."

Example

If $x = [3, 7, 2, 9]$, then:

$$\sum_{i=1}^{4} x_i = 3 + 7 + 2 + 9 = 21$$

Double Summation

An $m \times n$ matrix $A$ (with $m$ rows, $n$ columns, and entries $a_{ij}$) is summed over both rows and columns:

$$\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$$

This means: for each row $i$, sum across all columns $j$, then sum those row totals.
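
A minimal Rust sketch of the double sum, assuming the matrix is stored as a slice of rows. The name `double_sum` and the nested-`Vec` layout are illustrative choices, not something the course prescribes. The outer loop plays the role of $\sum_{i}$ and the inner loop the role of $\sum_{j}$:

```rust
/// Sum of every entry of a matrix stored as a slice of rows.
/// Corresponds to ∑_{i} ∑_{j} a_{ij}.
/// (Hypothetical helper for illustration; the row-of-Vecs layout is an assumption.)
pub fn double_sum(a: &[Vec<f64>]) -> f64 {
    let mut total = 0.0;
    for row in a {
        // Inner sum over columns j for the current row i.
        let mut row_total = 0.0;
        for &entry in row {
            row_total += entry;
        }
        // Outer sum over rows i accumulates the row totals.
        total += row_total;
    }
    total
}
```

For the 2 × 2 matrix with rows $[1, 2]$ and $[3, 4]$, the row totals are 3 and 7, so the double sum is 10.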

Product Notation

Similarly, $\prod_{i=1}^{n} x_i$ means "multiply all $x_i$ from $i=1$ to $i=n$":

$$\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdot \dots \cdot x_n$$

Read it aloud as: "product from i equals 1 to n of x-sub-i."

The symbol $\prod$ is the Greek capital letter pi (pronounced "pie").
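
A worked example may help here. Reusing the vector $x = [3, 7, 2, 9]$ from the summation example, the product multiplies the same four entries instead of adding them:

```latex
\prod_{i=1}^{4} x_i = 3 \cdot 7 \cdot 2 \cdot 9 = 378
```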

Set Notation

A set is a collection of distinct elements. In ML, sets are used to describe what kind of values a variable can take.

Notation	Read as	Meaning
$x \in \mathbb{R}$	"x is in R"	$x$ is a real number
$x \notin \mathbb{R}$	"x is not in R"	$x$ is not a real number
$A \subseteq B$	"A is a subset of B"	Every element of $A$ is also in $B$
$\mathbb{R}^n$	"R n"	The set of $n$-dimensional real vectors
$\mathbb{R}^{m \times n}$	"R m by n"	The set of $m \times n$ real matrices
$\{x \mid x > 0\}$	"the set of x such that x is greater than 0"	Set-builder notation
$\emptyset$	"empty set"	The set with no elements
$\mathbb{N}$	"N"	Natural numbers $\{1, 2, 3, \dots\}$ (some authors include $0$)
$\mathbb{Z}$	"Z"	Integers $\{\dots, -2, -1, 0, 1, 2, \dots\}$
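
Some of the set notation above maps onto Rust's standard collections. The sketch below is illustrative only: it uses `BTreeSet<i32>` as a stand-in for small finite sets of integers (infinite sets such as $\mathbb{R}$ or $\mathbb{Z}$ cannot be enumerated this way), and the function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// x ∈ A: membership test on a finite set.
pub fn is_member(x: i32, a: &BTreeSet<i32>) -> bool {
    a.contains(&x)
}

/// A ⊆ B: every element of A is also in B.
pub fn is_subset_of(a: &BTreeSet<i32>, b: &BTreeSet<i32>) -> bool {
    a.is_subset(b)
}

/// {x ∈ A | x > 0}: set-builder notation written as a filter over a finite set A.
pub fn positives(a: &BTreeSet<i32>) -> BTreeSet<i32> {
    a.iter().copied().filter(|&x| x > 0).collect()
}
```

The filter in `positives` is the code counterpart of the "such that" bar in set-builder notation: the condition after the bar becomes the closure.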

Why this matters for ML

When a paper says "let $W \in \mathbb{R}^{d \times k}$," it means: $W$ is a matrix with $d$ rows and $k$ columns, and every entry is a real number. This tells you the shape of the weights before you see a diagram.
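
To make the shape concrete, here is a minimal sketch that treats $W \in \mathbb{R}^{d \times k}$ as `d` rows of `k` `f64` values. The names `zeros` and `shape` are hypothetical helpers, and the nested-`Vec` layout is an assumption chosen for readability rather than performance:

```rust
/// Build a d × k matrix of zeros: d rows, each holding k real (f64) entries.
pub fn zeros(d: usize, k: usize) -> Vec<Vec<f64>> {
    vec![vec![0.0; k]; d]
}

/// Return (rows, columns) of a matrix, or None if the rows have unequal lengths.
/// An empty matrix is reported as (0, 0).
pub fn shape(w: &[Vec<f64>]) -> Option<(usize, usize)> {
    // Use the first row's length as the expected column count.
    let cols = w.first().map_or(0, |row| row.len());
    if w.iter().all(|row| row.len() == cols) {
        Some((w.len(), cols))
    } else {
        None
    }
}
```

With `d = 3` and `k = 2`, `shape(&zeros(3, 2))` returns `Some((3, 2))`: three rows, two columns.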

Common Operators

Symbol	Name	Meaning	First use in course
$\nabla$	Nabla / del	Gradient — vector of partial derivatives	Optimization
$\partial$	Partial derivative	Derivative w.r.t. one variable	Calculus
$\langle u, v \rangle$	Inner product	Dot product of vectors	Linear algebra
$\|v\|$	Norm	Length of a vector (default: Euclidean)	Linear algebra
$\|v\|_p$	p-norm	Generalized length ($\|v\|_2$ = Euclidean)	Linear algebra
$\otimes$	Tensor product	Kronecker / outer product	Deep learning
$\odot$	Hadamard product	Element-wise multiplication	Neural networks
$\propto$	Proportional to	Equals up to a constant factor	Probability
$\sim$	Distributed as	$x \sim \mathcal{N}(0,1)$ means $x$ is normally distributed	Probability
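
Several of these operators reduce to short loops over slices. The following sketch is one possible translation, assuming vectors are `&[f64]` slices of equal length. The function names are illustrative, and mismatched lengths are rejected with `assert_eq!` purely to keep the example short:

```rust
/// ⟨u, v⟩: inner (dot) product, ∑_{i} u_i v_i.
pub fn dot(u: &[f64], v: &[f64]) -> f64 {
    assert_eq!(u.len(), v.len(), "vectors must have the same length");
    u.iter().zip(v).map(|(a, b)| a * b).sum()
}

/// ‖v‖_p: p-norm, (∑_{i} |v_i|^p)^(1/p), intended for p ≥ 1.
pub fn p_norm(v: &[f64], p: f64) -> f64 {
    v.iter().map(|a| a.abs().powf(p)).sum::<f64>().powf(1.0 / p)
}

/// ‖v‖: Euclidean norm, the square root of ⟨v, v⟩.
pub fn norm(v: &[f64]) -> f64 {
    dot(v, v).sqrt()
}

/// u ⊙ v: Hadamard (element-wise) product.
pub fn hadamard(u: &[f64], v: &[f64]) -> Vec<f64> {
    assert_eq!(u.len(), v.len(), "vectors must have the same length");
    u.iter().zip(v).map(|(a, b)| a * b).collect()
}
```

Note the difference in return types: `dot` collapses two vectors into one number, while `hadamard` returns a vector of the same length as its inputs. For $v = [3, 4]$, `norm` gives 5, the familiar 3-4-5 right triangle.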

Rust Implementation

Let's translate summation and product into Rust. Start by setting up a workspace that will hold all the crates you build through the course:

cd /path/to/sota-ai
mkdir -p code
cd code

Create code/Cargo.toml with:

[workspace]
resolver = "3"

Now create the first chapter's crate (this also adds it to the workspace automatically):

cargo new --lib --edition 2024 ch01-numerical-literacy

Open code/ch01-numerical-literacy/src/lib.rs and replace its contents with:

/// Sum of elements in a slice.
/// Corresponds to ∑_{i} x_i.
pub fn sum(x: &[f64]) -> f64 {
    let mut total = 0.0;
    for &val in x {
        total += val;
    }
    total
}

/// Product of elements in a slice.
/// Corresponds to ∏_{i} x_i.
pub fn product(x: &[f64]) -> f64 {
    let mut total = 1.0;
    for &val in x {
        total *= val;
    }
    total
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_sum() {
        assert_eq!(sum(&[1.0, 2.0, 3.0]), 6.0);
        assert_eq!(sum(&[]), 0.0);
    }

    #[test]
    fn test_product() {
        assert_eq!(product(&[2.0, 3.0, 4.0]), 24.0);
        assert_eq!(product(&[]), 1.0);
    }
}

Run the tests to verify:

cargo test -p ch01-numerical-literacy

Or, from the same code directory, run all workspace tests:

cargo test

You should see output similar to this (the order of the two test lines may vary):

running 2 tests
test tests::test_sum ... ok
test tests::test_product ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

From now on, each chapter adds a new crate to this workspace. You'll build up the entire course's Rust code in one place.

Walkthrough

&[f64] — a slice of 64-bit floats. The & means we borrow the data without taking ownership.
for &val in x — iterate over each element. Iterating a slice yields references (&f64); the &val pattern destructures each reference, so val is a plain f64 copy.
0.0 and 1.0 — the identity elements for addition and multiplication: adding 0 or multiplying by 1 leaves the value unchanged. Notice that sum(&[]) returns 0.0 and product(&[]) returns 1.0 — this is mathematically correct (the empty sum is 0, the empty product is 1).
#[cfg(test)] — only compiles this module when running cargo test.
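
Once the explicit loops are familiar, the same two operations can be written with the standard library's iterator methods. This is an alternative sketch, not a replacement for the course code; the `_iter` names are hypothetical and chosen only to avoid clashing with `sum` and `product` above:

```rust
/// Iterator-based sum: ∑_{i} x_i.
/// An empty slice yields zero, the additive identity.
pub fn sum_iter(x: &[f64]) -> f64 {
    x.iter().sum()
}

/// Iterator-based product: ∏_{i} x_i.
/// An empty slice yields one, the multiplicative identity.
pub fn product_iter(x: &[f64]) -> f64 {
    x.iter().product()
}
```

The loop versions make the correspondence with $\sum$ and $\prod$ visible step by step; the iterator versions are what you are more likely to meet in existing Rust code.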

Verification

The tests above verify that:

sum of $[1, 2, 3]$ equals $6$ — which matches $1 + 2 + 3 = 6$.
sum of an empty slice equals $0$ — the identity element for addition.
product of $[2, 3, 4]$ equals $24$ — which matches $2 \cdot 3 \cdot 4 = 24$.
product of an empty slice equals $1$ — the identity element for multiplication.

Crucial habit: Always test empty inputs. If your function panics on an empty slice, that's a bug — ML data can be empty, missing, or malformed.
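
Not every operation has a natural value for empty input. The mean ($\mu$) is one example: for an empty slice the total is 0.0 and the length is 0, and dividing the two silently produces NaN instead of a panic. One way to make that case explicit, sketched here with a hypothetical `mean` function, is to return an `Option`:

```rust
/// Arithmetic mean μ = (1/n) ∑_{i} x_i.
/// Returns None for an empty slice instead of dividing by zero.
/// (Hypothetical helper for illustration, not part of the course crate.)
pub fn mean(x: &[f64]) -> Option<f64> {
    if x.is_empty() {
        return None;
    }
    let total: f64 = x.iter().sum();
    Some(total / x.len() as f64)
}
```

For $x = [3, 7, 2, 9]$ the sum is 21 and $n = 4$, so `mean` returns `Some(5.25)`; for an empty slice it returns `None`, which forces the caller to decide what to do.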

Key Takeaways

Greek letters are just names — learn the 5 essential ones ($\alpha, \beta, \theta, \sigma, \lambda$) and you can read a large share of the notation you will meet.
Pronounce them aloud — saying "theta" or "epsilon" locks it in your memory.
$\sum$ means add, $\prod$ means multiply — everything else is just details about the range.
Set notation tells you the shape — $\mathbb{R}^{d \times k}$ immediately tells you "d by k matrix of real numbers."
Every operator has a name — when you forget, the glossary is at greek-glossary.html.
Implement the math in Rust — writing code that matches the notation is the fastest way to internalize it.