Numerical Literacy

Concept

Read mathematical notation fluently — Greek letters, summation, products, sets, and common operators.

Why This Matters

Every ML paper is written in the language of mathematics. If the notation is a barrier, the paper is opaque. This chapter removes that barrier by teaching you to read notation the same way you read code: one symbol at a time.

Full reference: A complete Greek alphabet table and mathematical notation cheat sheet lives at greek-glossary.html. This chapter covers the subset you'll see in every paper.


The Greek Alphabet — Essential Five

These five letters appear in nearly every ML paper. Learn them first.

| Letter | Name | Pronounced | Used for |
|---|---|---|---|
| $\alpha$ | alpha | al-fah | Learning rate |
| $\beta$ | beta | bay-tah | Momentum, regularization coefficient |
| $\theta$ | theta | thay-tah | Model parameters / weights |
| $\sigma$, $\Sigma$ | sigma | sig-mah | Standard deviation ($\sigma$), summation ($\Sigma$) |
| $\lambda$ | lambda | lam-dah | Regularization strength, eigenvalues |
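
To see several of these letters working together, here is one common form of a gradient-descent update with L2 regularization. It is an illustrative sketch, not a formula this chapter derives: the loss function $L$ is an assumption, and $\nabla$ (the gradient) is listed under Common Operators later in the chapter.

$$ \theta \leftarrow \theta - \alpha \left( \nabla L(\theta) + \lambda \theta \right) $$

Read it aloud as: "theta is updated to theta minus alpha times the sum of the gradient of L at theta and lambda theta." Here $\theta$ holds the weights, $\alpha$ is the learning rate, and $\lambda$ is the regularization strength.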

Common Eight — You'll See Them Soon

| Letter | Name | Pronounced | Used for |
|---|---|---|---|
| $\gamma$ | gamma | gam-ah | Learning rate schedule, discount factor |
| $\delta$, $\Delta$ | delta | del-tah | Small change ($\delta$), finite difference ($\Delta$) |
| $\epsilon$ | epsilon | ep-sih-lon | Small constant (avoiding division by zero) |
| $\mu$ | mu | myoo | Mean of a distribution |
| $\nu$ | nu | nyoo | Degrees of freedom |
| $\rho$ | rho | roe | Correlation coefficient |
| $\phi$ | phi | fye / fee | Activation function, feature map |
| $\omega$ | omega | oh-may-gah | Angular frequency |

Rare but Memorable

| Letter | Name | Pronounced | Used for |
|---|---|---|---|
| $\zeta$ | zeta | zay-tah | Riemann zeta function (rare in ML, notable in theory) |
| $\xi$ | xi | ksee | Random noise variable, latent variable |
| $\psi$ | psi | sigh / psigh | Wavefunction, state representation |

Note: $\Sigma$ (uppercase sigma, "sig-mah") and $\Delta$ (uppercase delta, "del-tah") are the two letters in these tables whose uppercase and lowercase forms carry different meanings. Lowercase $\sigma$ = standard deviation; uppercase $\Sigma$ = summation. Lowercase $\delta$ = small change; uppercase $\Delta$ = finite difference. They are not the only such pair: uppercase $\Pi$ denotes a product, while lowercase $\pi$ is the familiar constant.


Summation Notation

The notation $\sum_{i=1}^{n} x_i$ means "sum all $x_i$ from $i=1$ to $i=n$":

$$ \sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n $$

Read it aloud as: "sum from i equals 1 to n of x-sub-i."

Example

If $x = [3, 7, 2, 9]$, then:

$$ \sum_{i=1}^{4} x_i = 3 + 7 + 2 + 9 = 21 $$
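
One detail worth seeing in code is the index offset: the math index $i$ runs from 1 to $n$, while Rust slices are indexed from 0. The sketch below makes that mapping explicit. The function name `sum_indexed` is hypothetical and chosen only for this illustration.

```rust
/// Computes ∑_{i=1}^{n} x_i with an explicit index.
/// The math index i runs over 1..=n, while Rust slices are indexed 0..n,
/// so the term x_i is stored at x[i - 1].
pub fn sum_indexed(x: &[f64]) -> f64 {
    let n = x.len();
    let mut total = 0.0;
    for i in 1..=n {
        total += x[i - 1];
    }
    total
}
```

Calling `sum_indexed(&[3.0, 7.0, 2.0, 9.0])` adds the same four terms as the worked example. For an empty slice the range `1..=0` is empty, so the loop body never runs and the result is `0.0`.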

Double Summation

To sum every entry of an $m \times n$ matrix $A$ with entries $a_{ij}$, sum over both the row index $i$ and the column index $j$:

$$ \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} $$

This means: for each row $i$, sum across all columns $j$, then sum those row totals.
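
The row-then-column reading maps directly onto a nested loop. The following is a minimal sketch that assumes the matrix is stored as a slice of rows (`&[Vec<f64>]`); the function name `matrix_sum` and this storage layout are illustrative choices, not something the chapter prescribes.

```rust
/// Sum of all entries of a matrix stored as a slice of rows.
/// Corresponds to ∑_{i} ∑_{j} a_{ij}.
pub fn matrix_sum(a: &[Vec<f64>]) -> f64 {
    let mut total = 0.0;
    for row in a {
        // Inner sum: add up the columns j of the current row i.
        let mut row_total = 0.0;
        for &val in row {
            row_total += val;
        }
        // Outer sum: accumulate the row totals over all rows i.
        total += row_total;
    }
    total
}
```

For the matrix with rows $[1, 2, 3]$ and $[4, 5, 6]$, the row totals are $6$ and $15$, so the double sum is $21$.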


Product Notation

Similarly, $\prod_{i=1}^{n} x_i$ means "multiply all $x_i$ from $i=1$ to $i=n$":

$$ \prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdot \dots \cdot x_n $$

Read it aloud as: "product from i equals 1 to n of x-sub-i."

The symbol $\prod$ is the Greek capital letter pi (pronounced "pie").
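
As a worked example, take the same values used in the summation example, $x = [3, 7, 2, 9]$, and multiply instead of add:

$$ \prod_{i=1}^{4} x_i = 3 \cdot 7 \cdot 2 \cdot 9 = 378 $$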


Set Notation

A set is a collection of distinct elements. In ML, sets are used to describe what kind of values a variable can take.

| Notation | Read as | Meaning |
|---|---|---|
| $x \in \mathbb{R}$ | "x is in R" | $x$ is a real number |
| $x \notin \mathbb{R}$ | "x is not in R" | $x$ is not a real number |
| $A \subseteq B$ | "A is a subset of B" | Every element of $A$ is also in $B$ |
| $\mathbb{R}^n$ | "R n" | The set of $n$-dimensional real vectors |
| $\mathbb{R}^{m \times n}$ | "R m by n" | The set of $m \times n$ real matrices |
| $\{x \mid x > 0\}$ | "the set of x such that x is greater than 0" | Set-builder notation |
| $\emptyset$ | "empty set" | The set with no elements |
| $\mathbb{N}$ | "N" | Natural numbers $\{1, 2, 3, \dots\}$ (some authors include 0) |
| $\mathbb{Z}$ | "Z" | Integers $\{\dots, -2, -1, 0, 1, 2, \dots\}$ |
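
Set-builder notation reads much like a filter in code: the part after the bar is the condition an element must satisfy. The sketch below mirrors $\{x \mid x > 0\}$ over a finite list of candidates. The function name `positives` is hypothetical, and a `Vec` is only an approximation of a set because it keeps order and duplicates.

```rust
/// Keeps the elements x of `xs` that satisfy x > 0,
/// mirroring the set-builder expression {x | x > 0}.
/// Note: a Vec preserves order and duplicates, unlike a mathematical set.
pub fn positives(xs: &[f64]) -> Vec<f64> {
    xs.iter().copied().filter(|&x| x > 0.0).collect()
}
```

Given the candidates `[-1.0, 2.0, 0.0, 3.5]`, only `2.0` and `3.5` pass the condition; zero is excluded because the inequality is strict.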

Why this matters for ML

When a paper says "let $W \in \mathbb{R}^{d \times k}$," it means: $W$ is a matrix with $d$ rows and $k$ columns, and every entry is a real number. This tells you the shape of the weights before you see a diagram.
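
The same idea can be expressed in Rust's type system. In this sketch, which is one possible encoding rather than a recommended representation, a matrix in $\mathbb{R}^{d \times k}$ is a fixed-size array of `D` rows with `K` entries each, so the shape is part of the type. The function name `shape` is hypothetical.

```rust
/// Reports the shape (d, k) of a matrix W ∈ ℝ^{d×k}
/// stored as D rows of K real-valued entries.
/// The const generics D and K carry the shape in the type itself,
/// so the function never needs to inspect the data.
pub fn shape<const D: usize, const K: usize>(_w: &[[f64; K]; D]) -> (usize, usize) {
    (D, K)
}
```

For example, `shape(&[[0.0_f64; 2]; 3])` describes a matrix with 3 rows and 2 columns, matching $W \in \mathbb{R}^{3 \times 2}$.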


Common Operators

| Symbol | Name | Meaning | First use in course |
|---|---|---|---|
| $\nabla$ | Nabla / del | Gradient: vector of partial derivatives | Optimization |
| $\partial$ | Partial derivative | Derivative w.r.t. one variable | Calculus |
| $\langle u, v \rangle$ | Inner product | Dot product of vectors | Linear algebra |
| $\lVert v \rVert$ | Norm | Length of a vector (default: Euclidean) | Linear algebra |
| $\lVert v \rVert_p$ | p-norm | Generalized length ($\lVert v \rVert_2$ = Euclidean) | Linear algebra |
| $\otimes$ | Tensor product | Kronecker / outer product | Deep learning |
| $\odot$ | Hadamard product | Element-wise multiplication | Neural networks |
| $\propto$ | Proportional to | Equals up to a constant factor | Probability |
| $\sim$ | Distributed as | $x \sim \mathcal{N}(0,1)$ means $x$ is normally distributed | Probability |
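
Three of these operators reduce to short loops over vector entries. The sketch below shows the inner product, the p-norm, and the Hadamard product for vectors stored as `f64` slices. The function names are hypothetical, and panicking on mismatched lengths is a design choice made for brevity; returning a `Result` would be a reasonable alternative.

```rust
/// Inner product ⟨u, v⟩ = ∑_i u_i v_i.
/// Panics if the two vectors differ in length.
pub fn inner(u: &[f64], v: &[f64]) -> f64 {
    assert_eq!(u.len(), v.len(), "vectors must have the same length");
    u.iter().zip(v).map(|(a, b)| a * b).sum()
}

/// p-norm ‖v‖_p = (∑_i |v_i|^p)^(1/p), intended for p ≥ 1.
/// With p = 2 this is the Euclidean length.
pub fn p_norm(v: &[f64], p: f64) -> f64 {
    v.iter().map(|x| x.abs().powf(p)).sum::<f64>().powf(1.0 / p)
}

/// Hadamard product u ⊙ v: element-wise multiplication.
/// Panics if the two vectors differ in length.
pub fn hadamard(u: &[f64], v: &[f64]) -> Vec<f64> {
    assert_eq!(u.len(), v.len(), "vectors must have the same length");
    u.iter().zip(v).map(|(a, b)| a * b).collect()
}
```

Note that `inner` and `hadamard` share the same element-wise products; the inner product sums them into one number, while the Hadamard product keeps them as a vector.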

Rust Implementation

Let's translate summation and product into Rust. Open your terminal and create a new crate:

cargo new --name ch01 --lib ch01
cd ch01

Replace the contents of src/lib.rs with:

/// Sum of elements in a slice.
/// Corresponds to ∑_{i} x_i.
pub fn sum(x: &[f64]) -> f64 {
    let mut total = 0.0;
    for &val in x {
        total += val;
    }
    total
}

/// Product of elements in a slice.
/// Corresponds to ∏_{i} x_i.
pub fn product(x: &[f64]) -> f64 {
    let mut total = 1.0;
    for &val in x {
        total *= val;
    }
    total
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_sum() {
        assert_eq!(sum(&[1.0, 2.0, 3.0]), 6.0);
        assert_eq!(sum(&[]), 0.0);
    }

    #[test]
    fn test_product() {
        assert_eq!(product(&[2.0, 3.0, 4.0]), 24.0);
        assert_eq!(product(&[]), 1.0);
    }
}

Run the tests to verify:

cargo test

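You should see output similar to the following (the order of the two tests and the reported time may vary):

running 2 tests
test tests::test_sum ... ok
test tests::test_product ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s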

Walkthrough


Verification

The tests above verify that:

  1. sum of $[1, 2, 3]$ equals $6$ — which matches $1 + 2 + 3 = 6$.
  2. sum of an empty slice equals $0$ — the identity element for addition.
  3. product of $[2, 3, 4]$ equals $24$ — which matches $2 \cdot 3 \cdot 4 = 24$.
  4. product of an empty slice equals $1$ — the identity element for multiplication.

Crucial habit: Always test empty inputs. If your function panics on an empty slice, that's a bug — ML data can be empty, missing, or malformed.
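
Sums and products have natural answers for empty input, but not every operation does. As a sketch of the same habit applied to the mean ($\mu$), the hypothetical function below returns `None` for an empty slice rather than dividing by a length of zero, which would produce NaN.

```rust
/// Arithmetic mean of a slice, or `None` when the slice is empty.
/// Dividing a zero total by a zero length would yield NaN,
/// so the empty case is handled explicitly.
pub fn mean(x: &[f64]) -> Option<f64> {
    if x.is_empty() {
        return None;
    }
    let total: f64 = x.iter().sum();
    Some(total / x.len() as f64)
}
```

Returning `Option<f64>` makes the caller decide what an empty dataset should mean, instead of letting a NaN propagate silently through later calculations.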


Key Takeaways