Course
Python
From your first line of Python to a transformer you build by hand. Foundations, data structures, algorithms, real systems, and modern deep learning — all written so each idea earns its place.
100 lessons across 20 sections.
Philosophy
Foundations
- What Is Programming? Breaking a task into steps. Thinking in things and actions before we call them objects and methods.
- How Computers Work: CPU, memory, storage, GPU, buses, and the electrons that carry every 1 and 0.
- Layers of Abstraction: Silicon to OS to interpreter to your program. Where each layer hides the one below.
- Setting up Python: VS Code, a terminal, a venv, strict linters, and git from day one.
Primitives & Control
- Variables and Memory: What a name actually points to. Stack, heap, references, and why two variables can share one object.
- Primitive Types: int, float, bool, str, None. What each one costs in bytes and where it breaks.
- Control Flow: if, else, while, for, match. The branches and loops of every program.
- Functions: Parameters, returns, scope, default args, *args and **kwargs, and the mutable-default trap.
- Iterators and Generators: Lazy sequences, yield, the iterator protocol. The same idea pandas and PyTorch lean on later.
- Errors and Exceptions: try, except, raise, the traceback. How to fail loudly in the right place.
- File I/O: Reading, writing, paths, encodings, context managers. Regex lives here as a short appendix.
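The mutable-default trap named under Functions can be sketched in a few lines. The function names `append_bad` and `append_good` are made up for this illustration and are not taken from the lesson itself.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same list object.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = append_bad(1)
second_bad = append_bad(2)    # same list as first_bad, now holding both items
first_good = append_good(1)
second_good = append_good(2)  # an independent list
```

The `None` sentinel is the usual idiom because default values are evaluated once at definition time, not on every call.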
Data Structures
- Arrays: Build a fixed-size array from scratch. Then meet Python's list. See what O(1) index actually costs.
- Linked Lists: Nodes, next pointers, head and tail. Build one in plain classes before reaching for deque.
- Stacks and Queues: Push, pop, enqueue, dequeue. Hand-rolled first, then Python's list and collections.deque.
- Hash Tables: Hash, bucket, collision, resize. Write your own before you trust a dict.
- Trees: Binary trees, BSTs, traversals. Build it from Node classes before touching any library.
- Graphs: Adjacency lists, adjacency matrices. Hand-built so BFS and Dijkstra land next week.
Algorithms
- Big O: How growth is measured. O(1), O(log n), O(n), O(n log n), O(n^2), O(2^n) with charts.
- Sorting: Bubble, insertion, merge, quick. Each one traced in print so you see the shape of the work.
- Searching: Linear search, binary search. The first place O(log n) pays rent.
- Recursion: Base case, recursive case, the call stack. Fibonacci, factorial, and when recursion hurts.
- Graph Algorithms: BFS, DFS, Dijkstra. Traverse the graph you built in the Graphs lesson.
- Dynamic Programming: Overlapping subproblems and memoization. Fibonacci once more, this time in linear time.
- LeetCode Warm-Ups: A dozen classic problems, brute force first, then the improvement.
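The "Fibonacci once more, this time in linear time" claim under Dynamic Programming can be checked by counting calls. This is a minimal sketch that uses `functools.lru_cache` for the memo table. The `calls` counter and both function names are assumptions for illustration, not the lesson's own code.

```python
from functools import lru_cache

# Count how many times each version's body actually runs.
calls = {"naive": 0, "memo": 0}


def fib_naive(n):
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    # The body runs once per distinct n; repeats are served from the cache.
    calls["memo"] += 1
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


naive_result = fib_naive(20)
memo_result = fib_memo(20)
```

Both versions return the same value, but the memoized one runs its body once for each n from 0 to 20, while the naive one revisits the same subproblems thousands of times.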
Paradigms
- Object-Oriented Programming: Classes, instances, inheritance, dunder methods, @property, @classmethod.
- Functional Programming: Pure functions, map, filter, reduce, closures, decorators with @ syntax.
- Mixing Paradigms in Real Code: OOP for boundaries, FP for leaves. Dataclasses where they fit.
- Types and Type Hints: Static hints, mypy and pyright strict, Protocols, TypedDict, generics. Why the red squiggles were right.
- Python Gotchas: Mutable defaults, late-binding closures, is vs ==, float precision, integer caching, mutating a collection while iterating over it.
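Of the gotchas listed above, late-binding closures are the easiest to show in isolation. A small sketch, with variable names chosen only for this example:

```python
# Each lambda looks up `i` when it is called, not when it is created,
# so all three see the final value of the loop variable.
late = [lambda: i for i in range(3)]
late_values = [f() for f in late]

# Binding `i` as a default argument captures the value at creation time.
bound = [lambda i=i: i for i in range(3)]
bound_values = [f() for f in bound]
```

Here `late_values` is `[2, 2, 2]` and `bound_values` is `[0, 1, 2]`.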
Under the Hood
Project
Libraries
- What Is a Library? Build a small one, package it with pyproject.toml, pip install -e it, import your own code.
- NumPy: Arrays, broadcasting, vectorized math, strides in memory.
- pandas: DataFrames, joins, group-bys, missing data, a little time series.
- Plotly: Interactive charts you can actually ship.
- PyTorch: Tensors, autograd, a tiny two-layer MLP trained end-to-end.
Data & Stats
Services
- What Is an API? Contracts between programs. HTTP verbs, status codes, JSON, the story of REST.
- APIs: Build One, Call One: FastAPI handlers, async, pydantic, calling other people's APIs with requests and httpx.
- What Is a Database? Why a file is not enough. ACID, indexes, the split between SQL and NoSQL.
- SQL Databases: Build a tiny ORM yourself, then meet SQLAlchemy. SQLite from the command line.
- NoSQL Databases: Build a JSON document store yourself, then meet MongoDB and the document model.
- APIs That Talk to Databases: FastAPI plus SQLAlchemy. The shape of every backend at every startup.
Cloud
- What Is the Cloud? Bare metal to VMs to containers to serverless. How AWS ended up running the internet.
- AWS S3: Object storage. Buckets, keys, presigned URLs, the dollar math.
- AWS DynamoDB: Managed NoSQL. Partition keys, single-table design, why Amazon built it.
- AWS SQS: Queues. Decoupling producers from consumers. Visibility timeouts and dead-letter queues.
- AWS Lambda: Serverless functions. Cold starts, billing by the millisecond, the async story.
AI
Deep Learning: Math
- Vectors and Matrices: Vectors as points, matrices as transformations. The steel every neural network is forged from.
- Derivatives and the Chain Rule: Rates of change, the chain rule, why every weight update reads off a derivative.
- Probability by Simulation: Bayes, expected value, the central limit theorem, each measured by simulation, then named.
- Computational Graphs: DAGs, topological sort, automatic recalculation. The data structure every framework runs on.
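The topological sort mentioned under Computational Graphs might look like the sketch below, which uses Kahn's algorithm. The `topo_order` name, the `deps` layout (each node mapped to the nodes it depends on), and the tiny example graph are all assumptions made for illustration.

```python
from collections import deque


def topo_order(deps):
    """Return the nodes so that every node comes after its dependencies."""
    indegree = {node: len(parents) for node, parents in deps.items()}
    children = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)

    # Start with nodes that depend on nothing.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order


# Hypothetical graph for loss = f(x * w + b).
graph = {
    "x": [], "w": [], "b": [],
    "mul": ["x", "w"],
    "add": ["mul", "b"],
    "loss": ["add"],
}
order = topo_order(graph)
```

Evaluating nodes in this order guarantees that every input is ready before the node that consumes it. Walking the same order in reverse is the order backpropagation needs.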
Deep Learning: Neurons
- What a Neuron Computes: One weighted sum plus a bias plus an activation. Wire enough of them together and you get a brain.
- Why Nonlinearity Is Required: Without it, 100 layers collapse to one matrix multiply. The prism that splits the light.
- The Sigmoid Curve: The S-curve from population growth, brought into neural nets and then mostly retired.
- Tanh and Zero-Centering: A rescaled sigmoid that centers outputs around zero, and why that matters to gradient descent.
- ReLU and the Dying Neuron: max(0, x). The activation that unlocked deep learning, and the failure mode Leaky ReLU fixes.
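The claim that stacked linear layers collapse to one matrix multiply can be verified on a two-layer toy case. This sketch uses plain lists instead of NumPy. The weights `W1` and `W2` and the input `x` are arbitrary values chosen for the example, and biases are left out to keep it short.

```python
def matmul(a, b):
    # Plain-list matrix product: (n x k) times (k x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, x) for x in v]


W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 0.0], [-1.0, 3.0]]
x = [1.0, 2.0]

stacked = matvec(W2, matvec(W1, x))          # two linear layers in sequence
collapsed = matvec(matmul(W2, W1), x)        # one layer with the product matrix
with_relu = matvec(W2, relu(matvec(W1, x)))  # a nonlinearity in between
```

`stacked` and `collapsed` agree, while `with_relu` differs, because ReLU zeroes the negative hidden value before the second layer sees it.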
Deep Learning: Loss
- The Loss Surface: Loss as terrain. Different losses make different landscapes: bowls, ravines, plateaus, cliffs.
- MSE vs MAE: Squared error chases outliers. Absolute error largely shrugs them off. Pick which lie you care about.
- Huber Loss: Smooth near zero, robust at the extremes. A common default loss in value-based reinforcement learning.
- Hinge Loss and Margins: Right side of zero isn't enough; be right by at least 1. The loss that defined the SVM era.
- Binary Cross-Entropy: The cost of a confident wrong prediction grows without bound as the predicted probability of the true class approaches zero. The standard loss for binary classifiers.
- Categorical Cross-Entropy: The same loss every LLM trains on. Multi-class probabilities scored against a one-hot truth.
- Softmax and Temperature: Logits to probabilities. Temperature controls the dimmer, from peaked and confident to flat and creative.
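Softmax with temperature, as described in the last item above, can be sketched with the standard library alone. The `softmax` signature and the example logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    # Divide by the temperature first: small T sharpens, large T flattens.
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
sharp = softmax(logits, temperature=0.5)
plain = softmax(logits, temperature=1.0)
flat = softmax(logits, temperature=5.0)
```

The top logit takes the largest share of the probability in `sharp` and the smallest in `flat`. All three outputs still sum to 1.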
Deep Learning: Training
- Gradient Descent on Terrain: Step downhill by the slope. Quadratic bowls, narrow ravines, saddle points, multimodal traps.
- The Optimizer Race: Vanilla SGD, momentum, AdaGrad, RMSProp, Adam. Built from scratch, raced on the same valley.
- Weight Initialization: Zeros kill the network. Wrong-scale random kills it too. Xavier and Kaiming earn their math.
- The Gradient Telescope: Why gradients shrink to zero or blow up to infinity in deep networks, and the residual fix.
- Residual Connections: y = F(x) + x. The skip-connection elevator that made 50-layer networks finally trainable.
- Tape-Based Autograd: Record the forward pass on a tape, walk it backward, apply the chain rule. PyTorch in 300 lines.
- Backprop By Hand: Once, no autograd, every gradient written out and verified against finite differences.
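Verifying a hand-written gradient against finite differences, as the Backprop By Hand item describes, might look like this for a single tanh neuron with squared error. The model, the parameter values, and the helper names are assumptions picked to keep the example small.

```python
import math


def loss(w, b, x, y):
    # One neuron: prediction = tanh(w * x + b), squared-error loss.
    return (math.tanh(w * x + b) - y) ** 2


def grads(w, b, x, y):
    # Chain rule by hand: dL/da = 2(a - y), da/dz = 1 - a^2,
    # dz/dw = x, dz/db = 1.
    a = math.tanh(w * x + b)
    upstream = 2 * (a - y) * (1 - a * a)
    return upstream * x, upstream


def numeric(f, value, eps=1e-6):
    # Central difference approximation of df/dvalue.
    return (f(value + eps) - f(value - eps)) / (2 * eps)


w, b, x, y = 0.7, -0.2, 1.5, 0.3
dw, db = grads(w, b, x, y)
dw_num = numeric(lambda v: loss(v, b, x, y), w)
db_num = numeric(lambda v: loss(w, v, x, y), b)
```

If the analytic and numeric values disagree by more than a small tolerance, the hand derivation is the first thing to suspect.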
Deep Learning: Vision
- Why CNNs Beat MLPs on Images: Weight sharing across positions: 9 weights for a 3×3 conv vs 50K for the equivalent FC layer.
- Convolutional Padding: Valid, same, full. Why every convolution shrinks without padding, and what to do about it.
- Learned Convolution: A random 3×3 kernel becomes a Sobel edge detector after training. Features discovered, not designed.
- Stride and Resolution: Stride 2 halves each dimension. Learned downsampling that picks what to keep.
- Pooling and Shift Invariance: Max pool keeps the strongest activation in each window. The cheap, parameter-free downsampler.
- Dilated Convolutions: Skip elements between kernel taps. Five layers cover 63 pixels instead of 11.
- Transfer Learning: Freeze early layers, retrain the last. Why pretraining is the dominant paradigm in modern ML.
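The "63 pixels instead of 11" figure under Dilated Convolutions follows from receptive-field arithmetic. The sketch below assumes kernel size 3, stride 1, and dilations doubling from 1 to 16. Those settings are an assumption that happens to reproduce both numbers, not something the outline states.

```python
def receptive_field(kernel_size, dilations):
    # With stride 1, each layer widens the field by (kernel_size - 1) * dilation.
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


plain = receptive_field(3, [1, 1, 1, 1, 1])     # five ordinary 3-tap layers
dilated = receptive_field(3, [1, 2, 4, 8, 16])  # dilation doubles per layer
```

Under these assumptions `plain` is 11 and `dilated` is 63, with the same number of weights per layer in both stacks.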
Deep Learning: Sequences
- Why Order Matters: Memoryless, windowed, EMA. Three pre-neural takes on sequence data. Shuffle them and watch them die.
- The Recurrent Network: A hidden state that updates each step. The same weights every time. Trained on baby names.
- N-Gram Language Models: Counting, normalizing, sampling. The simplest possible language model. The mental model GPT scales.
- Embeddings as Lookup Tables: A learned vector per discrete ID. Similar things end up near each other after training.
- Embeddings as Matrix Factorization: Co-occurrence factored into low-dimensional vectors. The GloVe idea, hand-rolled.
- The Attention Mechanism: Query, key, value, softmax. The brain's spotlight on which input position matters right now.
- The Transformer From Scratch: Self-attention, multi-head, positional encoding, residuals, layer norm. The full architecture, built by hand.
- Byte Pair Encoding: Iteratively merge the most common adjacent token pair. The tokenizer behind GPT and LLaMA.
- Pretrain, SFT, Preference: Three stages turn a token predictor into an assistant. Each stage shown end-to-end at toy scale.
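The merge loop described under Byte Pair Encoding can be sketched on a short string. This is a character-level toy with hypothetical helper names. Production tokenizers work on bytes or pre-split words and record the merges as a vocabulary, which this sketch leaves out.

```python
from collections import Counter


def most_common_pair(tokens):
    # Count every adjacent pair and return the most frequent one.
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]


def merge_pair(tokens, pair):
    # Scan left to right, replacing each occurrence of `pair` with one token.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


text = "aaabdaaabac"
tokens = list(text)
for _ in range(3):  # three merge rounds
    tokens = merge_pair(tokens, most_common_pair(tokens))
```

Each round shortens the token sequence while the concatenation of the tokens still spells the original text.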
Deep Learning: Scale
- Activation and Gradient Health: EKG and bloodwork for the brain you built. Catch dead neurons and vanishing gradients early.
- Batch Normalization: Normalize per-layer activations to mean 0, std 1. The most-cited stabilization trick of the decade.
- Distributed Training: Data parallelism with multiprocessing. Average gradients across workers via simulated all-reduce.
- Numerical Precision and fp16: Half the memory, up to double the throughput on hardware that supports it, plus loss scaling so small gradients survive.
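The reason loss scaling lets small gradients survive can be shown with the standard library's half-precision `struct` format. The gradient value and the scale factor of 1024 are assumptions for illustration. A real training loop scales the loss before the backward pass and unscales the gradients in higher precision, which this sketch only imitates on a single number.

```python
import struct


def to_fp16(value):
    # Round-trip through IEEE 754 half precision via the struct "e" format.
    return struct.unpack("<e", struct.pack("<e", value))[0]


grad = 1e-8     # smaller than anything fp16 can represent
scale = 1024.0  # hypothetical loss-scale factor

lost = to_fp16(grad)            # underflows to zero
scaled = to_fp16(grad * scale)  # large enough to survive in fp16
recovered = scaled / scale      # unscale in higher precision
```

`lost` is exactly `0.0`, while `recovered` lands within about one percent of the original gradient.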