Coding by Hand

x86 and the Weight of Compatibility

Imagine a city with a law that forbids tearing down any old building. The first downtown went up in 1978. The streets were narrow and the apartments only had two rooms each. New neighborhoods came every few years — a 1985 expansion that doubled every room, a 1996 multimedia district, a 1999 graphics quarter, a 2017 supercomputer suburb — and every single one had to be built on top of or around the old downtown. The 1978 buildings still stand. People still live in them. A pizza delivery driver who started in 1985 can walk into any of those apartments today and the door key still works. This city is called x86, and Intel and AMD have been adding floors to it for forty-eight years without ever knocking anything down.

x86 as a city built in layers from 1978 onward, with the original downtown still standing at the center.

The original downtown was the Intel 8086, finished in June 1978 at Intel's Santa Clara headquarters by a team led by Jean Claude Cornet that included Stephen Morse. The 8086 had 81 instructions, small words the chip understood, like MOV for "move bytes from here to there" and JMP for "jump to this address." None of this would have mattered if the chip had not won a contract that came along three years later. In 1981 IBM picked the Intel 8088, an 8086 variant with a narrower 8-bit external bus, for its first PC because Intel could ship volume parts and was willing to license the design. Once Lotus 1-2-3 and WordPerfect were written for the 8086 instruction set, that set became a contract Intel could never break. Millions of dollars of business software depended on it. Tear down the old downtown and every tenant in the city is homeless.
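The idea that an instruction is a small word made of bytes can be made concrete with a toy decoder. The sketch below recognizes four one-byte 8086 opcodes (NOP, MOV AX with a 16-bit immediate, short JMP, and near RET). The `Insn` enum and the `decode` and `decode_all` helpers are names invented for this illustration, and a real decoder also has to handle prefixes, ModR/M bytes, and every other opcode.

```rust
/// A handful of 8086 instructions, enough to show how bytes become "words".
/// Teaching sketch only: no prefixes, no ModR/M byte, no segment handling.
#[derive(Debug, PartialEq)]
enum Insn {
    Nop,             // 90
    MovAxImm16(u16), // B8 lo hi  (immediate is little-endian)
    JmpShort(i8),    // EB rel8   (signed offset from the next instruction)
    Ret,             // C3
    Unknown(u8),     // anything this sketch does not recognize
}

/// Decode one instruction from the front of `bytes`.
/// Returns the instruction and its length in bytes, or None if input is empty or truncated.
fn decode(bytes: &[u8]) -> Option<(Insn, usize)> {
    let (&op, rest) = bytes.split_first()?;
    match op {
        0x90 => Some((Insn::Nop, 1)),
        0xB8 => {
            let lo = *rest.first()?;
            let hi = *rest.get(1)?;
            Some((Insn::MovAxImm16(u16::from_le_bytes([lo, hi])), 3))
        }
        0xEB => Some((Insn::JmpShort(*rest.first()? as i8), 2)),
        0xC3 => Some((Insn::Ret, 1)),
        other => Some((Insn::Unknown(other), 1)),
    }
}

/// Decode instructions until the input runs out or an instruction is cut short.
fn decode_all(mut bytes: &[u8]) -> Vec<Insn> {
    let mut out = Vec::new();
    while let Some((insn, len)) = decode(bytes) {
        out.push(insn);
        bytes = &bytes[len..];
    }
    out
}

fn main() {
    // mov ax, 0x1234 ; nop ; jmp short -2 ; ret
    let program = [0xB8, 0x34, 0x12, 0x90, 0xEB, 0xFE, 0xC3];
    for insn in decode_all(&program) {
        println!("{:?}", insn);
    }
}
```

The same seven bytes mean the same thing to a 16-bit program on a current x86 chip as they did on the 8086, which is the compatibility promise in miniature.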

Intel learned how binding that contract was the hard way. In 1981 it tried to move past the 8086 with a totally new chip called the iAPX 432, designed without any 8086 instructions at all. The 432 was supposed to be the future. It flopped: it was slow, it could not run existing 8086 software, and customers refused to move. Intel pulled it. The lesson stuck. Every chip since has been required to run every old 8086 program on bare metal, at full speed, byte for byte the same. The 80286 in 1982 added 16 new instructions for memory protection but kept the original 81. The 80386 in 1985 added 39 more for 32-bit addressing but kept all the rest. Each new release was a new neighborhood bolted onto the old city, never a replacement for it.

Each new x86 extension stacks on top of the previous ones without replacing them.

The neighborhoods piled up faster as the workloads got hungrier. MMX in 1996 added 57 instructions for operating on several small integers in parallel. Intel built it because Pentiums were getting beaten by dedicated audio and video chips, and the marketing department needed an answer for "why is my PC slow at video?" SSE in 1999 added 70 more for floating-point math, mostly because games like Quake and Unreal wanted to do 3D geometry on the CPU before graphics cards took over that work. Then SSE2 came along in 2001, when Intel realized SSE could stand in for the old x87 math unit. The most surprising entry on the timeline came in 2003: Advanced Micro Devices, the smaller competitor that had spent two decades as Intel's understudy, designed the 64-bit extension to x86 and called it AMD64. Intel had been pushing a totally separate 64-bit chip called Itanium that broke compatibility on purpose, betting the old city would not survive the move to 64-bit. AMD bet the opposite, kept the old downtown, and added new 64-bit registers on the side. Itanium never caught on outside a small server niche, and in 2004 Intel began shipping its own AMD64-compatible chips. Today every 64-bit x86 chip Intel sells implements AMD's instructions, a rare case of Intel having to copy AMD's homework.
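Because the extensions arrived one at a time, software that wants a newer neighborhood has to ask whether the chip it is running on actually has it. A minimal sketch of that check in Rust follows, using the standard library's `is_x86_feature_detected!` macro. The `detected_extensions` helper is a name made up for this example, and the list of features it probes is a selection, not a complete inventory.

```rust
/// Ask the running CPU which x86 extensions it advertises, oldest first.
/// The detection macro only accepts string literals, so each one is spelled out.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detected_extensions() -> Vec<(&'static str, bool)> {
    vec![
        ("sse", std::is_x86_feature_detected!("sse")),
        ("sse2", std::is_x86_feature_detected!("sse2")),
        ("sse3", std::is_x86_feature_detected!("sse3")),
        ("ssse3", std::is_x86_feature_detected!("ssse3")),
        ("sse4.1", std::is_x86_feature_detected!("sse4.1")),
        ("sse4.2", std::is_x86_feature_detected!("sse4.2")),
        ("avx", std::is_x86_feature_detected!("avx")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
        ("avx512f", std::is_x86_feature_detected!("avx512f")),
    ]
}

/// On non-x86 targets the macro does not exist, so report nothing.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detected_extensions() -> Vec<(&'static str, bool)> {
    Vec::new()
}

fn main() {
    let features = detected_extensions();
    if features.is_empty() {
        println!("not an x86 target, nothing to detect");
    }
    for (name, present) in features {
        println!("{:<8} {}", name, if present { "yes" } else { "no" });
    }
}
```

The output depends on the machine. On any x86-64 chip the `sse` and `sse2` rows should read `yes`, since AMD64 made them part of the baseline. The later rows vary by model.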

Here is the receipt for forty-eight years of additions. The program below is a Rust table of every major x86 extension, the year it arrived, roughly how many new instructions it carried, and the running total. The counts are approximate, since sources disagree on what counts as one instruction. Run it and watch the cumulative column grow.

/// One row of the timeline: a named extension and what it added.
struct Extension {
    year: u16,
    name: &'static str,
    added: u32,               // new instructions introduced by this extension
    reason: &'static str,     // why the extension exists
}

// Listed in chronological order so the running total reads top to bottom.
const EXTENSIONS: &[Extension] = &[
    Extension { year: 1978, name: "8086 base",       added: 81,  reason: "the original 16-bit set"        },
    Extension { year: 1982, name: "80286 protected", added: 16,  reason: "memory protection, rings"       },
    Extension { year: 1985, name: "80386 (i386)",    added: 39,  reason: "32-bit registers and addressing"},
    Extension { year: 1989, name: "80486",           added: 6,   reason: "atomic CMPXCHG, BSWAP"          },
    Extension { year: 1993, name: "Pentium",         added: 8,   reason: "CPUID, RDTSC, CMPXCHG8B"        },
    Extension { year: 1996, name: "MMX",             added: 57,  reason: "first SIMD on integers"         },
    Extension { year: 1999, name: "SSE",             added: 70,  reason: "SIMD on floats for 3D games"    },
    Extension { year: 2001, name: "SSE2",            added: 144, reason: "wider SIMD, replaces x87 math"  },
    Extension { year: 2003, name: "x86-64 (AMD64)",  added: 35,  reason: "64-bit registers, AMD's gift"   },
    Extension { year: 2004, name: "SSE3",            added: 13,  reason: "horizontal adds for audio"      },
    Extension { year: 2006, name: "SSSE3",           added: 16,  reason: "shuffle and pack for codecs"    },
    Extension { year: 2007, name: "SSE4",            added: 54,  reason: "string and dot-product ops"     },
    Extension { year: 2011, name: "AVX",             added: 149, reason: "256-bit vectors"                },
    Extension { year: 2013, name: "AVX2",            added: 135, reason: "integer math on 256-bit lanes"  },
    Extension { year: 2017, name: "AVX-512",         added: 716, reason: "512-bit vectors for HPC and AI" },
    Extension { year: 2021, name: "AMX",             added: 12,  reason: "matrix tiles for AI inference"  },
];

fn main() {
    println!("x86 instruction additions, 1978 to 2026");
    println!(
        "{:<5} {:<20} {:>6} {:>9}  bars (one # per 10 added)",
        "year", "extension", "added", "cum total",
    );
    println!("{}", "-".repeat(78));
    // Running total of instructions across all extensions so far.
    let mut total: u32 = 0;
    for ext in EXTENSIONS {
        total += ext.added;
        // Integer division: additions smaller than 10 draw no bar at all.
        let bars: String = "#".repeat((ext.added / 10) as usize);
        println!(
            "{:<5} {:<20} {:>6} {:>9}  {}",
            ext.year, ext.name, ext.added, total, bars,
        );
    }
    println!("{}", "-".repeat(78));
    println!();
    println!("notes:");
    for ext in EXTENSIONS {
        println!("  {} ({}) - {}", ext.year, ext.name, ext.reason);
    }
    println!();
    println!("a 2026 Intel chip still decodes every one of these {} instructions.", total);
}

Build it and read the bars sideways.

x86 instruction additions, 1978 to 2026
year  extension             added cum total  bars (one # per 10 added)
------------------------------------------------------------------------------
1978  8086 base                81        81  ########
1982  80286 protected          16        97  #
1985  80386 (i386)             39       136  ###
1989  80486                     6       142  
1993  Pentium                   8       150  
1996  MMX                      57       207  #####
1999  SSE                      70       277  #######
2001  SSE2                    144       421  ##############
2003  x86-64 (AMD64)           35       456  ###
2004  SSE3                     13       469  #
2006  SSSE3                    16       485  #
2007  SSE4                     54       539  #####
2011  AVX                     149       688  ##############
2013  AVX2                    135       823  #############
2017  AVX-512                 716      1539  #######################################################################
2021  AMX                      12      1551  #
------------------------------------------------------------------------------

notes:
  1978 (8086 base) - the original 16-bit set
  1982 (80286 protected) - memory protection, rings
  1985 (80386 (i386)) - 32-bit registers and addressing
  1989 (80486) - atomic CMPXCHG, BSWAP
  1993 (Pentium) - CPUID, RDTSC, CMPXCHG8B
  1996 (MMX) - first SIMD on integers
  1999 (SSE) - SIMD on floats for 3D games
  2001 (SSE2) - wider SIMD, replaces x87 math
  2003 (x86-64 (AMD64)) - 64-bit registers, AMD's gift
  2004 (SSE3) - horizontal adds for audio
  2006 (SSSE3) - shuffle and pack for codecs
  2007 (SSE4) - string and dot-product ops
  2011 (AVX) - 256-bit vectors
  2013 (AVX2) - integer math on 256-bit lanes
  2017 (AVX-512) - 512-bit vectors for HPC and AI
  2021 (AMX) - matrix tiles for AI inference

a 2026 Intel chip still decodes every one of these 1551 instructions.

The bar for AVX-512 in 2017 jumps off the right side of the table: 716 new instructions in a single release, more than everything the table lists from 1978 through AVX in 2011 (688). That one row is a glimpse of what compatibility costs. A modern Intel chip no longer has 81 instructions. By this table's count it has over 1,500, and every one that a given chip supports must be decoded correctly even when the program is from 1985. The silicon that decodes instructions on a 2024 Core processor, the part called the front end, takes up a noticeable slice of the chip's transistors and burns power every cycle, because it has to be ready for any word from any of those forty-eight years of neighborhoods.

The decoder section of a modern x86 chip, the part responsible for understanding every old and new instruction.

There is a question hidden in the table. Look at 1989 — the 80486 added only 6 instructions, then the Pentium in 1993 added 8. The pace was slowing because the basic city was finished. From 1996 onward the additions are almost all SIMD instructions, which are wide parallel math operations for video, audio, and now machine learning. The shape of the additions tells you what the world was asking the chip to do. The 8086 was for spreadsheets. The 386 was for word processors. SSE was for games. AVX-512 is for training neural networks. Same downtown, new suburbs every time the world wanted a new thing fast.
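The "almost all SIMD" claim can be checked against the table's own numbers. The sketch below copies the rows from 1996 onward and tags each as SIMD or not. The `ROWS` constant, the `simd_share_since` helper, and the choice to count AMX's matrix tiles as SIMD are assumptions made for this example.

```rust
/// (year, name, instructions added, counted as SIMD?) copied from the lesson's table.
/// Tagging AMX as SIMD is a judgment call made for this sketch.
const ROWS: &[(u16, &str, u32, bool)] = &[
    (1996, "MMX", 57, true),
    (1999, "SSE", 70, true),
    (2001, "SSE2", 144, true),
    (2003, "x86-64 (AMD64)", 35, false),
    (2004, "SSE3", 13, true),
    (2006, "SSSE3", 16, true),
    (2007, "SSE4", 54, true),
    (2011, "AVX", 149, true),
    (2013, "AVX2", 135, true),
    (2017, "AVX-512", 716, true),
    (2021, "AMX", 12, true),
];

/// Returns (SIMD instructions, all instructions) added in `year` or later.
fn simd_share_since(year: u16) -> (u32, u32) {
    let mut simd = 0;
    let mut all = 0;
    for &(y, _name, added, is_simd) in ROWS {
        if y >= year {
            all += added;
            if is_simd {
                simd += added;
            }
        }
    }
    (simd, all)
}

fn main() {
    let (simd, all) = simd_share_since(1996);
    let percent = 100.0 * simd as f64 / all as f64;
    println!("{} of {} instructions added since 1996 are SIMD ({:.1}%)", simd, all, percent);
}
```

With these tags, 1,366 of the 1,401 instructions added since 1996 are SIMD, about 97.5 percent. The only exception in the table is the 35 instructions of AMD64.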

A 2024 Intel chip decodes roughly four times as many instructions as a contemporary ARM core.

The price of the no-demolition rule is dead weight. A chip built around a newer, cleaner instruction set, like Apple's M-series or the ARM cores in your phone, has far less to decode: about 400 instructions by this lesson's rough count. A 2024 Intel chip has to decode 1,500-plus, including instructions nobody has written new code for in twenty years. The transistors that handle the rare old instructions sit there using power and warming up the package whether they get used or not. This is the weight that gives the lesson its name. The 1981 deal with IBM turned the 1978 instruction set into a promise to never make customers rewrite their software, and decades later that promise is still pressing down on every silicon design Intel ships.

The next lesson is about the company that took the opposite bet — that a small, clean instruction set with no historical baggage could beat the city of x86 in the long run.